HearItServer: Your Offline TTS Server for Local Speech Synthesis


Jan 19, 2025 - 20:54

AI-driven text-to-speech (TTS) is dominated by cloud-based APIs today. HearItServer is a powerful alternative that brings blazing-fast speech synthesis to local machines. Built on top of Kokoro-ONNX, a fast and efficient open-source TTS model, HearItServer gives developers a ready-to-use, high-performance text-to-speech server that integrates into their applications and synthesizes speech without an internet connection.

I built HearItServer as a core component of a larger project I'm working on at the moment, a tool designed to help users read books, documents, and other text-based content faster and more efficiently. My goal is to develop an app that enables users to consume more books while making reading more engaging, all offline. HearItServer powers the offline TTS functionality of this project, but I realized it could also be useful to developers looking for a lightweight, private, and fast text-to-speech solution. So, I decided to make it free and open for others to build on.

If you need real-time speech synthesis without network latency, data privacy concerns, or API rate limits, this is the ultimate local TTS solution.

Why Use HearItServer?

Unlike traditional TTS services that require online APIs, HearItServer is designed to run entirely on your local machine. This means:

Lightning-Fast Inference – Thanks to Kokoro-ONNX, the inference is optimized for speed.

Privacy-Preserving – No data is sent to external servers, making it ideal for secure environments.

Fully Offline – No need for API keys or internet connectivity.

Easy Integration – Exposes a simple REST API that any application you build can call.

How It Works

HearItServer is essentially a lightweight Flask-based REST API that hosts Kokoro-ONNX, allowing any application to send text and receive high-quality, natural-sounding speech in response. This makes it incredibly easy to integrate into desktop applications, automation workflows, and AI assistants.

Setting Up HearItServer

1️⃣ Install HearItServer

Download and install the HearItServer application on your machine. Once installed, launch it, and a menu bar icon will appear on macOS.

[Screenshot: system menu showing options for HearItServer]

2️⃣ Start the TTS Server

Click on the menu icon and select "Start TTS Server". The server will now be running locally at:

http://localhost:7008
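Before sending requests, it can help to confirm that something is actually listening on that port. The following is a minimal sketch using Node's built-in `net` module. The function name `isPortOpen` and the one-second timeout are illustrative choices, not part of HearItServer, and an open port only shows that some process is listening, not that it is the TTS server.

```typescript
import * as net from "net";

// Resolves to true if a TCP connection to host:port succeeds within timeoutMs.
// Hypothetical helper for illustration; HearItServer does not ship this function.
function isPortOpen(port: number, host = "localhost", timeoutMs = 1000): Promise<boolean> {
    return new Promise((resolve) => {
        const socket = net.connect({ port, host });
        const finish = (open: boolean) => {
            socket.destroy(); // always release the socket before reporting
            resolve(open);
        };
        socket.setTimeout(timeoutMs);
        socket.once("connect", () => finish(true));
        socket.once("timeout", () => finish(false));
        socket.once("error", () => finish(false));
    });
}

// 7008 is the port the server reports after "Start TTS Server".
isPortOpen(7008).then((open) => {
    console.log(open ? "Something is listening on port 7008" : "Nothing is listening on port 7008");
});
```

If the check reports that nothing is listening, the likely cause is that the server has not been started from the menu yet.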

Using the API (100% local)

HearItServer provides a single API endpoint that generates speech from text.

Endpoint:

POST http://localhost:7008/v1/audio/speech

Request Body (JSON):

{
  "text": "Hello, this is a test message!",
  "voice": "af_sarah",
  "speed": 1.0,
  "lang": "en-us"
}
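In a TypeScript client, the body above could be described with a small type and built through one helper. This is only a sketch: `SpeechRequest` and `buildSpeechRequest` are hypothetical names, the fallback values are simply copied from the sample request (they are not documented server defaults), and the empty-text check is a client-side guard added for illustration.

```typescript
// Shape of the JSON body shown above. Whether the server treats any field as
// optional is not stated, so this sketch always sends all four.
interface SpeechRequest {
    text: string;
    voice: string;
    speed: number;
    lang: string;
}

// Builds a request body, falling back to the sample values for omitted fields.
function buildSpeechRequest(
    text: string,
    overrides: Partial<Omit<SpeechRequest, "text">> = {}
): SpeechRequest {
    if (text.trim().length === 0) {
        throw new Error("text must not be empty"); // client-side assumption
    }
    return { text, voice: "af_sarah", speed: 1.0, lang: "en-us", ...overrides };
}

const body = JSON.stringify(buildSpeechRequest("Hello, this is a test message!", { voice: "bm_george" }));
console.log(body);
```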

Available Voices:

  • af_sarah

  • af_bella

  • af_nicole

  • af_sky

  • am_adam

  • am_michael

  • bf_emma

  • bf_isabella

  • bm_george

  • bm_lewis
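One way to keep a client from sending a misspelled voice name is to hold the list above in a single constant and check against it. A minimal sketch follows; `VOICES`, `Voice`, and `isVoice` are illustrative names, and the list reflects only the voices named in this post, so it would need updating if the server adds more.

```typescript
// The ten voice identifiers listed above, kept in one place.
const VOICES = [
    "af_sarah", "af_bella", "af_nicole", "af_sky",
    "am_adam", "am_michael",
    "bf_emma", "bf_isabella",
    "bm_george", "bm_lewis",
] as const;

// Union type of the listed identifiers.
type Voice = (typeof VOICES)[number];

// Type guard: narrows an arbitrary string to one of the listed voices.
function isVoice(value: string): value is Voice {
    return (VOICES as readonly string[]).indexOf(value) !== -1;
}

const requested: string = "bf_emma";
console.log(isVoice(requested) ? `Using voice ${requested}` : `Unknown voice: ${requested}`);
```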

Response:

  • Success: A .wav file is returned as a binary response.

  • Error: A JSON object containing an error message.
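Because the same endpoint returns either audio bytes or a JSON error, a client has to tell the two apart. The sketch below does so by looking for the standard WAV header (`RIFF` at byte 0 and `WAVE` at byte 8) and otherwise tries to parse the body as JSON. The names `SpeechResult` and `interpretSpeechResponse` are hypothetical, the fields of the error object are not documented here, and the sample error string is made up for the demonstration.

```typescript
// Result of interpreting the raw bytes returned by the speech endpoint.
type SpeechResult =
    | { ok: true; audio: Buffer }
    | { ok: false; error: unknown };

// A WAV file begins with the ASCII bytes "RIFF" and carries "WAVE" at offset 8.
function looksLikeWav(bytes: Buffer): boolean {
    return bytes.length >= 12 &&
        bytes.toString("ascii", 0, 4) === "RIFF" &&
        bytes.toString("ascii", 8, 12) === "WAVE";
}

// Treats a WAV payload as success. Anything else is parsed as JSON when
// possible and passed through as plain text when not.
function interpretSpeechResponse(bytes: Buffer): SpeechResult {
    if (looksLikeWav(bytes)) {
        return { ok: true, audio: bytes };
    }
    const text = bytes.toString("utf8");
    try {
        return { ok: false, error: JSON.parse(text) };
    } catch {
        return { ok: false, error: text };
    }
}

// Made-up error payload, only to show the failure branch.
const result = interpretSpeechResponse(Buffer.from('{"error": "example"}'));
console.log(result.ok ? `Got ${result.audio.length} bytes of audio` : result.error);
```

With Axios and `responseType: 'arraybuffer'`, the bytes to pass in would be `Buffer.from(response.data)`.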

Example: Using HearItServer in TypeScript

To integrate HearItServer into your application, you can send requests from TypeScript using Axios:

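import axios from 'axios';
import * as fs from 'fs';

// Local HearItServer endpoint
const url = "http://localhost:7008/v1/audio/speech";
const headers = { "Content-Type": "application/json" };
const data = {
    text: "Hello, world!",
    voice: "af_sarah",
    speed: 1.0,
    lang: "en-us"
};

// Request the audio as raw bytes so it can be written straight to disk
axios.post(url, data, { headers, responseType: 'arraybuffer' })
    .then(response => {
        fs.writeFileSync("output.wav", Buffer.from(response.data));
        console.log("Audio saved as output.wav");
    })
    .catch(error => {
        // With responseType 'arraybuffer' the error body also arrives as bytes,
        // so decode it to text before logging
        const details = error.response
            ? Buffer.from(error.response.data).toString("utf8")
            : error.message;
        console.error("Error:", details);
    });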

This script sends a request to the local TTS server, receives the audio response, and saves it as a .wav file.

Stopping the TTS Server

  • Click on the menu bar icon.

  • Select "Stop TTS Server" to terminate the service.

Build Anything with Local TTS

The beauty of HearItServer is its flexibility: it provides a universal interface for local TTS inference, so anyone can build applications on top of it! Some potential use cases include: