Izwi

Local audio inference engine for speech, transcription, and voice cloning

Free, Text to Speech, Transcription, Open Source

Overview

Izwi is an audio inference engine that runs entirely on your Mac, with no cloud dependency. It is built in Rust with Metal acceleration for Apple Silicon, and it handles text-to-speech, speech recognition, voice cloning, voice design, conversational AI, and speaker diarization, all processed locally. It is advertised as 10x faster than cloud APIs, with first-token latency under 50 ms. Izwi ships with both a desktop GUI and a production-ready API server that exposes OpenAI-compatible endpoints, so it suits personal use as well as local development workflows. The project is currently in alpha and is open source under the Apache 2.0 license. It is free forever and needs no API keys or internet connection.

Architecture: Apple Silicon, Intel

Key Features

  • Text-to-speech with multiple voices and pitch/speed control
  • Speech recognition with word-level timestamps and multi-language support
  • Voice cloning from just seconds of audio
  • Voice design from text descriptions
  • Conversational AI with real-time dialogue
  • Speaker diarization for multi-speaker audio separation
  • OpenAI-compatible API endpoints for local development
  • Rust-native with Metal GPU acceleration on Apple Silicon
  • Advertised as 10x faster than cloud APIs, with first-token latency under 50 ms
  • Desktop GUI and production API server modes
  • Zero external dependencies — no internet or API keys required
  • 100% local processing for complete privacy
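
Because the API server is described as OpenAI-compatible, a text-to-speech call can probably be shaped like a request to OpenAI's `/v1/audio/speech` endpoint. The sketch below is illustrative only and uses the Python standard library. The host, port, model name, and voice name are all assumptions made for the example, not values taken from Izwi's documentation, so check the project's own docs before relying on them.

```python
import json
import urllib.request

# Assumption: address of the local API server. The real host and port may differ.
BASE_URL = "http://localhost:8080/v1"


def build_speech_request(text, voice="default", model="local-tts", base_url=BASE_URL):
    """Build a POST request shaped like OpenAI's /audio/speech endpoint.

    The "default" voice and "local-tts" model are hypothetical placeholders.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With a server running, `synthesize("Hello from a local model", "hello.wav")` would save the response body to `hello.wav`. No API key header is set, which matches the listing's statement that no keys are required.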

Tags

voice synthesis, transcription, voice input