Lemonade Server
Local, multi-modal AI runtime — text, image, and speech on your own hardware
Overview
Lemonade is an open-source local AI runtime that runs text, image, and speech models entirely on your own machine, with no cloud dependency and no telemetry. It exposes OpenAI-compatible API endpoints so existing OpenAI SDK clients can point at it unchanged, and it can serve multiple models simultaneously — large chat models like gpt-oss-120b and Qwen-Coder-Next with 64k+ context, vision models for image analysis, image generation and editing, automatic speech recognition, and speech synthesis. Hardware is auto-detected and optimized on the fly, with specific tuning for AMD Ryzen AI, Radeon, and Strix Halo systems alongside standard macOS, Windows, and Linux support. A built-in control panel app makes it easy to browse, download, and swap models, and the whole thing ships as a portable binary under 10 MB for simple deployment. Free, open source, private by design.
Pricing: Free (open source)
Architecture: Apple Silicon, Intel
Key Features
- Local, private AI runtime — no cloud, no telemetry
- OpenAI-compatible API endpoints (drop-in for existing clients)
- Multi-modal: chat, vision, image generation, transcription, speech synthesis
- Serve multiple models simultaneously
- Support for large models like gpt-oss-120b and Qwen-Coder-Next
- Context windows up to 64k+ tokens
- Hardware auto-detection and optimization
- Tuned for AMD Ryzen AI, Radeon, and Strix Halo (also runs on macOS and Linux)
- Built-in control panel application
- Under 10 MB portable binary
- Open source