Lemonade Server icon

Lemonade Server

Local, multi-modal AI runtime — text, image, and speech on your own hardware

Free LLM Server Open Source

Overview

Lemonade is an open-source local AI runtime that runs text, image, and speech models entirely on your own machine, with no cloud dependency and no telemetry. It exposes OpenAI-compatible API endpoints so existing OpenAI SDK clients can point at it unchanged, and it can serve multiple models simultaneously — large chat models like gpt-oss-120b and Qwen-Coder-Next with 64k+ context, vision models for image analysis, image generation and editing, automatic speech recognition, and speech synthesis. Hardware is auto-detected and optimized on the fly, with specific tuning for AMD Ryzen AI, Radeon, and Strix Halo systems alongside standard macOS, Windows, and Linux support. A built-in control panel app makes it easy to browse, download, and swap models, and the whole thing ships as a portable binary under 10 MB for simple deployment. Free, open source, private by design.

Pricing: Free (open source)

Architecture: Apple Silicon, Intel

Key Features

  • Local, private AI runtime — no cloud, no telemetry
  • OpenAI-compatible API endpoints (drop-in for existing clients)
  • Multi-modal: chat, vision, image generation, transcription, speech synthesis
  • Serve multiple models simultaneously
  • Support for large models like gpt-oss-120b and Qwen-Coder-Next
  • Context windows up to 64k+ tokens
  • Hardware auto-detection and optimization
  • Tuned for AMD Ryzen AI, Radeon, and Strix Halo (also runs on macOS and Linux)
  • Built-in control panel application
  • Under 10 MB portable binary
  • Open source

Tags

chattext generationimage generationimage editingtranscriptionvoice synthesisdocument analysis