Osaurus update brings on-device and cloud AI models to the Mac

As large language models become widely available, a new tier of software is emerging to manage them. Osaurus, an open-source server designed for macOS, promises to let users swap freely between local and cloud models while keeping data and tools on their own machines — a shift with implications for privacy, cost and cloud demand.

Developed from a desktop-assistant experiment, Osaurus aims to bring model orchestration to everyday users rather than only to developers. Its makers say the goal is simple: give people control over which models do what, and where sensitive files stay.

From personal assistant prototype to a model “harness”

The project began when its co-founder, Terence Pae — who previously worked as an engineer at Tesla and Netflix — explored building a Mac-native assistant. Early testers pushed him to rethink a cloud-dependent business model, prompting a pivot toward software that can run and coordinate models on a user’s own hardware.

Launched publicly and developed as open source, Osaurus now acts as a control layer connecting models, plugins and local tools. That puts it in the same category as other orchestration systems, but its interface and security design aim to be friendlier for non-technical users.

What Osaurus does for users

At its core, Osaurus routes requests to whichever model the user chooses — a local model hosted on the Mac or one running in the cloud. It also keeps the models’ memory, files and tool access on the user’s device if they prefer, rather than sending everything to remote servers.

Two practical benefits stand out: you can pick the model best suited to a task — for example, one that excels at code generation versus one tuned for creative writing — and you can limit what data leaves your machine.
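
The routing idea can be pictured with a minimal sketch. This is an illustration only, not Osaurus's actual code or API: the model labels, the `route` function and the sensitivity flag are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTarget:
    name: str      # hypothetical model label, not a real Osaurus identifier
    location: str  # "local" or "cloud"


# Hypothetical routing table: task type -> model the user prefers for it.
ROUTES = {
    "code": ModelTarget("local-code-model", "local"),
    "creative": ModelTarget("cloud-writing-model", "cloud"),
}

# Fallback used for unknown tasks and for anything that must stay on-device.
DEFAULT = ModelTarget("local-general-model", "local")


def route(task: str, contains_sensitive_data: bool) -> ModelTarget:
    """Pick a model for a task, keeping sensitive requests on the local machine."""
    target = ROUTES.get(task, DEFAULT)
    if contains_sensitive_data and target.location == "cloud":
        # Assumed policy: sensitive data never leaves the device.
        return DEFAULT
    return target


print(route("creative", contains_sensitive_data=False))
print(route("creative", contains_sensitive_data=True))
```

In this sketch the privacy rule overrides the task preference, which mirrors the two benefits described above: model choice per task, and a limit on what leaves the machine.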

Local-first operation: Run models on your Mac or connect to cloud providers.

Hardware-isolated sandboxing: Processes run in a virtualized, restricted environment to reduce the attack surface.

MCP-compatible: Works as a Model Context Protocol server so other clients can use the same tools and context.

Plug-ins and integrations: Native connectors for mail, calendar, browser access, filesystem, Git and document formats.
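
For readers curious what MCP compatibility means on the wire, the Model Context Protocol exchanges JSON-RPC 2.0 messages, with methods such as `tools/list` and `tools/call`. The sketch below only builds such messages; it does not contact Osaurus, and the tool name `read_file` and its `path` argument are hypothetical placeholders rather than documented Osaurus tools.

```python
import json
from itertools import count

# Simple incrementing request IDs, as JSON-RPC requires a unique id per request.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dictionary for an MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. "read_file" and "path" are hypothetical examples.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "notes.txt"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool))
```

Because any MCP client speaks this same message format, tools registered with one server can be reused by other compatible clients.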

Supported models and connections

Osaurus supports a broad set of models and services, allowing users to mix local and cloud options depending on performance and privacy needs. Examples include:

Local models: Llama variants, GPT-OSS, DeepSeek V4, MiniMax M2.5, Gemma 4, Qwen 3.6

On-device frameworks: Apple’s on-device foundation models and Liquid AI’s LFM family

Cloud connectors: OpenAI, Anthropic, Google’s Gemini, xAI/Grok, Venice AI, OpenRouter and others

The app also ships with more than 20 built-in plugins for common workflows — from spreadsheets and slides to vision and audio — and recently added voice interaction capabilities to expand hands-free use.

Performance limits and the path ahead

Running advanced models locally remains hardware-intensive. Pae recommends at least 64GB of RAM for many models, and around 128GB for larger ones such as DeepSeek V4. That requirement narrows the current audience to users with high-end machines, though he argues the efficiency of local inference is improving rapidly.
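
A back-of-envelope calculation shows why RAM is the limiting factor: the memory needed just to hold a model's weights is roughly the parameter count multiplied by the bytes used per parameter. The sketch below assumes a hypothetical 70-billion-parameter model and a 25% headroom for the operating system and inference overhead. Neither figure comes from Osaurus or Pae, and real requirements vary with context length and runtime.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: int,
                ram_gb: float, headroom: float = 0.25) -> bool:
    """Check whether weights fit after reserving a fraction of RAM.

    The headroom fraction is an assumption standing in for the OS,
    other apps and inference overhead such as the attention cache.
    """
    usable_gb = ram_gb * (1 - headroom)
    return weight_memory_gb(params_billions, bits_per_weight) <= usable_gb


# Hypothetical 70B-parameter model at two precisions.
for bits in (16, 4):
    need = weight_memory_gb(70, bits)
    print(f"{bits}-bit weights: {need:.0f} GB, "
          f"fits in 64 GB: {fits_in_ram(70, bits, 64)}, "
          f"fits in 128 GB: {fits_in_ram(70, bits, 128)}")
```

Under these assumptions, the same model needs about 140 GB at 16-bit precision but about 35 GB when quantized to 4 bits, which is why quantization matters so much for consumer hardware.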

In practical terms, both the energy use and the responsiveness of local models are steadily improving. As on-device models grow more capable per watt of compute, the argument for running private AI at the edge becomes stronger, especially for users and organizations that prioritize data control.

Adoption, competition and commercial potential

Since its public release almost a year ago, Osaurus has recorded more than 112,000 downloads. It competes with tools such as Ollama and LM Studio that let users host models locally, but it differentiates itself by focusing on ease of use and consumer-oriented safety features rather than exposing raw developer tooling.

The founding team, which includes Sam Yoo, is taking part in the Alliance accelerator in New York and is exploring enterprise paths. Sectors such as healthcare and law, where confidentiality is essential, are natural targets for on-premises deployments that minimize cloud exposure.

Backers of local-first AI also argue for a secondary macro effect: if more inference shifts to end-user devices, demand for centralized AI data-center capacity could moderate, with potential energy and cost implications across the industry.

What to watch next

Osaurus exemplifies a broader trend: as base models become commodities, value moves to orchestration, integration and the user interface. Key indicators to follow include reductions in hardware requirements, growth in on-device model quality, and enterprise adoption in privacy-sensitive fields.

For individuals weighing whether to run models locally, the trade-offs remain clear: stronger control and potentially lower long-term costs versus higher upfront hardware needs. Osaurus is betting that, over time, those scales will tip in favor of private, device-based AI.