● Live · 312 units online · TKO · SFO · FRA

Run Llama 405B locally.
$1,499 a month.
Not $18,000.

Rent a bare-metal Mac Studio M3 Ultra with 512 GB unified memory — the only consumer-class machine that holds Llama 3.1 405B, DeepSeek-V3, and Kimi-K2 1T MoE entirely in memory. SSH-ready in 90 seconds. Yours alone.

312 machines online · 4 regions · 24/7 · SOC 2 · ISO 27001
satoshi@kizai-tko-04 — ssh — 120×34
connected · 11 ms
# Connect over Tailscale, no public IP
satoshi@local ~ ssh kizai-tko-04
Welcome to macOS 15.4 · M3 Ultra · 512 GB
satoshi@kizai-tko-04 ~ mlx_lm.generate \
  --model meta-llama/Llama-3.1-405B-Instruct \
  --prompt "Explain MoE routing in one paragraph" \
  --max-tokens 512
Loading weights … 230.4 GB into UMA … done (8.2 s)
Mixture-of-Experts routing assigns each token to a small subset of specialized sub-networks via a learned gating function, so total parameter count grows without proportionally raising compute
In production at AI teams from research labs to YC startups
YAMABIKO LABS
stanza/ai
DRIFT.RESEARCH
Forerunner
naoroku
ATLAS·M
How we compare

Same frontier models.
About 1/12 the monthly cost.

For memory-bound inference (MoE, long context, 70B+ dense), Apple Silicon wins on $/token and $/GB — and you get a dedicated machine, not a slot in someone's pool.

| | AI KIZAI · M3 Ultra 512 GB | 1× H100 80 GB (cloud) | 8× H100 cluster |
| Usable memory for weights | 512 GB UMA | 80 GB HBM3 | 640 GB (sharded) |
| Fits Llama 3.1 405B (4-bit) | ✓ comfortably | ✗ | ✓ (with vLLM TP-8) |
| Fits Kimi-K2 1T MoE | ✓ (Q3) | ✗ | tight |
| List price | $1,499 / month | ~$2.49 / hr ≈ $1,800/mo | ~$20–32 / hr ≈ $18k/mo |
| Dedicated hardware | always, bare metal | shared instance | multi-tenant pool |
| Provision time | 90 seconds | ~3 min (when available) | quota wait, days |
| Local dev parity | identical to your MacBook | Linux + CUDA | Linux + CUDA |
| Idle cost | $0 (flat rate) | full hourly rate | full hourly rate |
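The monthly figures in the comparison can be checked with a few lines of arithmetic. This is a rough sketch that assumes 730 billable hours per month (365 × 24 / 12) and uses only the hourly rates quoted above; the function and variable names are illustrative, not part of any AI KIZAI tooling.

```python
# Assumption: an always-on instance is billed for 730 hours per month.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly cloud rate into an approximate monthly cost."""
    return hourly_rate * hours


flat_rate = 1499.0                   # M3 Ultra 512 GB, flat monthly price
single_h100 = monthly_cost(2.49)     # 1x H100 80 GB at the quoted rate
cluster_low = monthly_cost(20.0)     # 8x H100 cluster, low end of the range
cluster_high = monthly_cost(32.0)    # 8x H100 cluster, high end of the range

print(f"1x H100: ${single_h100:,.0f} per month")
print(f"8x H100: ${cluster_low:,.0f} to ${cluster_high:,.0f} per month")
print(f"$18k vs flat rate: {18000 / flat_rate:.1f}x")
```

The $18k headline sits roughly in the middle of the 8× H100 range, which is where the "about 1/12" ratio comes from.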
What you can run

Frontier open-weights
that actually fit.

Measured throughput on a 512 GB M3 Ultra running MLX with KV cache and a 4k context window. All weights stay resident in unified memory — no sharding, no offload.

M3 Ultra · 512 GB UMA · 800 GB/s · $1,499 / month · 8 TB SSD · macOS 15
Llama 3.1 405B · MLX 4-bit · 230 GB · 12 t/s · fits
DeepSeek-V3 671B · MoE 4-bit · 380 GB · 8 t/s · fits
Kimi-K2 1T MoE · Q3 · 480 GB · 6 t/s · tight
Llama 4 Behemoth · MLX 4-bit · 460 GB · 5 t/s · tight
Qwen 2.5 72B × 4 · parallel agents · parallel · fits
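The weight sizes listed above can be approximated from parameter count and quantization width. The sketch below assumes group-wise 4-bit quantization with one 16-bit scale and one 16-bit bias per group of 64 weights, which works out to about 4.5 bits per weight; that layout is an assumption for illustration, and real checkpoints may keep some tensors at higher precision.

```python
UMA_GB = 512  # unified memory on the M3 Ultra configuration described above


def quantized_weight_gb(params_billion: float, bits: int = 4,
                        group_size: int = 64, scale_bits: int = 16,
                        bias_bits: int = 16) -> float:
    """Approximate weight footprint in GB for group-wise quantization.

    Each group of `group_size` weights carries one scale and one bias,
    so the effective width is bits + (scale_bits + bias_bits) / group_size.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return params_billion * bits_per_weight / 8


for name, params_billion in [("Llama 3.1 405B", 405), ("DeepSeek-V3 671B", 671)]:
    size_gb = quantized_weight_gb(params_billion)
    verdict = "fits" if size_gb < UMA_GB else "does not fit"
    print(f"{name}: ~{size_gb:.0f} GB at 4-bit, {verdict} in {UMA_GB} GB")
```

Under these assumptions the estimates land close to the 230 GB and 380 GB figures in the list, before any KV cache or runtime overhead is counted.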
Pricing

One machine. One price.

Every plan includes the same hardware. Pick the billing cycle that works for your team.

Mac Studio M3 Ultra · 512 GB
M3 Ultra · 32-core CPU · 80-core GPU · 512 GB UMA · 8 TB SSD · macOS 15
Billing cycle
Included in every plan
Dedicated Mac Studio M3 Ultra
512 GB unified memory
8 TB NVMe SSD
Tailscale private mesh
Full root / sudo access
macOS 15 Sequoia
24/7 monitoring & alerts
Snapshot backups on exit
Total
$1,499/month
Cancel before next cycle — no penalty
SOC 2 Type II · Stripe-secured billing · 4 regions · 99.9% SLA
The machine

Bare-metal Apple Silicon.
No virtualization tax.

512 GB UMA
Holds 405B models in one piece.
No sharding, no offload — every weight stays resident in unified memory.
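800 GB/s
Up to 2× faster tokens than a single H100 on models too large for 80 GB.
HBM3 has higher peak bandwidth, but weights that spill past 80 GB must be offloaded; here every weight is read from unified memory at 800 GB/s.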
80 GPU cores
Fine-tune locally with MLX + LoRA.
An 80-core GPU driven through Metal by Apple's MLX framework.
90 seconds
SSH-ready from card swipe.
Fully automated provisioning. No tickets, no waitlist, no quota.
How it works

SSH into your machine
in under two minutes.

Four steps from card swipe to running a 405B model. No tickets, no infra team, no cold start.

$ kizai rent --m3-ultra-512
01

Pick a billing cycle

Weekly, monthly, or 3 months. Same hardware. Same SLA. Longer commitments unlock a small discount.

✓ paid · stripe
02

Pay with Stripe

USD, EUR, JPY, GBP. Cards, Apple Pay, Google Pay, ACH, bank transfer. Receipts via email.

provisioning… 47 s
03

Auto-provision

We boot a fresh Mac Studio, install Tailscale, mount your home disk, and send you keys.

$ ssh kizai-tko-04
04

Connect and build

SSH, VNC, or macOS Screen Sharing over a private mesh. Full root. Run whatever you want.

Network

Four regions live.
Two more this quarter.

Pick a region at checkout and switch regions free of charge, even mid-cycle. All units sit on your private Tailnet: no public IP, no port forwarding, no inbound exposure.

TKO · Tokyo · online · 142 units · AS · 11 ms RTT
OSA · Osaka · online · 48 units · AS · 18 ms RTT
SFO · San Francisco · online · 94 units · NA · 88 ms RTT
FRA · Frankfurt · online · 28 units · EU · 232 ms RTT
SIN · Singapore · Q3 2026
IAD · Virginia · Q3 2026
Use cases

One machine, every workload
that needs a lot of memory.

Run frontier open-weights locally

Llama 3.1 405B, DeepSeek-V3, Kimi-K2 1T MoE — all fit in UMA without sharding.

INFERENCE

Private RAG over sensitive data

Index TBs of internal docs on a machine in your Tailnet. Documents and weights never leave your dedicated machine.

ENTERPRISE

MLX fine-tuning + LoRA

Fine-tune 7B–70B models with native Metal. Apple's MLX framework + your own LoRA recipes.

RESEARCH
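To give a sense of why LoRA fine-tuning is light on memory, the sketch below counts the trainable parameters a rank-r adapter adds to one linear layer. The 8192 × 8192 projection size and rank 16 are hypothetical values chosen for illustration, not settings taken from any particular model or recipe.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer: a square 8192 x 8192 projection adapted at rank 16.
d_in = d_out = 8192
rank = 16

full = d_in * d_out
adapter = lora_trainable_params(d_in, d_out, rank)

print(f"full matrix:  {full:,} params")
print(f"LoRA adapter: {adapter:,} params ({adapter / full:.2%} of the full matrix)")
```

Only the adapter weights need gradients and optimizer state, while the frozen base weights can stay quantized in unified memory.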

Agent harnesses & long contexts

Keep 200k-token KV caches in memory and serve agents 24/7. No cold starts, no eviction.

AGENTS
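A long-context KV cache can be sized with a short calculation. The sketch below assumes an fp16 cache and the published Llama 3.1 405B shape (126 layers, 8 key/value heads under grouped-query attention, head dimension 128); treat those values as assumptions and substitute the config of whichever model you serve.

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB: one key and one value vector
    per token, per layer, per KV head."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * bytes_per_token / 1e9


# Assumed Llama 3.1 405B attention shape, fp16 cache, one 200k-token context.
cache_gb = kv_cache_gb(tokens=200_000, layers=126, kv_heads=8, head_dim=128)
weights_gb = 230   # 4-bit weight size quoted on this page
uma_gb = 512

print(f"KV cache: ~{cache_gb:.0f} GB")
print(f"weights + cache: ~{weights_gb + cache_gb:.0f} GB of {uma_gb} GB")
```

Under these assumptions one 200k-token context sits alongside the 4-bit weights with room to spare; several concurrent long contexts would need a quantized cache or a smaller model.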

ComfyUI & image/video pipelines

SDXL, Flux, AnimateDiff, HunyuanVideo on Metal. Render farms over Tailscale.

GENERATIVE

Xcode CI for App Store builds

Dedicated build runner with Apple notarization access. Faster than any cloud Mac.

iOS / macOS
Questions

Frequently asked.

A dedicated, bare-metal Mac Studio M3 Ultra with 512 GB unified memory, 8 TB SSD, and macOS 15. It is physically yours for the duration of your subscription — no other tenants, no virtualization, no noisy neighbors.
Ready when you are

Your private supercomputer.
Ninety seconds away.

Spin up a Mac Studio M3 Ultra 512 GB in Tokyo, Osaka, San Francisco, or Frankfurt. Cancel any time. No GPU quotas, no waitlists.