You're streaming Persona 5 (Japanese) from your PC to your Mac via Moonlight/Sunshine/Apollo. The game has Japanese subtitles. You want to read them in English — in real-time, overlaid on the stream.
Your proposed approach: use Vision AI to read the Japanese text from the bottom of the screen and render an English translation over the top. Let's assess five different methods to achieve this, compare costs and feasibility, and give an honest verdict.
Send a screenshot of the subtitle region to Claude's vision API. Claude reads the Japanese text and translates it in a single API call — no separate OCR step needed. The model understands context (it knows it's a game UI), which improves translation quality.
Cost breakdown: Subtitle crop is ~800x200px = ~213 image tokens. With prompt overhead, each call costs ~$0.00056. At 600 calls/hour (10/min) = $0.34/hour. A 40-hour playthrough costs about $13.60.
Sonnet 4.6 alternative: $1.02/hour (~$41 for a full playthrough). Better quality but overkill for subtitle OCR.
Instead of reading the screen, listen to the audio. Use OpenAI's Whisper API ($0.006/min) to transcribe the Japanese voice acting, then translate with Claude Haiku.
The problem: Game audio mixes dialogue with music, sound effects, and ambient noise. Whisper's accuracy drops significantly with background audio. You'd need to isolate the dialogue channel — but Persona 5 doesn't output dialogue on a separate audio channel.
Note: Claude's API does not support audio input as of March 2026. This requires Whisper or GPT-4o Mini Transcribe ($0.003/min) for speech-to-text.
Google Cloud Vision ($1.50/1k images) for Japanese OCR, then Google Translate ($20/M characters) for translation. Two separate API calls per subtitle.
Cost: 600 images/hour x $0.0015 = $0.90/hour for OCR alone. A full playthrough costs ~$36. That's 2.6x more expensive than Claude Haiku, with worse latency and no context understanding.
Free tier: First 1,000 images/month free. So your first ~1.5 hours each month are free.
Apple ships two powerful on-device frameworks: VNRecognizeTextRequest (Vision, OCR with Japanese support since macOS 14.4) and Apple Translation (on-device JP→EN since macOS 15). Both run on Apple Silicon's Neural Engine — zero API cost, zero internet needed.
You'd build a small Swift app using ScreenCaptureKit to capture the Moonlight window's subtitle region, run OCR, translate, and render an overlay via a transparent NSWindow.
The catch: On-device OCR may struggle with Persona 5's stylized dialogue fonts. Translation quality is decent but less nuanced than Claude for game-specific slang and cultural references.
Persona 5 Royal is available in English on every platform — Steam, PS4/PS5, Switch, Xbox, Game Pass. The English version includes full English text with the option to keep Japanese voice acting (dual audio). Frequently on sale for ~$20.
If you're streaming from a PC via Sunshine, just install the English version on Steam. If from PS4/PS5, change the game language in system settings.
Wait — you can have Japanese voices with English subtitles?
Yes. Persona 5 Royal has a free "Japanese Voice" DLC that gives you the original Japanese voice acting with English text. This is literally the exact experience you're trying to build with AI — and it's built into the game.
| Method | Cost/Hour | Latency | Accuracy | Setup |
|---|---|---|---|---|
| English Version (Non-AI) | $0 | 0ms | Perfect | None |
| Apple Vision + Translation | $0.00 | 100–300ms | Good | Swift app |
| Claude Haiku 4.5 Vision | $0.34 | 0.8–1.5s | Very Good | Script + API key |
| Whisper + Translation | $0.36–0.54 | 1–2s | Fair | Audio capture + APIs |
| Google Vision + Translate | $0.90 | 1–3s | Good | Script + API keys |
Moonlight renders decoded video to a macOS window. Use ScreenCaptureKit (macOS 13+) to capture the specific window and crop the subtitle region. The deprecated CGWindowListCreateImage was removed in macOS 15, so ScreenCaptureKit is the only path.
Don't send every frame. Use perceptual hashing on the subtitle region to detect when text actually changes. This reduces API calls from 1,800/min (30fps) to ~10/min — a 99.4% cost reduction.
Create a transparent NSWindow with level = .floating and backgroundColor = .clear positioned over the Moonlight window. Render translated text with CATextLayer. The window is click-through so it doesn't interfere with gameplay.
This is a cool idea, but you probably don't need to build it.
Persona 5 Royal has a free Japanese Voice DLC that gives you exactly what you want — Japanese voice acting with English subtitles — with zero setup and perfect accuracy.
But if you want to build it anyway (for other Japanese games, or for fun), here's the path:
One more thing. If the real goal is to experience the original Japanese script (not Atlus's English localization, which takes creative liberties), that's a legitimate reason the English version doesn't fully solve. An AI translation would be more literal. In that case, Claude Haiku is your best bet — it understands nuance and can be prompted to preserve the original meaning.