Feasibility Analysis

Real-Time JP→EN Subtitles
for Persona 5

Japanese version streamed via Moonlight / Sunshine / Apollo on Mac
Prepared for Thaddeus

The Idea

You're streaming Persona 5 (Japanese) from your PC to your Mac via Moonlight/Sunshine/Apollo. The game has Japanese subtitles. You want to read them in English, in real time, overlaid on the stream.

Your proposed approach: use Vision AI to read the Japanese text from the bottom of the screen and render an English translation over the top. Let's assess five different methods to achieve this, compare costs and feasibility, and give an honest verdict.

The pipeline for any AI approach:

Moonlight Stream → Frame Capture (~5ms) → Change Detection (~1ms) → OCR + Translate (~???ms) → Overlay (~5ms)

Subtitle region: ~800x200px crop from bottom of screen
Change rate: ~10 new subtitles per minute during dialogue
Budget: subtitle stays on screen 2-5 seconds, so API must respond within that

Five Methods Compared

2. Speech-to-Text (Whisper) + Translation
Complex
Cost / Hour: ~$0.36–0.54
Latency: 1–2s
Accuracy: Fair
Complexity: High

Instead of reading the screen, listen to the audio. Use OpenAI's Whisper API ($0.006/min) to transcribe the Japanese voice acting, then translate with Claude Haiku.

The problem: Game audio mixes dialogue with music, sound effects, and ambient noise. Whisper's accuracy drops significantly with background audio. You'd need to isolate the dialogue channel — but Persona 5 doesn't output dialogue on a separate audio channel.

Note: Claude's API does not support audio input as of March 2026. This requires Whisper or GPT-4o Mini Transcribe ($0.003/min) for speech-to-text.
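The transcription share of the hourly cost follows directly from the per-minute prices quoted above. Here is a minimal sketch of that arithmetic, assuming audio is transcribed continuously for the whole hour (the function name is illustrative, not part of any API):

```python
def transcription_cost_per_hour(price_per_minute: float, minutes: float = 60.0) -> float:
    """Cost in dollars of transcribing `minutes` of audio at a flat per-minute price."""
    return round(price_per_minute * minutes, 2)

whisper = transcription_cost_per_hour(0.006)  # Whisper API price quoted above
mini = transcription_cost_per_hour(0.003)     # GPT-4o Mini Transcribe price quoted above
print(f"Whisper: ${whisper:.2f}/hour, GPT-4o Mini Transcribe: ${mini:.2f}/hour")
```

Whisper alone accounts for the $0.36 floor of the range; the remainder is presumably the translation call.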

Pros
  • Translates spoken dialogue too
  • Works even without subtitles on screen
  • Captures tone and emotion
Cons
  • Audio mixing ruins accuracy
  • Two API calls per segment
  • Needs audio capture setup (BlackHole)
  • Higher latency than vision

3. Google Cloud Vision OCR + Translate
Expensive
Cost / Hour: ~$0.90
Latency: 1–3s
Accuracy: Good
Complexity: Medium

Google Cloud Vision ($1.50/1k images) for Japanese OCR, then Google Translate ($20/M characters) for translation. Two separate API calls per subtitle.

Cost: 600 images/hour x $0.0015 = $0.90/hour for OCR alone. A full playthrough costs ~$36 (about 40 hours at that rate). That's roughly 2.6x the cost of Claude Haiku, with worse latency and no context understanding.

Free tier: First 1,000 images/month free. At 600 images/hour, your first ~1.7 hours each month are free.
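To sanity-check these numbers, here is a small sketch of the OCR cost arithmetic. It uses only the figures quoted above (600 images/hour, $1.50 per 1,000 images, 1,000 free images per month); the constant and function names are made up for illustration, and the cost function deliberately ignores the free tier and the separate Translate charge.

```python
IMAGES_PER_HOUR = 600          # ~10 subtitle changes per minute
PRICE_PER_IMAGE = 1.50 / 1000  # $1.50 per 1,000 images
FREE_IMAGES_PER_MONTH = 1000

def ocr_cost(hours: float) -> float:
    """OCR-only cost in dollars for `hours` of dialogue (free tier not applied)."""
    return round(hours * IMAGES_PER_HOUR * PRICE_PER_IMAGE, 2)

def free_hours_per_month() -> float:
    """Hours of play covered by the monthly free image allowance."""
    return FREE_IMAGES_PER_MONTH / IMAGES_PER_HOUR

print(ocr_cost(1), ocr_cost(40), round(free_hours_per_month(), 2))
```

At this rate the ~$36 playthrough figure corresponds to about 40 hours of subtitle-heavy play.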

Pros
  • Mature, well-documented APIs
  • Google Translate is solid for JP→EN
  • Small free tier each month
Cons
  • 2.6x more expensive than Claude
  • Two API calls = more latency
  • OCR is context-blind
  • ~$36 for a full playthrough

4. Apple Vision + Translation (On-Device)
Free & Fastest
Cost / Hour: $0.00
Latency: 100–300ms
Accuracy: Good
Complexity: Medium

Apple ships two powerful on-device frameworks: VNRecognizeTextRequest (Vision, with Japanese OCR support since macOS 13) and Apple Translation (on-device JP→EN since macOS 15). Both run on Apple Silicon's Neural Engine, so there is zero API cost and, once the Japanese and English language models have been downloaded, no internet connection is needed.

You'd build a small Swift app using ScreenCaptureKit to capture the Moonlight window's subtitle region, run OCR, translate, and render an overlay via a transparent NSWindow.

The catch: On-device OCR may struggle with Persona 5's stylized dialogue fonts. Translation quality is decent but less nuanced than Claude for game-specific slang and cultural references.

Pros
  • Completely free, forever
  • 100–300ms total latency
  • No internet required
  • Runs on Apple Neural Engine
Cons
  • Requires building a Swift app
  • May struggle with game fonts
  • Translation less nuanced
  • macOS 15+ required

5. Just Play the English Version
The Real Answer
Cost: $0–20
Latency: 0ms
Accuracy: Perfect
Complexity: None

Persona 5 Royal is available in English on every platform — Steam, PS4/PS5, Switch, Xbox, Game Pass. The English version includes full English text with the option to keep Japanese voice acting (dual audio). Frequently on sale for ~$20.

If you're streaming from a PC via Sunshine, just install the English version on Steam. If from PS4/PS5, change the game language in system settings.

Wait — you can have Japanese voices with English subtitles?

Yes. Persona 5 Royal ships with the original Japanese voice track as an in-game audio option, so you can pair Japanese voices with English text (the original Persona 5 offered the same thing as a free Japanese audio DLC). This is exactly the experience you're trying to build with AI, and it's built into the game.

Head-to-Head Comparison

Method                      Cost/Hour   Latency    Accuracy   Setup
English Version (Non-AI)    $0          0ms        Perfect    None
Apple Vision + Translation  $0.00       100–300ms  Good       Swift app
Claude Haiku 4.5 Vision     $0.34       0.8–1.5s   Very Good  Script + API key
Whisper + Translation       $0.36–0.54  1–2s       Fair       Audio capture + APIs
Google Vision + Translate   $0.90       1–3s       Good       Script + API keys

Technical Challenges

Frame Capture from Moonlight

Moonlight renders decoded video to a macOS window. Use ScreenCaptureKit (macOS 12.3+) to capture the specific window and crop the subtitle region. The older CGWindowListCreateImage was deprecated in macOS 14 and is unavailable when building against the macOS 15 SDK, so ScreenCaptureKit is the supported path.
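The crop itself is simple geometry. Here is a rough sketch of computing the subtitle rectangle for a captured frame, assuming the ~800x200px region is horizontally centered at the bottom and that coordinates use a top-left origin. Both are assumptions to check against the actual game layout and capture API, and the names are illustrative.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int

def subtitle_crop(frame_w: int, frame_h: int,
                  crop_w: int = 800, crop_h: int = 200) -> Rect:
    """Bottom-centered crop rectangle, clamped to the frame (top-left origin)."""
    w = min(crop_w, frame_w)
    h = min(crop_h, frame_h)
    return Rect(x=(frame_w - w) // 2, y=frame_h - h, width=w, height=h)

print(subtitle_crop(1920, 1080))  # Rect(x=560, y=880, width=800, height=200)
```

If the capture API uses a bottom-left origin instead, the y value becomes 0 rather than frame_h - h.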

Change Detection

Don't send every frame. Use perceptual hashing on the subtitle region to detect when text actually changes. This reduces API calls from 1,800/min (30fps) to ~10/min — a 99.4% cost reduction.
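As an illustration of the idea, here is a dependency-free sketch using a difference hash (dHash), a simpler relative of pHash: shrink the subtitle crop to a 9x8 grid, record whether each cell is brighter than its right-hand neighbour, and treat a large Hamming distance between consecutive hashes as "the text changed". The threshold of 6 bits and all function names are assumptions for the sketch, not tuned values.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale pixel values

def downsample(img: Gray, out_w: int, out_h: int) -> List[List[float]]:
    """Shrink an image by averaging the pixels that fall into each output cell."""
    in_h, in_w = len(img), len(img[0])
    small = []
    for r in range(out_h):
        y0 = r * in_h // out_h
        y1 = max((r + 1) * in_h // out_h, y0 + 1)
        row = []
        for c in range(out_w):
            x0 = c * in_w // out_w
            x1 = max((c + 1) * in_w // out_w, x0 + 1)
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        small.append(row)
    return small

def dhash(img: Gray, size: int = 8) -> int:
    """64-bit difference hash: one bit per horizontally adjacent cell pair."""
    bits = 0
    for row in downsample(img, size + 1, size):
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def subtitle_changed(prev_hash: int, new_hash: int, threshold: int = 6) -> bool:
    """True when the hashes differ by more than `threshold` bits (assumed value)."""
    return hamming(prev_hash, new_hash) > threshold

# Synthetic frames: an empty subtitle box versus one with bright vertical bands.
blank = [[0] * 72 for _ in range(16)]
striped = [[255 if (x // 8) % 2 == 0 else 0 for x in range(72)] for _ in range(16)]
print(subtitle_changed(dhash(blank), dhash(striped)))
```

Only frames where subtitle_changed returns True would go on to OCR and translation; everything else is dropped locally at essentially no cost.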

Overlay Rendering

Create a transparent NSWindow with level = .floating, isOpaque = false, and backgroundColor = .clear, positioned over the Moonlight window. Render translated text with CATextLayer. Set ignoresMouseEvents = true so the window is click-through and doesn't interfere with gameplay.

Full pipeline with Apple on-device stack:

ScreenCaptureKit (~5ms) → Crop subtitle (~1ms) → pHash detect (~1ms) → VNRecognizeText (~50-150ms) → Translation (~50-100ms) → Overlay (~5ms)

Total: ~110-260ms, well within the 2-5 second subtitle window.
Cost: $0.00
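The total is just the sum of the per-stage estimates. Here is a quick sketch that adds up the best and worst cases and checks them against the shortest subtitle duration (2 seconds). The stage figures are the estimates above, not measurements, and the names are illustrative.

```python
from typing import Dict, Tuple

# (best case, worst case) in milliseconds, taken from the estimates above
STAGES_MS: Dict[str, Tuple[int, int]] = {
    "ScreenCaptureKit": (5, 5),
    "Crop subtitle": (1, 1),
    "pHash detect": (1, 1),
    "VNRecognizeText": (50, 150),
    "Translation": (50, 100),
    "Overlay": (5, 5),
}

def total_latency_ms(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the best-case and worst-case latencies across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst

best, worst = total_latency_ms(STAGES_MS)
fits_budget = worst <= 2000  # shortest on-screen time for a subtitle
print(best, worst, fits_budget)  # 112 262 True
```

Even the worst case uses only about 13% of the 2-second budget, which leaves room for a slower fallback call on hard frames.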

The Honest Verdict

This is a cool idea, but you probably don't need to build it.

Persona 5 Royal's built-in Japanese voice option gives you exactly what you want: Japanese voice acting with English subtitles, with zero setup and perfect accuracy.

But if you want to build it anyway (for other Japanese games, or for fun), here's the path:

  1. Start with Apple Vision + Translation — it's free, fast, and runs entirely on your Mac. Build a Swift app using ScreenCaptureKit + VNRecognizeTextRequest + Translation framework.
  2. Add Claude Haiku as a fallback — when Apple's OCR fails on stylized fonts, fall back to Claude Vision. At $0.34/hour it's cheap insurance.
  3. Skip Google Vision and Whisper — Google is overpriced and Whisper can't cleanly separate dialogue from game audio.
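The fallback in step 2 can be sketched as a small routing function: trust the on-device result when OCR returns confident, non-empty text, otherwise send the image to the cloud model. Everything here is hypothetical scaffolding. The three callables stand in for whatever wrappers you write around Apple's frameworks and the Claude API, and the 0.5 confidence cutoff is an assumed starting point.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class OcrResult:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the OCR wrapper

def translate_subtitle(
    image: bytes,
    local_ocr: Callable[[bytes], Optional[OcrResult]],  # hypothetical on-device OCR wrapper
    local_translate: Callable[[str], str],              # hypothetical on-device translation wrapper
    cloud_translate: Callable[[bytes], str],            # hypothetical vision-API fallback
    min_confidence: float = 0.5,                        # assumed cutoff, tune on real frames
) -> Tuple[str, str]:
    """Return (english_text, source), where source is 'on-device' or 'fallback'."""
    result = local_ocr(image)
    if result is not None and result.text.strip() and result.confidence >= min_confidence:
        return local_translate(result.text), "on-device"
    # OCR failed or was unsure (e.g. stylized fonts): pay for the cloud call instead.
    return cloud_translate(image), "fallback"

# Stubbed usage: a confident OCR result stays on-device, a weak one falls back.
confident = lambda img: OcrResult("こんにちは", 0.9)
unsure = lambda img: OcrResult("???", 0.2)
print(translate_subtitle(b"", confident, lambda t: "Hello", lambda i: "Hello (cloud)"))
print(translate_subtitle(b"", unsure, lambda t: "Hello", lambda i: "Hello (cloud)"))
```

Returning the source alongside the text makes it easy to log how often the fallback fires, which tells you whether the $0.34/hour estimate is realistic for your game.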

One more thing. If the real goal is to experience the original Japanese script (not Atlus's English localization, which takes creative liberties), that's a legitimate reason the English version doesn't fully solve. An AI translation would be more literal. In that case, Claude Haiku is your best bet — it understands nuance and can be prompted to preserve the original meaning.