Context window
The maximum combined number of input and output tokens an LLM can process in a single request. As of 2026, state-of-the-art models support context windows of 200K–2M tokens.
Claude Sonnet handles 200K tokens, Claude Opus 1M, and Gemini 2M. SumTube processes transcripts of 30K–60K tokens per request, well within these limits, but truncates extremely long talks to keep inference cost bounded.
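A minimal sketch of the truncation step described above. The function name and the ~4-characters-per-token heuristic are assumptions for illustration, not SumTube's actual implementation; a production pipeline would count tokens with the target model's own tokenizer.

```python
# Rough heuristic: English text averages ~4 characters per token.
# A real pipeline would use the model's tokenizer for exact counts.
CHARS_PER_TOKEN = 4


def truncate_to_budget(transcript: str, max_tokens: int = 60_000) -> str:
    """Truncate a transcript so its estimated token count stays under budget."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(transcript) <= max_chars:
        return transcript
    # Cut at the last whitespace before the limit to avoid splitting a word.
    cut = transcript.rfind(" ", 0, max_chars)
    return transcript[: cut if cut != -1 else max_chars]
```

Truncating from the end is the simplest policy; alternatives such as dropping the middle of the transcript preserve both the opening and the conclusion of a talk.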