Prompt caching
Cache the shared prefix of an LLM prompt so subsequent calls run cheaper and faster.
Prompt caching rolled out across Anthropic, Google, and OpenAI from 2024 onwards. On a cache hit, the fixed prefix (system prompt, tool definitions, long reference blocks) is billed at a steep discount: roughly a tenth of the normal input-token price on Anthropic, about half on OpenAI. SumTube's 1,500-token system prompt is deliberately cacheable so repeated summaries drive unit cost toward the noise floor.
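
A minimal sketch of opting in with the Anthropic Python SDK, assuming a placeholder SYSTEM_PROMPT standing in for SumTube's actual prompt (the model name is illustrative too). The cache_control marker on the system block asks the API to cache everything up to that point; later calls with an identical prefix read it back at the cached-token rate.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stand-in for SumTube's 1,500-token system prompt; the real
# prompt just needs to be byte-identical across calls to hit the cache.
SYSTEM_PROMPT = "You are a video summarizer. ..."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Mark the end of the cacheable prefix. The first call writes
            # the cache; subsequent identical prefixes are cache reads.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize this transcript: ..."}],
)

# Usage metadata shows tokens written to vs. read from the cache, which is
# how you verify the prefix is actually hitting.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```

One design note: providers impose a minimum cacheable prefix length (on the order of a thousand tokens for Anthropic's models), so a 1,500-token system prompt clears the bar; a short prompt would simply never be cached.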