I've been building an app that uses Claude for two distinct jobs: tagging wardrobe items from photos, and generating daily outfit suggestions. The naive approach — call a capable model for everything, every time — would cost somewhere between "unsustainable" and "immediately bankrupt." The approach I ended up with costs almost nothing at scale. Here's the pattern.
The observation
There are two kinds of AI work in most apps:
- Expensive perception work — understanding an image, parsing a messy document, extracting structured data from unstructured input. This needs a capable model. You can't cheap out here.
- Cheap reasoning work — given clean structured data, make a decision or recommendation. A smaller model handles this fine.
The pattern: do the expensive work once, store the result, run the cheap work against the stored result forever.
Applied to the wardrobe app
When a user uploads a photo of a shirt, I send it to
claude-sonnet-4-6 with a forced tool call that returns
structured tags: color, fabric, category, formality, season, a short
description. That call costs real money. It happens once per item, ever.
The tags go into the database.
When generating outfit suggestions, I never look at images again. I pull
the text tags for all items, assemble a prompt, and call
claude-haiku-4-5 — dramatically cheaper — to pick
combinations and write the rationale. The model is reasoning over text,
not perceiving images. It's very good at that for much less money.
Prompt caching on top of that
The user's wardrobe doesn't change between breakfast and lunch. So I generate suggestions once per day and cache them in the database. Most app opens hit zero model calls — just a database read.
For users whose wardrobe hasn't changed, the suggestion prompt is nearly identical day over day. Claude's prompt caching means repeated similar prompts are heavily discounted.
The numbers roughly
Tagging 50 items: ~50 vision calls at Sonnet pricing. One-time cost per user, paid at onboarding pace.
Daily suggestions: 1 Haiku call per day, cached result serves all opens. With prompt caching on wardrobe-heavy users, marginal cost approaches zero.
Roughly 90% cost reduction versus calling Sonnet for everything, every time.
The general pattern
- Identify which AI work is perception (unstructured → structured) and which is reasoning (structured → decision).
- Use your best model for perception. Accept the cost — it's a one-time extraction.
- Store the structured result durably. This is your asset.
- Use a cheaper model for reasoning against that result.
- Cache aggressively at the application layer. Most AI calls in a mature app should be skippable.
Where it breaks down
If the structured extraction can't capture what downstream reasoning needs, you're stuck calling the big model at reasoning time too. This happens when the interesting signal is visual and can't be expressed in text — fine-grained texture, exact color matching, fit details.
Test the quality of your extraction early. If it's lossy, the downstream reasoning will be wrong in ways that are hard to debug.