Unpacking AI & Product

Foundation Models and How to Choose Them Like a Confident AI PM

Here is your complete guide to foundation models: from basics to token economics to the Priority-Scale-Moat Framework for strategic decisions, with real examples.

Shambhavi Pandey
Jul 15, 2025 ∙ Paid

Welcome to Post #2 of From AI Beginner to AI Product Leader—a 12-part series designed to help curious PMs become confident AI Product Leaders.

When I worked on Alexa in the pre-foundation model era, adding a single new capability took months of specialized development. Want weather AND music recommendations? Two separate 6-month projects, each requiring thousands of training examples and dedicated engineering teams.

Today, Alexa+ handles complex multi-step requests like "Book me dinner at that Italian place we talked about last week, then order an Uber there for 7 PM"—all from one conversation using foundation models. GitHub Copilot reached $400M ARR. Notion successfully launched AI capabilities in weeks, not months. All by using foundation models strategically.

None of these companies succeeded by choosing the "best" foundation model. They applied three strategic dimensions that every AI PM must know, and that we cover in this post.

We'll start with the fundamentals—what foundation models actually are and why they matter for product development—then move to the Priority/Scale/Moat framework that drives strategic model selection, complete with real case studies and strategic guidance for your specific situation.

By the end, you'll understand both the technology and the strategy—from foundation model basics to the framework that drives successful AI product decisions.

Time investment: 20-25 minutes for the core framework, 40-45 minutes for the complete post with premium content
Value: Everything you need to make strategic foundation model decisions in one comprehensive guide. Most guides cover theory; this one covers real decisions using strategic frameworks and case studies


Today's Post

What's covered:

  1. What foundation models are, and their impact on building products

  2. Foundation model fundamentals: Access & Model size explained

  3. Token economics and cost considerations with examples

  4. The Foundation Model Selection Framework: Priority, Scale, and Moat dimensions

  5. Real-world reliability considerations

  6. Performance benchmarks

  7. 🔒Foundation Model Strategic Teardowns: TUI, Slack & Discord case studies

  8. 🔒Foundation Model Strategies for 10 Product Use Cases

  9. 🔒6 Ways to Cut Your AI Costs That AI PMs Must Know

  10. 🔒Access to model cost calculator


1. 🧱 What Are Foundation Models?

Foundation models are large-scale AI systems trained on massive datasets—often scraped from the entire internet. Unlike traditional AI systems that are built for a single, narrow task, foundation models can be adapted to many tasks without starting from scratch.

Instead of building custom AI for every use case, you start with a foundation model.

Popular examples of foundation models include OpenAI’s GPT-4, Anthropic’s Claude, Meta’s LLaMA, Google’s Gemini, and Mistral—each with different strengths in reasoning, speed, openness, or multilingual support.

1.1 🤔 Why are they called "Foundation"?

Just like a building’s foundation can support many types of structures, foundation models support many types of AI applications: chatbots, writing assistants, code completion tools, legal document analysis—you name it.

The term "Foundation Model" was introduced by Stanford’s Center for Research on Foundation Models in 2021 to highlight this versatility.

Old world: 1 model = 1 task
New world: 1 model = many tasks

This adaptability is what makes them so powerful. A single model can analyze legal briefs and marketing copy because it has learned the deeper patterns of language, context, and reasoning.

1.2 🧠 How Did We Get Here?

Through self-supervised learning—models train themselves by predicting what comes next in massive amounts of text, images, or code, without needing human-labeled examples.

This method unlocked scale. For example, GPT-3 was trained on roughly 500 billion tokens (about 10 human lifetimes of nonstop reading), enabling it to generalize across domains better than any narrow model.
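To make "predicting what comes next" concrete, here is a toy sketch of how training pairs fall out of raw text with no human labels. It is illustrative only: real models predict tokens rather than whole words, over billions of documents.

```python
# Self-supervised learning needs no labeled data: the text itself supplies
# the labels. Every position yields a (context -> next word) training pair.

text = "foundation models adapt to many tasks without starting from scratch"
words = text.split()

training_pairs = [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in training_pairs[:3]:
    print(f"context: {' '.join(context)!r}  ->  predict: {target!r}")
```

Scale this idea up to most of the internet and you get the training signal behind GPT-3.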

By the way, it’s not just versatility—it’s speed too. Before foundation models: Building Smart Reply for Gmail took Google 18 months—and it only suggested three-word replies. After foundation models: Notion prototyped their AI writing assistant—capable of drafting entire documents, summaries, and more—in just one week after gaining access to GPT-4, launching publicly within months instead of years.

1.3 🤖 LLMs vs. Foundation Models: What's the Difference?

Foundation models are the broad category. Large Language Models (LLMs) are a specific subset of foundation models that focus solely on language—tasks like summarizing, writing, translating, or reasoning with text.

This matters because product teams often say, “We need an LLM,” when what they actually need is something more specialized. For example, generating product imagery or understanding screenshots? That’s a job for a vision model—not an LLM. Understanding these distinctions helps you pick the right model for the job, set more accurate expectations, and build smarter, faster.

1.4 💡 How Did Foundation Models Impact Product Building?

Foundation models don’t eliminate the hard work—they shift where the work begins.

Instead of starting from zero—collecting data, training custom models, hiring specialized ML talent—you now begin with a powerful general-purpose model. That’s your head start.

But the real differentiation still comes from how you apply, adapt, and elevate it:

  • 🔍 Understanding user problems deeply

  • 🧪 Experimenting with advanced techniques like prompting and fine-tuning (customizing with your company's specific language and data). We'll dive deep into these in the next post.

  • ✨ Designing intuitive, trustworthy AI experiences

  • 🧠 Choosing the right model architecture or provider for your context

  • 📈 Measuring performance and iterating to drive real outcomes

So no, you’re not “just applying” a model—you’re crafting a product around it. Foundation models democratize access to intelligence, but building great AI-powered products still takes sharp product thinking, creative UX, and strategic execution.

The foundation model landscape evolves rapidly, creating new opportunities and risks monthly. Smart product teams design abstractions that allow model switching as the landscape evolves, monitor performance gaps as smaller models rapidly close quality gaps, and plan for commoditization—what's premium today may be standard tomorrow. This dynamic environment makes strategic model selection even more critical.
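Here is a minimal sketch of such an abstraction. The provider classes are illustrative stand-ins, not real SDKs: the point is that product code depends on a narrow interface, so switching models becomes a configuration change instead of a rewrite.

```python
from typing import Protocol

class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedAPIModel:
    """Closed model behind a provider API (the call is stubbed here)."""
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def complete(self, prompt: str) -> str:
        return f"[hosted-model reply to: {prompt}]"  # real HTTP call goes here

class SelfHostedModel:
    """Open model running on your own infrastructure (also stubbed)."""
    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def complete(self, prompt: str) -> str:
        return f"[self-hosted reply to: {prompt}]"  # real inference call here

def summarize(model: TextModel, text: str) -> str:
    # Product code never names a vendor, only the interface.
    return model.complete(f"Summarize: {text}")

model: TextModel = HostedAPIModel(api_key="...")  # swap in SelfHostedModel later
print(summarize(model, "Foundation models support many tasks from one base."))
```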


2. Foundation Model Basics Every AI PM Needs to Know

2.1 🔌 The 3 Ways to Access Foundation Models

When building AI-powered products, there are three main ways to access foundation models:

1. Open Models (Self-Hosted)

You download and run models on your own servers or private cloud.

  • More control over data and infrastructure

  • Requires setup, ML engineering, GPUs

  • Best for: High-volume apps, compliance-heavy use cases

2. Closed Models (via API)

You use powerful models like GPT-4 (OpenAI), Claude (Anthropic), or Gemini 1.5 (Google) through the cloud. These models are hosted by the provider; you send input and get output via API.

  • Fast, simple, zero infrastructure

  • Less control, unpredictable costs

  • Best for: Prototypes, fast launches, early-stage apps

3. Hybrid Approaches

You mix both—APIs for some use cases, self-hosting for others.

  • Flexibility across features

  • More architectural complexity

  • Best for: Mature products with diverse AI needs

But open models aren't all the same—there are three distinct levels of openness, and understanding them is crucial for your strategy:

Before diving into open models, pause and ask: "What level of openness makes sense for this use case?"
Most of the time, “open” refers to open weights — which offers flexibility to run and fine-tune the model, but doesn’t reveal how it was trained or what data was used.



2.2 Model Size: Bigger Isn't Always Better

Once you know how you'll access a model, the next consideration is size — which impacts speed, cost, and quality.

👀 What most teams ask: "Which model gives us the best results?"

💡 What smart PMs ask: "What's the smallest model that meets the needs of our use case?"

Understanding Model Parameters

When we talk about model size, we're actually referring to parameters. When someone says a model is "7B" or "70B," they're talking about the number of parameters — think of these like the model’s brain cells that help it understand patterns, remember relationships, and generate responses.

  • More parameters = Bigger brain = Better at complex tasks

  • But also = More computing power, slower responses, and higher costs (see the quick memory math below)
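To see why parameter count drives cost, here is the back-of-envelope memory math, assuming 2 bytes per parameter for fp16/bf16 weights and ignoring activations and KV cache:

```python
# GPU memory needed just to hold a model's weights, at 2 bytes per parameter.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

for size in (7, 70):
    print(f"{size}B model: ~{weight_memory_gb(size):.0f} GB for weights alone")
# 7B  -> ~14 GB: fits on a single midrange GPU
# 70B -> ~140 GB: needs multiple high-end GPUs, hence the cost and latency gap
```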

Choosing the Right Model Size

Consider model selection like choosing the right vehicle. You wouldn't use a rocket to pick up groceries—a reliable sedan gets the job done faster and cheaper. Similarly, a 7B model often outperforms a 70B model for routine tasks because speed matters more than raw capability.

For complex analysis or creative writing, the extra capability of larger models justifies the performance trade-off. The key is matching model capacity to actual requirements, not defaulting to the biggest available option.

The table below shows how different model sizes map to real-world infrastructure needs and use cases.

Notice how the 7B models handle most common AI tasks at a fraction of the cost and with much faster response times than their larger counterparts. Smaller models can offer faster inference, better performance, and lower costs, especially when you only need the model to do one specific thing really well.

💡 PM Insight: Don’t chase the biggest model. Instead, ask:
“What’s the smallest model that gets the job done well for our users?”


3. Token Economics: Understanding AI's Cost Structure

Now that you understand model access and size, let's examine the cost structure that will impact your pricing and business model. Token costs directly affect both your unit economics and user experience decisions.

3.1 What Are Tokens and Why They Matter

Think of tokens as the "currency" of AI interactions—like paying for electricity by the kilowatt-hour, but for text processing. A token is roughly 0.75 words in English, so "Hello world" is about 2 tokens. Punctuation, spaces, and special characters all count too.

Every user interaction consumes tokens, and tokens directly translate to costs. Understanding token consumption patterns helps you design sustainable features and pricing models.

A single complex document analysis can consume 10,000+ tokens ($0.30+), while a simple email summary uses ~1,100 tokens ($0.03). Design your features with token consumption in mind.
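As a quick sketch of that rule of thumb (1 token is roughly 0.75 English words; for exact counts, use the provider's tokenizer, such as OpenAI's tiktoken):

```python
# Rough token and cost estimate from word count, using GPT-4-era input rates.

INPUT_RATE = 0.03 / 1000  # $ per input token

def estimate_tokens(text: str) -> int:
    return round(len(text.split()) / 0.75)  # 1 token ~ 0.75 words

document = "word " * 7500  # stand-in for a ~7,500-word document
tokens = estimate_tokens(document)
print(f"~{tokens:,} tokens, ~${tokens * INPUT_RATE:.2f} just to read it")
# -> ~10,000 tokens, ~$0.30: the document-analysis figure quoted above
```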

3.2 The Three Cost Components

Understanding AI costs means understanding three types of tokens, each with different price points that dramatically affect your unit economics.

Input tokens represent everything users send to the AI—their questions, uploaded documents, and any context you provide. These cost about $0.03 per 1,000 tokens (GPT-4), making them the cheapest part of most interactions.

Output tokens are everything the AI generates—responses, summaries, creative content—and cost $0.06 per 1,000 tokens, double the input rate. This happens because generating text requires significantly more computational power than simply reading and understanding it.

Context tokens cost the same as input tokens but represent the AI's "memory": chat history, system prompts, and any retrieved information that helps the model understand the conversation. These costs can creep up in long conversations or complex workflows.

The Context Window Cost Multiplier: Context costs multiply quickly in extended conversations. A customer support chat that starts at 200 tokens can grow to 5,000+ tokens after 10 exchanges, with each response paying for the entire conversation history. This is why many AI products implement conversation summarization or session resets.

Check out the example below: each response pays for the full context, making long conversations increasingly expensive.
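A toy walk-through, using the GPT-4-era rates from above. The message sizes are my illustrative assumptions, chosen to match the 200-token start and 5,000+ token finish described earlier:

```python
# Why long chats get expensive: every turn re-sends the whole history,
# so each response pays for all prior tokens, not just the new ones.

INPUT_RATE, OUTPUT_RATE = 0.03 / 1000, 0.06 / 1000  # $ per token

context, total_cost = 200, 0.0  # start with a 200-token system prompt
for turn in range(1, 11):
    user_tokens, reply_tokens = 250, 250
    total_cost += (context + user_tokens) * INPUT_RATE + reply_tokens * OUTPUT_RATE
    context += user_tokens + reply_tokens  # history carried into the next turn

print(f"after 10 exchanges: {context} tokens of context, ${total_cost:.2f} total")
# -> 5,200 tokens and ~$0.96; summarization or session resets cap this growth.
```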

This 2x cost difference between input and output influences product economics. Input-heavy features like analysis and summarization have much better unit economics than output-heavy features like content generation.

3.3 How Token Costs Shape Product Decisions

Let's see how these costs add up in practice with a real scenario:

Consider a customer support chatbot handling 10,000 conversations daily. With an average of 100 input and 150 output tokens per conversation, you're looking at $120 per day or $43,800 annually—equivalent to one junior engineer but handling 3.6 million interactions.

The economics become clear when you compare three common AI features: Email sumarization, Blog generation and Document analysis.

For email summarization: with 1,000 input tokens and just 100 output tokens, you're looking at $0.036 per interaction—mostly cheap input processing that makes it perfect for freemium models.

Blog generation tells a different story. You need only 200 input tokens to provide context, but generating 2,000 tokens of content costs about $0.126 per piece. That's roughly 3.5x more expensive because you're paying premium rates for all that generated text. This is why content generation tools like Jasper AI and Copy.ai charge monthly subscriptions rather than offering free tiers.

Document analysis represents the volume challenge. Processing a 10,000-token legal contract costs $0.330—mainly input costs that scale directly with document size. This is why services like LawGeex and DocuSign use usage-based pricing rather than flat monthly fees.
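The sketch below reproduces these three unit-economics profiles with the same rates. Input and output token counts are the illustrative figures from above; the 500 output tokens for document analysis are my assumption to match the $0.330 figure.

```python
# Unit economics for three AI feature shapes: input-heavy, output-heavy,
# and volume-driven. Rates: $0.03 per 1K input, $0.06 per 1K output tokens.

INPUT_RATE, OUTPUT_RATE = 0.03 / 1000, 0.06 / 1000

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

features = {
    "Email summarization (input-heavy)": (1_000, 100),
    "Blog generation (output-heavy)":    (200, 2_000),
    "Document analysis (volume-driven)": (10_000, 500),
}

for name, (inp, out) in features.items():
    print(f"{name}: ${cost(inp, out):.3f} per interaction")
# Output-heavy features pay double rates on every generated token;
# that cost shape is what pushes them toward premium pricing.
```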

These cost differences drive real pricing strategies:

  • Analysis features = Lower costs, can support freemium models (Grammarly tone detection, Gmail Smart Reply suggestions, Slack message sentiment)

  • Generation features = Higher costs, need premium pricing (Jasper AI blog posts, GitHub Copilot code completion, Copy.ai marketing content)

  • Document processing = Volume-dependent, need usage-based pricing (LawGeex contract analysis, DocuSign CLM document review, Otter.ai meeting transcription)

Suddenly, the economics make sense.

Next up, I want to introduce a framework to help you select the right model for your use case.


4. The Foundation Model Selection Framework

You're tasked with adding AI to your product. Your team immediately starts debating GPT-4 vs Claude vs Llama. You're already asking the wrong question.

The right question? "What's most important to us, what's our actual usage, and where does AI create lasting advantage?"

Framework Overview: Every foundation model decision requires evaluating three strategic dimensions together: Priority (what matters most to your business), Scale (current and future usage), and Moat (where AI creates lasting advantage). The "best" model for your competitor might be terrible for you.

4.1 Understanding the Three Strategic Dimensions

Most teams skip this analysis and jump straight to comparing model capabilities. Understanding each dimension first helps you apply the complete framework effectively. Let me first talk through the dimensions, and then we'll bring them together.

1. Priority: What Matters Most to Your Business Right Now?

Every product team has different priorities depending on their stage and context, and that impacts model selection.

Speed: When time-to-market is critical—maybe you’re validating product-market fit, capturing a fleeting opportunity, or reacting to competitors—speed becomes the driving force.

Cost: When margins matter—maybe you’re scaling up, processing high volumes, or operating with budget constraints—cost efficiency takes center stage.

Control: When autonomy and compliance are key—maybe you're handling sensitive data, building core IP, or need guaranteed uptime—control matters most.

2. 📈 Scale: Current Reality vs. Future Growth

A smart model choice depends not just on today’s traffic—but on what’s coming.

Current Usage: Your existing user base and interaction volume determine what's economically viable today. Under 10K daily interactions, simple API approaches usually win; above 100K, infrastructure investment often pays off.

Growth Trajectory: Your projected scale in 6-12 months influences architecture choices. Exponential growth demands scalable foundations. Predictable growth allows optimized solutions.

3. 🏰 Moat: Where will AI create lasting competitive advantage?

In addition to priority & scale, you need to consider your product’s moat to select the foundation model.

Why do Notion’s AI features feel so different from ChatGPT, even if they use similar models? Because Notion’s strength isn’t the model—it’s how AI is embedded into the workflow. That’s their moat.

🔗 Integration Moat

Your product's value comes from how deeply AI is embedded in user workflows. Switching becomes painful because of UX, trust, or switching costs.
Examples: Notion (documents), GitHub (code), Grammarly (writing), Slack (team workflows)

📊 Data Moat

Your proprietary data gives you a unique AI advantage that others can’t replicate. The more you scale, the better your model gets.
Examples: Spotify (music data), Tesla (driving data), OpenAI (training data), Shopify (merchant data)

🌐 Platform Moat

Your advantage comes from operating at a scale others can’t match—via network effects, infrastructure, or ecosystem dynamics.
Examples: AWS (infrastructure), Uber (network effects), iOS App Store (platform), Discord (scale-driven safety)

4.2 Applying the Complete Framework

Let's bring Priority, Scale, and Moat together to arrive at a foundation model strategy for Notion. Notion faced a classic startup tradeoff: build AI infrastructure or move fast?

Their priority was speed—validate AI writing assistance quickly and grab early market share.
Their scale was moderate—growing, but not yet at the point where infra optimization was critical.
Their moat was in integration—AI that feels native inside one of the best doc creation tools, not a standalone chatbot.

So, Notion chose to integrate APIs (like OpenAI) for fast iteration, then reinvested saved time into UX improvements. That strategic clarity let them move fast and differentiate where it counted.


There is no “best” model. The right choice depends on your Priority, Scale, and Moat.

Use this framework to guide model selection strategically, not just technically, because the right foundation model is the one that gives your product the edge your business needs most. Later in the post, I cover the 10 most important combinations of Priority, Scale & Moat with examples that will serve as a guide for foundation model selection decisions for your use cases.


5. Real-World Reliability Considerations

When you choose external APIs, you inherit their reliability constraints. Most API providers offer 99.9% uptime (43 minutes downtime monthly), while business-critical functions often need 99.99% uptime.

Ask yourself: "If this AI feature goes down for 4 hours, what's the business impact?"

  • Business-Critical Functions: Customer authentication, payment processing, safety systems, and real-time support escalations need internal models or bulletproof reliability guarantees.

  • Enhancement Features: Content suggestions, AI-powered search improvements, and productivity tools can use APIs with graceful degradation—when APIs fail, users get basic functionality instead of broken experiences (a minimal fallback sketch follows this list).
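Here is a minimal sketch of that graceful-degradation pattern. The model call and the heuristic fallback are stubbed stand-ins, not a real SDK:

```python
import random

def call_model_api(text: str, timeout: float) -> str:
    # Stand-in for a real provider call; assume it can fail or time out.
    if random.random() < 0.3:
        raise TimeoutError("model API did not respond in time")
    return f"AI-powered suggestion for: {text!r}"

def basic_heuristic_suggestion(text: str) -> str:
    # Non-AI fallback: the basic functionality users still get in an outage.
    return f"Top help articles matching: {text!r}"

def suggestion(text: str) -> str:
    try:
        return call_model_api(text, timeout=2.0)
    except (TimeoutError, ConnectionError):
        return basic_heuristic_suggestion(text)  # degrade gracefully, never break

print(suggestion("reset my password"))
```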

How Leading Companies Handle This:

  • Slack: Internal models for core search/messaging (99.99% uptime), APIs for AI features with fallbacks

  • Discord: Custom safety models on internal infrastructure, API features with circuit breakers

  • Enterprise requirement: Banking regulations often prohibit external dependencies for critical functions

Match your reliability requirements to business risk, not just technical capabilities: your foundation model choice should reflect these reliability needs, not just capability requirements.


6. Performance Benchmarks

You'll see foundation models evaluated using academic benchmarks like MMLU (Massive Multitask Language Understanding), which tests general knowledge across 57 subjects, or HumanEval, which measures coding ability on programming problems. These benchmarks were designed by researchers to compare models' general reasoning and knowledge capabilities in controlled, standardized tests.

Think of them like SAT scores for AI models—they measure broad academic performance but don't predict real-world job performance. A model with a 90% MMLU score might be terrible at customer support conversations, while a smaller model with lower academic scores could excel at your domain-specific tasks. Academic benchmarks are useful for researchers comparing general intelligence, but useless for PMs choosing models for specific business applications.

Focus on these business-relevant metrics instead (a minimal measurement sketch follows the lists below):

Task-Specific Accuracy: Success rate on your actual use cases, not generic tests

  • Customer support: 85-95% correct responses

  • Content generation: 80-90% human approval rate

  • Code completion: 70-80% suggestion acceptance

Response Time Under Load: End-to-end latency with real user traffic

  • Real-time chat: <2 seconds

  • Document processing: <30 seconds

  • Code completion: <500ms

Cost Per Interaction: Total cost including API, infrastructure, overhead

  • Simple classification: $0.001-0.01

  • Content generation: $0.10-0.30

  • Document analysis: $0.20-0.50
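Here is a minimal sketch of measuring the first two metrics on your own golden set rather than trusting academic benchmarks. The stub model stands in for a real provider call, and the exact-match grading is deliberately crude; swap in a real grader for production evals.

```python
import time

def evaluate(model, golden_set):
    correct, latencies = 0, []
    for prompt, expected in golden_set:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        correct += int(expected.lower() in answer.lower())  # crude grading
    accuracy = correct / len(golden_set)
    latencies.sort()
    p95 = latencies[min(int(0.95 * len(latencies)), len(latencies) - 1)]
    return accuracy, p95

golden = [
    ("Where is my order #123?", "order status"),
    ("Cancel my subscription", "cancellation"),
]
stub_model = lambda prompt: "Here is your order status update."

accuracy, p95 = evaluate(stub_model, golden)
print(f"task accuracy={accuracy:.0%}, p95 latency={p95 * 1000:.2f} ms")
```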

We'll talk more about relevant metrics when we get into AI evals in future posts.


7. Foundation Model Strategic Teardowns:

  • How TUI Group Scaled Content Operations with Foundation Models

  • How Slack Architected Foundation Models for Enterprise Security and Scale

  • How Discord Scaled Foundation Models for Safety and Reliability at Massive Scale

Keep reading with a 7-day free trial
