$600 million in annualized revenue, up from $200 million one quarter earlier. Baseten tripled its ARR between December 2025 and March 2026, then raised $1.5 billion at a $13 billion valuation to prove it was not a fluke.

🎯
AI inference has become a standalone infrastructure category. Baseten processes 1 billion inference calls per day across 87 clusters and 18 cloud environments. Its revenue grew 20x year-over-year. Its valuation more than doubled at each of three consecutive rounds in 2025–2026, with four to seven months between rounds.

The numbers describe a market that did not exist three years ago.

Post-trained open-source models now deliver frontier-level performance at a fraction of the cost of closed APIs, and leading AI application companies direct 30–50% of model spend toward custom and post-trained models. The company built the infrastructure layer for that shift.

The company was founded in 2019 by Tuhin Srivastava, Phil Howes, and Amir Haghighat. Their initial insight was straightforward: training AI models was becoming easier, but running them in production reliably, at scale, and cost-effectively was not. The inference problem looked like the early cloud computing market before AWS abstracted away server management.

The company built Truss, an open-source framework that packages models into production-ready APIs. From there the platform grew into a full inference stack spanning GPU orchestration, autoscaling, observability, billing, and multi-cloud deployment. Customers including Cursor, Notion, Writer, and Abridge now run production workloads on it, and the system handles models from Llama to DeepSeek to Stable Diffusion.

The funding sequence

The valuation trajectory tells a story of its own. Baseten raised a $75 million Series C in February 2025 at an $825 million valuation. Seven months later, in September 2025, a $150 million Series D more than doubled that to $2.15 billion. Four months after that, a $300 million Series E in January 2026 pushed the valuation to $5 billion, with Nvidia contributing $150 million.

Then the curve steepened. In June 2026, it closed a $1.5 billion Series F at a $13 billion valuation, a 160% increase in less than five months. The round was structured in two tranches at $13 billion and $11 billion respectively, co-led by Altimeter Capital, Conviction, Spark Capital, Sands Capital, and Wellington Management, with IVP, Greylock, 01A, Blackbird, Durable Capital Partners, and D. E. Shaw Ventures also participating.

The total raised exceeds $2 billion across eight rounds in five years.
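The step-ups are easy to check from the figures quoted above. The short Python sketch below recomputes the valuation multiple and the gap between rounds using only the numbers in this article. The `ROUNDS` table and the helper names are illustrative, and dates are kept at month granularity because exact closing dates are not given here, so January to June 2026 counts as five calendar months.

```python
# Round figures as reported in this article (USD millions).
# Tuple layout: (round, month announced, amount raised, post-round valuation)
ROUNDS = [
    ("Series C", "2025-02", 75, 825),
    ("Series D", "2025-09", 150, 2_150),
    ("Series E", "2026-01", 300, 5_000),
    ("Series F", "2026-06", 1_500, 13_000),
]


def months_between(start: str, end: str) -> int:
    """Whole calendar months between two YYYY-MM strings."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month)


def step_ups(rounds):
    """Return (round name, months since prior round, valuation multiple)."""
    results = []
    for prev, cur in zip(rounds, rounds[1:]):
        gap = months_between(prev[1], cur[1])
        multiple = cur[3] / prev[3]
        results.append((cur[0], gap, multiple))
    return results


# Sum of the four rounds itemized in the article only.
total_raised = sum(r[2] for r in ROUNDS)

if __name__ == "__main__":
    for name, gap, multiple in step_ups(ROUNDS):
        pct = (multiple - 1) * 100
        print(f"{name}: {multiple:.2f}x in {gap} months ({pct:.0f}% increase)")
    print(f"Raised across these four rounds: ${total_raised:,}M")
```

Summing the four listed rounds gives $2,025 million, which is consistent with the "exceeds $2 billion" total. Earlier rounds are not itemized in this article and are left out of the sketch.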

The inference gold rush

The market the company occupies did not exist in 2023. That year was defined by training. Companies spent billions building larger foundation models. By 2025 the center of gravity shifted. Open-source models reached parity with proprietary systems on key benchmarks, and the question became not "how do we train a better model" but "how do we serve this model to millions of users for less than the competition."

Inference infrastructure became the bottleneck. Legacy cloud providers were built for CPU-bound workloads, not GPU-orchestrated model serving. Cold starts, KV cache management, continuous batching, and fractional GPU allocation required a new stack. Baseten, Fireworks AI, and Together AI emerged as the dedicated players.

The platform's architectural differentiation is its multi-cloud approach. It operates across 18 cloud environments and 87 global clusters, allowing customers to route inference requests to the cheapest available compute. That capability becomes more valuable as GPU supply remains constrained and prices vary regionally.
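To make "route to the cheapest available compute" concrete, here is a minimal Python sketch of cost-aware cluster selection. It is not Baseten's scheduler, whose internals are not described in the sources. The `Cluster` fields, the cluster names, and the prices are all hypothetical, and a production router would also weigh latency, data residency, and model placement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cluster:
    name: str                # hypothetical cluster identifier
    cloud: str               # hypothetical cloud provider label
    usd_per_gpu_hour: float  # made-up price for illustration
    free_gpus: int           # GPUs currently idle in this cluster


def pick_cluster(clusters: Sequence[Cluster], gpus_needed: int) -> Optional[Cluster]:
    """Return the cheapest cluster with enough idle GPUs, or None if none fits."""
    eligible = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.usd_per_gpu_hour)


# Illustrative inventory: the cheapest cluster has no idle capacity.
CLUSTERS = [
    Cluster("cluster-a", "cloud-1", 2.40, 0),
    Cluster("cluster-b", "cloud-2", 2.90, 8),
    Cluster("cluster-c", "cloud-3", 3.10, 16),
]

if __name__ == "__main__":
    choice = pick_cluster(CLUSTERS, gpus_needed=4)
    print(choice.name if choice else "no capacity")
```

In this toy inventory the lowest-priced cluster is full, so a four-GPU request lands on the next cheapest one. That is the point of spanning many clouds: when supply is tight in one place, a cheaper-than-average option may still be open elsewhere.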

Unit economics as competitive moat

Blackbird VC partner Michael Tolo, whose firm made its largest-ever investment in this round, described the market dynamics in concrete terms: for companies building AI into their products, Baseten competes with OpenAI and Anthropic at a lower price point, and "this is the biggest shift that we've seen in both unit economics and competitive leverage within the AI market so far."

The numbers support the claim. The platform delivers up to 30% cost savings versus closed-source APIs for equivalent model performance. Its usage-based pricing tracks customer growth, unlike the seat-based models that defined the SaaS era. As inference volume scales, the marginal cost per call decreases, a structural advantage that strengthens with scale rather than eroding.
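One common way cost per call falls with volume is fixed-cost amortization: a fixed daily infrastructure bill spread over more calls. The sketch below assumes that mechanism, which the sources do not spell out, and every dollar figure in the usage lines is invented. The only number taken from the article is the "up to 30%" savings ceiling.

```python
def cost_per_call(fixed_usd_per_day: float,
                  variable_usd_per_call: float,
                  calls_per_day: int) -> float:
    """Average cost per call when a fixed daily cost is spread over all calls."""
    if calls_per_day <= 0:
        raise ValueError("calls_per_day must be positive")
    return fixed_usd_per_day / calls_per_day + variable_usd_per_call


def spend_after_savings(closed_api_usd: float, savings_rate: float = 0.30) -> float:
    """Spend after applying a savings rate; 0.30 is the article's 'up to' ceiling."""
    if not 0.0 <= savings_rate <= 1.0:
        raise ValueError("savings_rate must be between 0 and 1")
    return closed_api_usd * (1.0 - savings_rate)


if __name__ == "__main__":
    # Hypothetical inputs: $1,000/day fixed cost, $0.001 variable cost per call.
    for volume in (1_000_000, 10_000_000, 100_000_000):
        print(f"{volume:>11,} calls/day -> ${cost_per_call(1_000.0, 0.001, volume):.6f} per call")
    # Hypothetical $100,000 closed-API bill at the maximum quoted savings.
    print(f"${spend_after_savings(100_000.0):,.0f} after savings")
```

Under these assumptions the per-call cost approaches the variable cost as volume grows, which is why the advantage compounds for the largest customers and the largest platforms.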

Nvidia's participation in the Series E is validation from the silicon layer. The inference market is large enough that the dominant GPU manufacturer is placing strategic bets on the distribution layer, not just the silicon.

Competition and market boundaries

The company competes with Fireworks AI, which reached $800 million ARR and raised at a $15 billion valuation in the same period. Together AI and Modal also operate in the inference infrastructure space, though with different architectural emphases. The broader market includes cloud hyperscalers (AWS, GCP, Azure) whose inference offerings remain bundled with their general compute platforms rather than specialized for AI workloads.

The market boundary is not yet settled. Inference infrastructure could consolidate into two or three dominant platforms, mirroring the cloud market. Or it could fragment across model-specific and workflow-specific providers. The company's multi-cloud, model-agnostic positioning hedges against both outcomes.

What to watch

📊
Key signals to track

Quarterly ARR growth rate: whether the $200M→$600M quarterly trajectory accelerates or stabilizes
Enterprise customer count: currently 100+; expansion into regulated industries signals maturity
Inference cost curves: the spread between its pricing and closed API pricing determines long-term switching incentives
Hyperscaler response: if AWS/GCP launch dedicated inference tiers with aggressive pricing, the competitive dynamics shift

The meaning

The company's growth is a concrete signal about where the AI industry is heading. The training phase concentrated value in a small number of foundation model labs. The inference phase distributes it across infrastructure, tooling, and application layers. Baseten, valued at $13 billion, is the infrastructure play. Cursor and Notion, running on its platform, are the application plays. Nvidia, investing in it, is betting that the distribution layer matters as much as the silicon.

The company plans to triple its workforce this year, focusing on engineering, research, operations, and go-to-market. The capital will also expand compute capacity, the physical infrastructure that 1 billion daily inference calls require.

The question it answers is not whether enterprises will deploy AI in production. They already are. The question is whose infrastructure they will trust to run it.

Sources

Baseten Raises $1.5 Billion to Power the Next Era of AI Inference
Official Series F announcement with full investor list and growth metrics — revenue up 20x YoY, 1B inference calls per day, 87 global clusters.
Primary source: the company's press release with verified financial data
AI inference startup Baseten reportedly raising $1.5B months after its last mega-round
TechCrunch reported the WSJ scoop on the Series F — first outlet to break the news of the $13B valuation.
Independent journalism confirming the round structure
AI startup Baseten hits $13 billion valuation as Australia's Blackbird makes record bet
Reuters covered Blackbird VC's record investment and the strategic rationale for the company's valuation from an investor perspective.
Third-party market context from a major wire service