Indian Startup Deep Dive — Artificial Intelligence

INDIA'S
SOVEREIGN
INTELLIGENCE

Founded in August 2023 by two researchers who left IIT Madras to build AI for India's 1.4 billion people, Sarvam AI has in 30 months become the government's chosen partner for India's first sovereign foundational LLM — unveiled at the India AI Impact Summit in February 2026 to an audience that included Sundar Pichai and Sam Altman. The question is no longer whether India can build its own AI. The question is whether Sarvam can make it matter.

$41M+ – Total Funding Raised
Sarvam-105B – Flagship LLM (Feb 2026)
22 – Indian Languages Supported
4,096 – NVIDIA H100 GPUs (Gov't)
2023 – Founded in Bengaluru

Executive Snapshot

Company

Sarvam AI (Sarvamai)

Industry

Foundational AI, Large Language Models, Multimodal AI, Speech Technology

Founded

August 2023, Bengaluru — by Vivek Raghavan & Pratyush Kumar (ex-AI4Bharat, IIT Madras)

Funding

$41M+ Series A — Lightspeed India, Peak XV Partners, Khosla Ventures (Dec 2023) + Government compute support

Key Achievement

Selected by Indian Government (April 2025) to build India's first sovereign LLM under IndiaAI Mission

Models

Sarvam-30B, Sarvam-105B (flagship "Indus"), Saaras V3 (speech), Bulbul V3 (TTS), Sarvam Vision, Kaze smart glasses

Employees

~200–300 (small research-first team by design)

Government Compute

4,096 NVIDIA H100 SXM GPUs via Yotta Data Services — ~₹99Cr in GPU subsidies under IndiaAI Mission

Why It Matters

India has 1.4 billion people speaking 22 scheduled languages. The global LLM giants — OpenAI, Google, Anthropic, Meta — are excellent at English and adequate at a handful of European languages. For a farmer in Maharashtra asking a government AI about crop subsidies in Marathi, or a patient in Tamil Nadu trying to understand their prescription in Tamil, these models are essentially useless. Sarvam AI is building the AI infrastructure layer for a country that the existing large models weren't built for. The February 2026 unveiling of Sarvam-30B and Sarvam-105B at India's AI Impact Summit — trained from scratch, not fine-tuned from Western models, on sovereign Indian compute — is the moment India formally entered the foundational LLM race.

Company Overview

Sarvam AI is India's most significant AI bet — not in terms of funding or valuation, but in terms of strategic ambition. In two and a half years, a team of researchers who left India's premier AI research lab has built a full-stack AI company: foundational language models trained on Indian data, a speech technology platform that supports 22 Indian languages, a vision-language model that outperforms Google and OpenAI on multilingual document understanding, and a physical product — the Kaze smart glasses — that makes AI accessible in audio form for users who don't type. The company received India's highest honour in AI when the government selected it from 67 applicants to build India's sovereign LLM in April 2025.

The foundational models — Sarvam-30B and Sarvam-105B — were unveiled at Bharat Mandapam in New Delhi at the India AI Impact Summit in February 2026. Both were trained from scratch on Indian language data using government-provided compute infrastructure. "Indus," the beta consumer version of Sarvam-105B, launched simultaneously on iOS, Android, and web. It supports reasoning in Indian languages at a standard that, until this moment, no domestic company had achieved.

105B – Flagship LLM Parameters (Sarvam-105B)
84.3% – Vision OCR (vs 80.2% Gemini)
32K – Context Window (30B model)
128K – Context Window (105B model)

The Founders

Vivek Raghavan and Pratyush Kumar came from AI4Bharat — India's premier open-source AI research initiative, based at IIT Madras — where they spent years building multilingual speech and text datasets for Indian languages. The research was excellent. The impact was limited. Academic outputs and papers don't reach the farmer in Maharashtra who needs AI to understand the government scheme he just read about.

In August 2023 they founded Sarvam AI with the conviction that building a company — with commercial incentives, a product focus, and VC capital — was the only way to take what AI4Bharat had learned and deploy it at population scale. Within five months of founding, they had raised $41 million — one of the fastest Series A raises in Indian AI history — from Lightspeed India, Peak XV Partners, and Khosla Ventures. The speed of the raise reflected both the founders' credibility and the timing: the post-ChatGPT world had just made everyone understand that LLMs were real, and India had nobody building one.

"Sovereignty matters much more in AI than building the biggest models. India needs AI that understands India — not AI that translates English for India."

— Vivek Raghavan, Co-founder, Sarvam AI (India AI Impact Summit, February 2026)

The Problem They Solve

The global AI revolution is, in practice, an English-language revolution with multilingual features bolted on. ChatGPT, Gemini, Claude, and Llama are trained predominantly on internet text, which skews heavily toward English, European languages, and Chinese. Indian languages — Hindi, Bengali, Tamil, Telugu, Marathi, Kannada, Gujarati, and 15 more scheduled languages — are represented marginally in these training datasets. The result is that AI assistants in Indian languages make factual errors, miss cultural context, misunderstand idiomatic expressions, and fail at the document-understanding tasks (reading Aadhaar forms, PAN cards, land records, crop insurance applications) that represent the practical AI use cases for hundreds of millions of Indians.

Sarvam's insight is that building AI for India is not a translation problem — it's a training data problem. You need large amounts of high-quality Indian language text and speech data, and you need models trained on that data from the start, not fine-tuned from English models. That is what the sovereign LLM programme, backed by government compute and Sarvam's own data collection work, is designed to produce.

The Models

Sarvam-30B

Unveiled February 2026

30-billion parameter model, Mixture-of-Experts architecture, ~1B active parameters per token. Context window: 32,000 tokens. Trained on 16 trillion tokens. Designed for real-time conversational use — lower latency, cost-efficient. Benchmarks competitively against Gemma 27B, Mistral-32-24B, Qwen-30B on reasoning and coding.

Sarvam-105B "Indus"

Unveiled February 2026 — Beta on iOS/Android/Web

105-billion parameter flagship, ~9B active parameters per token, 128K context window. Built for enterprise-grade complex reasoning. Consumer beta "Indus" released simultaneously on app stores. Targets agentic workflows, multi-step reasoning in Indian languages.
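Both models use a Mixture-of-Experts design, and the quoted active-parameter counts explain the efficiency claim: only a small slice of the weights is exercised on each token. A quick back-of-envelope sketch, using only the figures quoted above (the per-token FLOP rule of thumb is a generic assumption, not a Sarvam disclosure):

```python
# Back-of-envelope MoE compute sketch using the figures quoted in this section.
# Per-token FLOPs for a transformer scale roughly with the parameters touched
# per token; in a Mixture-of-Experts model only the routed experts are active.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the model's weights exercised on each token."""
    return active_params_b / total_params_b

# Figures quoted in this article (approximate):
sarvam_30b = active_fraction(30, 1)    # ~3.3% of weights active per token
sarvam_105b = active_fraction(105, 9)  # ~8.6% of weights active per token

# Rough per-token compute ratio vs a dense model of the same total size:
# a dense 105B model touches all 105B params; the MoE touches only ~9B.
dense_vs_moe_105b = 105 / 9  # ~11.7x fewer FLOPs per token

print(f"Sarvam-30B active fraction:  {sarvam_30b:.1%}")
print(f"Sarvam-105B active fraction: {sarvam_105b:.1%}")
print(f"105B MoE per-token FLOP saving vs dense: ~{dense_vs_moe_105b:.1f}x")
```

Routing overhead and the memory footprint of the full weight set are ignored here; the point is simply why an MoE of this shape can serve tokens far more cheaply than a dense model of the same total size.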

Sarvam Vision

Vision-Language + OCR System

3-billion-parameter model for document understanding — reads mixed-script text, scanned forms, handwriting across 22 Indian languages. Scored 84.3% on olmOCR-Bench, beating Gemini 3 Pro (80.2%) and ChatGPT (69.8%). Real-world accuracy 93.28% on OmniDocBench. The benchmark results are not incremental improvements — they're decisive leadership in the specific use case that matters most for Indian government services.

Bulbul V3 + Saaras V3

Voice AI Stack

Bulbul V3: Advanced text-to-speech with 35+ voices across 11 Indian languages. Saaras V3: Automatic speech recognition for 22 Indian languages — the widest language coverage of any production ASR system in India. Together they form a complete voice layer that enables AI interfaces for India's non-typing, voice-first users.

Sarvam Kaze — The Smart Glasses

Unveiled in early 2026, the Kaze smart glasses are the first Made-in-India AI wearable: they listen, understand, and capture what the user sees in real time, and support 10+ Indian languages for voice-based interaction and real-time translation. Launch is planned for May 2026. Kaze positions Sarvam as a full-stack AI company, not just an API provider. Think of it as India's answer to Meta's Ray-Ban smart glasses, built for India's linguistic complexity.

The IndiaAI Mission — Government Partnership

In April 2025, India's Ministry of Electronics and Information Technology selected Sarvam AI from a pool of 67 applicants to build India's sovereign foundational LLM. The selection gave Sarvam access to 4,096 NVIDIA H100 SXM GPUs via Yotta Data Services — approximately ₹99 crore in compute subsidies. In exchange, the government takes an equity stake in Sarvam, and the sovereign LLM remains managed and governed within India's borders.
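To see why compute access at this scale matters, here is a rough, illustrative estimate of training time for the 30B model's stated 16 trillion tokens on the allocated cluster, using the standard ~6·N·D FLOP rule of thumb. The hardware peak and utilisation figures below are generic assumptions, not disclosed numbers:

```python
# Illustrative lower bound: how long would 4,096 H100s take to push
# Sarvam-30B's stated 16T tokens through training? Uses the common
# ~6 * N_active * D FLOP approximation for transformer training.
# All hardware figures are generic assumptions, not Sarvam's numbers.

PEAK_BF16_FLOPS_H100 = 989e12  # per-GPU dense BF16 peak (approx. datasheet figure)
MFU = 0.35                     # assumed model FLOPs utilisation
GPUS = 4096

active_params = 1e9   # ~1B active params per token (article figure)
tokens = 16e12        # 16 trillion training tokens (article figure)

train_flops = 6 * active_params * tokens
cluster_flops_per_s = GPUS * PEAK_BF16_FLOPS_H100 * MFU
hours = train_flops / cluster_flops_per_s / 3600

print(f"Total training FLOPs: {train_flops:.2e}")
print(f"Estimated wall-clock: ~{hours:.0f} hours")
```

Real training campaigns cost far more wall-clock than this lower bound (data pipelines, restarts, ablations, and the 105B run itself), but the sketch shows why a dedicated 4,096-GPU allocation makes runs of this class tractable for a startup.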

India AI Impact Summit — February 2026, New Delhi

At the India AI Impact Summit held at Bharat Mandapam, Sarvam AI unveiled Sarvam-30B and Sarvam-105B to an audience that included Google CEO Sundar Pichai, Sam Altman (OpenAI), and Dario Amodei (Anthropic). Pichai observed: "The developer energy I find in India is second to none." The Summit drew commitments of over $200 billion in AI-related investment and formally positioned India as a sovereign AI nation, not just an AI services provider. For Sarvam, the moment represented institutional validation at the highest level.

The IndiaAI Mission's ₹10,371 crore budget — potentially doubling to ₹20,000 crore following the Summit — makes it the largest government AI investment in any developing country. India has gone from almost no AI infrastructure in 2023 to nearly 40,000 government-accessible GPUs in 2025–26.

Funding History

August 2023 — Founded
Vivek Raghavan and Pratyush Kumar leave AI4Bharat to found Sarvam AI. Their research background and immediate institutional credibility set them apart from most AI startups.
December 2023 — $41M Series A (one of India's fastest)
Led by Lightspeed India, with Peak XV Partners and Khosla Ventures. $41 million raised within five months of founding, validating the thesis that India's LLM moment had arrived and that these were the right founders to back it. Effectively the seed and Series A combined into one swift round.
April 2025 — IndiaAI Mission selection
Government selects Sarvam from 67 applicants to build India's sovereign LLM. Access to 4,096 H100 GPUs, ₹99Cr in compute subsidies. Government equity stake. Non-trivial terms — but compute access at this scale is transformative for a startup that couldn't otherwise afford it.
February 2026 — Sarvam-30B & 105B unveiled at AI Impact Summit
India's sovereign AI moment. Both models open-sourced under Apache 2.0 on Hugging Face. "Indus" (consumer 105B beta) released simultaneously on iOS, Android, and web. UIDAI Aadhaar integration, startup programme launched. Kaze smart glasses announced for May 2026.

Business Model

Sarvam's monetisation has three channels. The first is API access — developers and enterprises pay for access to Sarvam models through Sarvam's cloud. The second is enterprise contracts — government integrations (UIDAI/Aadhaar collaboration, government services in Indian languages), financial services, healthcare providers, and enterprises that need Indian language AI. The third, longer-term channel is hardware (the Kaze smart glasses and future devices) sold with an attached software subscription.

The current stage is primarily pre-revenue at scale — the company is spending most of its capital on model training and infrastructure, using the government compute subsidy to extend its R&D runway without burning VC capital on GPUs. The startup programme announced in March 2026 — offering early-stage companies 6–12 months of free API credits — is a community-building move to create the developer ecosystem that makes Sarvam models the default layer for Indian language AI applications.

Competitive Landscape

Company | Country | Indian Language Focus | Scale | Status
Sarvam AI | India | 22 Indian languages — native training | Up to 105B params | Sovereign LLM
Krutrim (Bhavish Aggarwal) | India | 13 Indian languages — multilingual focus | 12B (Krutrim-2) | Consumer AI
Google Gemini | USA | Adequate — trained on limited Indian data | Ultra/Pro/Nano | Global model, Indian gaps
OpenAI GPT-4/o | USA | Limited — English-dominant training | Frontier | Global, not India-first
BharatGen (IIT Bombay) | India | 22 Indian languages — Param2 17B MoE | 17B MoE | Govt-funded academic

Strengths & Challenges

Genuine Advantages

  • Government mandate — no other Indian company has this official backing
  • AI4Bharat heritage — deepest Indian language dataset expertise in the country
  • Vision OCR beating OpenAI and Google on the specific task that matters most for India
  • Apache 2.0 open-source release — developer ecosystem adoption moat
  • Government equity means political alignment and distribution through DPI/Aadhaar
  • Full-stack ambition: model + voice + vision + hardware (Kaze)

Real Vulnerabilities

  • $41M is tiny vs OpenAI ($18B+), Anthropic ($8B+), Google (∞)
  • Revenue near zero — long path to commercial sustainability
  • Government dependency creates mission-drift risk (serving governance vs. commerce)
  • Krutrim has Bhavish Aggarwal's capital and Ola ecosystem distribution
  • Global models improving Indian language support faster than expected
  • Brain drain — global AI companies paying 10× for the same talent

The Sovereign AI Debate

Not everyone is excited about India's sovereign AI model. The critical argument: in a world where DeepSeek reportedly trained a frontier-class model for roughly $6 million, and strong open-source bases like Meta's Llama are free to build on, why should India spend ₹10,000 crore building models from scratch instead of fine-tuning existing open-source models for Indian languages at a fraction of the cost?

The counter-argument — which the Indian government has clearly found persuasive — is that data sovereignty and infrastructure sovereignty matter independently of the cost efficiency question. A government deploying AI for Aadhaar, citizen services, judicial systems, and national security cannot run that AI on American infrastructure controlled by American companies subject to American law. The sovereign LLM is not primarily a technology decision — it is a geopolitical infrastructure decision, similar to how India runs its own payment network (UPI) rather than operating purely on Visa and Mastercard.

The open-sourcing of Sarvam-30B and Sarvam-105B under Apache 2.0 signals that Sarvam understands it needs a developer ecosystem, not just government contracts, to become the default Indian AI infrastructure layer. Open models that developers can build on create the adoption flywheel that closed models never achieve at the developer level.

Key Lessons

1. Researcher Founders Have a Unique Advantage in Deep Tech

Vivek Raghavan and Pratyush Kumar's academic background at IIT Madras and AI4Bharat gave them two things that most startup founders don't have: a decade of relevant dataset work and credibility with both the government and the global AI research community. When they said they could build a sovereign Indian LLM, people believed them because they had already built the datasets that a sovereign Indian LLM requires. The lesson for deep tech: domain expertise from serious research institutions is not just a credential — it is the starting capital.

2. Government Can Be a Customer AND an Investor AND a Compute Provider

India's AI policy approach — subsidising compute, taking equity, mandating deployment through government services — creates an unusual public-private partnership that Western AI startups don't operate within. Sarvam's government-as-compute-provider model has allowed it to train 105B parameter models on a $41M funding base that would have been impossible in a purely commercial environment. For startups in strategic sectors, understanding how to partner with governments as capital and infrastructure providers is an underrated capability.

Future Outlook

Sarvam's roadmap through 2026 is clear: ship Kaze in May 2026, grow the developer ecosystem through the startup API credit programme, deepen the UIDAI and government service integrations, and pursue enterprise revenue in financial services, healthcare, and education. The longer-term question is whether Sarvam can achieve commercial sustainability before the government compute subsidy ends and before the global AI companies close the Indian language performance gap that Sarvam currently holds.

The 2027 Test

The IndiaAI Mission GPU allocation is finite — 4,096 H100s for a defined period. After that period, Sarvam must fund its own compute. Given that Sarvam-105B-scale training runs cost millions of dollars, the commercial revenue must be substantial well before the compute subsidy expires. If the developer ecosystem builds strongly on Sarvam's open models, API revenue at scale is achievable. If the ecosystem prefers global APIs that are cheaper or better, Sarvam's path gets much harder. The open-source release was the right strategic move to build the ecosystem — the execution on that ecosystem over the next 18 months will determine whether the commercial model closes.

The Bottom Line

Sarvam AI has done, in 30 months, what most AI observers assumed would take India a decade: built a 105-billion parameter foundational LLM from scratch, trained on Indian language data, deployed on sovereign Indian compute, open-sourced under Apache 2.0, and launched a consumer product at India's highest-profile AI event. The models are competitive on the benchmarks that matter for Indian language AI. The government partnership gives distribution channels no commercial company could achieve independently. The remaining questions are not about capability — they are about commercialisation. $41M is modest for frontier AI, the government subsidy is temporary, and the global competition is accelerating Indian language support faster than expected. Whether Sarvam can translate genuine technical leadership into a sustainable business model before those windows close is the story of the next three years.