Founded in August 2023 by two researchers who left IIT Madras to build AI for India's 1.4 billion people, Sarvam AI has in 30 months become the government's chosen partner for India's first sovereign foundational LLM — unveiled at the India AI Impact Summit in February 2026 to an audience that included Sundar Pichai and Sam Altman. The question is no longer whether India can build its own AI. The question is whether Sarvam can make it matter.
**Company:** Sarvam AI (Sarvamai)
**Focus:** Foundational AI, large language models, multimodal AI, speech technology
**Founded:** August 2023, Bengaluru, by Vivek Raghavan and Pratyush Kumar (ex-AI4Bharat, IIT Madras)
**Funding:** $41M+ Series A from Lightspeed India, Peak XV Partners, and Khosla Ventures (Dec 2023), plus government compute support
**Milestone:** Selected by the Indian government (April 2025) to build India's first sovereign LLM under the IndiaAI Mission
**Products:** Sarvam-30B, Sarvam-105B (flagship "Indus"), Saaras V3 (speech recognition), Bulbul V3 (text-to-speech), Sarvam Vision, Kaze smart glasses
**Team:** ~200–300 (small, research-first team by design)
**Compute:** 4,096 NVIDIA H100 SXM GPUs via Yotta Data Services, roughly ₹99 crore in GPU subsidies under the IndiaAI Mission
India has 1.4 billion people speaking 22 scheduled languages. The global LLM giants — OpenAI, Google, Anthropic, Meta — are excellent at English and adequate at a handful of European languages. For a farmer in Maharashtra asking a government AI about crop subsidies in Marathi, or a patient in Tamil Nadu trying to understand their prescription in Tamil, these models are essentially useless. Sarvam AI is building the AI infrastructure layer for a country that the existing large models weren't built for. The February 2026 unveiling of Sarvam-30B and Sarvam-105B at India's AI Impact Summit — trained from scratch, not fine-tuned from Western models, on sovereign Indian compute — is the moment India formally entered the foundational LLM race.
Sarvam AI is India's most significant AI bet — not in terms of funding or valuation, but in terms of strategic ambition. In two and a half years, a team of researchers who left India's premier AI research lab has built a full-stack AI company: foundational language models trained on Indian data, a speech technology platform that supports 22 Indian languages, a vision-language model that outperforms Google and OpenAI on multilingual document understanding, and a physical product — the Kaze smart glasses — that makes AI accessible in audio form for users who don't type. The company received India's highest honour in AI when the government selected it from 67 applicants to build India's sovereign LLM in April 2025.
The foundational models — Sarvam-30B and Sarvam-105B — were unveiled at Bharat Mandapam in New Delhi at the India AI Impact Summit in February 2026. Both were trained from scratch on Indian language data using government-provided compute infrastructure. "Indus," the beta consumer version of Sarvam-105B, launched simultaneously on iOS, Android, and web. It supports reasoning in Indian languages at a standard that, until this moment, no domestic company had achieved.
Vivek Raghavan and Pratyush Kumar came from AI4Bharat — India's premier open-source AI research initiative, based at IIT Madras — where they spent years building multilingual speech and text datasets for Indian languages. The research was excellent. The impact was limited. Academic outputs and papers don't reach the farmer in Maharashtra who needs AI to understand the government scheme he just read about.
In August 2023 they founded Sarvam AI with the conviction that building a company — with commercial incentives, a product focus, and VC capital — was the only way to take what AI4Bharat had learned and deploy it at population scale. Within five months of founding, they had raised $41 million — one of the fastest Series A raises in Indian AI history — from Lightspeed India, Peak XV Partners, and Khosla Ventures. The speed of the raise reflected both the founders' credibility and the timing: the post-ChatGPT world had just made everyone understand that LLMs were real, and India had nobody building one.
"Sovereignty matters much more in AI than building the biggest models. India needs AI that understands India — not AI that translates English for India."
— Vivek Raghavan, Co-founder, Sarvam AI (India AI Impact Summit, February 2026)

The global AI revolution is, in practice, an English-language revolution with multilingual features bolted on. ChatGPT, Gemini, Claude, and Llama are trained predominantly on internet text, which skews heavily toward English, European languages, and Chinese. Indian languages — Hindi, Bengali, Tamil, Telugu, Marathi, Kannada, Gujarati, and 15 more scheduled languages — are represented marginally in these training datasets. The result is that AI assistants in Indian languages make factual errors, miss cultural context, misunderstand idiomatic expressions, and fail at the document-understanding tasks (reading Aadhaar forms, PAN cards, land records, crop insurance applications) that represent the practical AI use cases for hundreds of millions of Indians.
Sarvam's insight is that building AI for India is not a translation problem — it's a training data problem. You need large amounts of high-quality Indian language text and speech data, and you need models trained on that data from the start, not fine-tuned from English models. That is what the sovereign LLM programme, backed by government compute and Sarvam's own data collection work, is designed to produce.
**Sarvam-30B:** 30-billion-parameter model with a Mixture-of-Experts architecture, activating ~1B parameters per token. Context window: 32,000 tokens; trained on 16 trillion tokens. Designed for real-time conversational use: lower latency, cost-efficient. Benchmarks competitively against Gemma 27B, Mistral Small 24B, and Qwen 30B on reasoning and coding.
**Sarvam-105B:** 105-billion-parameter flagship, activating ~9B parameters per token, with a 128K context window. Built for enterprise-grade complex reasoning. The consumer beta, "Indus", was released simultaneously on app stores. Targets agentic workflows and multi-step reasoning in Indian languages.
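The "active parameters per token" figures come from Mixture-of-Experts routing: a small gating network picks only a few expert sub-networks for each token, so per-token compute scales with the activated experts rather than the full parameter count. Below is a minimal, illustrative sketch of top-k gating — not Sarvam's actual architecture; the expert count and top-k values are made up:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8  # total experts in one MoE layer (illustrative)
TOP_K = 2        # experts actually run for each token

def softmax(xs):
    """Numerically stable softmax over a list of gating logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits):
    """Pick the top-k experts for one token and renormalise their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# Fake gating scores for one token: only TOP_K of NUM_EXPERTS experts run,
# which is why a "30B total" model can cost roughly "1B active" per token.
gate_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(gate_logits)
assert len(chosen) == TOP_K
assert abs(sum(w for _, w in chosen) - 1.0) < 1e-9
print(f"experts run per token: {TOP_K}/{NUM_EXPERTS}")
```

In a real MoE layer the chosen experts' outputs are combined with these renormalised weights; the sketch only shows the selection step that makes active parameters a small fraction of total parameters.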
**Sarvam Vision:** 3-billion-parameter model for document understanding that reads mixed-script text, scanned forms, and handwriting across 22 Indian languages. Scored 84.3% on olmOCR-Bench, beating Gemini 3 Pro (80.2%) and ChatGPT (69.8%), with 93.28% real-world accuracy on OmniDocBench. These benchmark results are not incremental improvements — they are decisive leadership in the specific use case that matters most for Indian government services.
**Bulbul V3:** Advanced text-to-speech with 35+ voices across 11 Indian languages. **Saaras V3:** Automatic speech recognition for 22 Indian languages — the widest language coverage of any production ASR system in India. Together they form a complete voice layer that enables AI interfaces for India's non-typing, voice-first users.
**Kaze smart glasses:** Unveiled in early 2026, Kaze is the first made-in-India AI wearable — it listens, understands, and captures what users see in real time, and supports 10+ Indian languages for voice-based interaction and real-time translation. Launch is planned for May 2026. Kaze positions Sarvam as a full-stack AI company, not just an API provider: think of it as India's answer to Meta's Ray-Ban smart glasses, built for India's linguistic complexity.
In April 2025, India's Ministry of Electronics and Information Technology selected Sarvam AI from a pool of 67 applicants to build India's sovereign foundational LLM. The selection gave Sarvam access to 4,096 NVIDIA H100 SXM GPUs via Yotta Data Services — approximately ₹99 crore in compute subsidies. In exchange, the government takes an equity stake in Sarvam, and the sovereign LLM remains managed and governed within India's borders.
The IndiaAI Mission's ₹10,371 crore budget — potentially doubling to ₹20,000 crore following the Summit — makes it the largest government AI investment in any developing country. India has gone from almost no AI infrastructure in 2023 to nearly 40,000 government-accessible GPUs in 2025–26.
Sarvam's monetisation has three channels. The first is API access — developers and enterprises pay for access to Sarvam models through Sarvam's cloud. The second is enterprise contracts — government integrations (UIDAI/Aadhaar collaboration, government services in Indian languages), financial services, healthcare providers, and enterprises that need Indian language AI. The third, longer-term channel is hardware: the Kaze smart glasses and future devices, sold with an attached software subscription.
The company is still essentially pre-revenue at scale: it is spending most of its capital on model training and infrastructure, using the government compute subsidy to extend its R&D runway without burning VC capital on GPUs. The startup programme announced in March 2026 — offering early-stage companies 6–12 months of free API credits — is a community-building move to create the developer ecosystem that makes Sarvam models the default layer for Indian language AI applications.
| Company | Country | Indian Language Focus | Scale | Status |
|---|---|---|---|---|
| Sarvam AI | India | 22 Indian languages — native training | Up to 105B params | Sovereign LLM |
| Krutrim (Bhavish Aggarwal) | India | 13 Indian languages — multilingual focus | 12B (Krutrim-2) | Consumer AI |
| Google Gemini | USA | Adequate — trained on limited Indian data | Ultra/Pro/Nano | Global model, Indian gaps |
| OpenAI GPT-4 / GPT-4o | USA | Limited — English-dominant training | Frontier | Global, not India-first |
| BharatGen (IIT Bombay) | India | 22 Indian languages — Param2 17B MoE | 17B MoE | Govt-funded academic |
Not everyone is excited about India's sovereign AI model. The critical argument: in a world where DeepSeek reportedly trained a frontier-class model for around $6 million in compute by building on open research and open-source tooling, why should India spend ₹10,000 crore building models from scratch instead of fine-tuning existing open-source models for Indian languages at a fraction of the cost?
The counter-argument — which the Indian government has clearly found persuasive — is that data sovereignty and infrastructure sovereignty matter independently of the cost efficiency question. A government deploying AI for Aadhaar, citizen services, judicial systems, and national security cannot run that AI on American infrastructure controlled by American companies subject to American law. The sovereign LLM is not primarily a technology decision — it is a geopolitical infrastructure decision, similar to how India runs its own payment network (UPI) rather than operating purely on Visa and Mastercard.
The open-sourcing of Sarvam-30B and Sarvam-105B under Apache 2.0 signals that Sarvam understands it needs a developer ecosystem, not just government contracts, to become the default Indian AI infrastructure layer. Open models that developers can build on create the adoption flywheel that closed models never achieve at the developer level.
Vivek Raghavan and Pratyush Kumar's academic background at IIT Madras and AI4Bharat gave them two things that most startup founders don't have: a decade of relevant dataset work and credibility with both the government and the global AI research community. When they said they could build a sovereign Indian LLM, people believed them because they had already built the datasets that a sovereign Indian LLM requires. The lesson for deep tech: domain expertise from serious research institutions is not just a credential — it is the starting capital.
India's AI policy approach — subsidising compute, taking equity, mandating deployment through government services — creates an unusual public-private partnership that Western AI startups don't operate within. Sarvam's government-as-compute-provider model has allowed it to train 105B parameter models on a $41M funding base that would have been impossible in a purely commercial environment. For startups in strategic sectors, understanding how to partner with governments as capital and infrastructure providers is an underrated capability.
Sarvam's roadmap through 2026 is clear: ship Kaze in May 2026, grow the developer ecosystem through the startup API credit programme, deepen the UIDAI and government service integrations, and pursue enterprise revenue in financial services, healthcare, and education. The longer-term question is whether Sarvam can achieve commercial sustainability before the government compute subsidy ends and before the global AI companies close the Indian language performance gap that Sarvam currently holds.
The IndiaAI Mission GPU allocation is finite — 4,096 H100s for a defined period. After that period, Sarvam must fund its own compute. Given that Sarvam-105B-scale training runs cost millions of dollars, the commercial revenue must be substantial well before the compute subsidy expires. If the developer ecosystem builds strongly on Sarvam's open models, API revenue at scale is achievable. If the ecosystem prefers global APIs that are cheaper or better, Sarvam's path gets much harder. The open-source release was the right strategic move to build the ecosystem — the execution on that ecosystem over the next 18 months will determine whether the commercial model closes.
Sarvam AI has done, in 30 months, what most AI observers assumed would take India a decade: built a 105-billion parameter foundational LLM from scratch, trained on Indian language data, deployed on sovereign Indian compute, open-sourced under Apache 2.0, and launched a consumer product at India's highest-profile AI event. The models are competitive on the benchmarks that matter for Indian language AI. The government partnership gives distribution channels no commercial company could achieve independently. The remaining questions are not about capability — they are about commercialisation. $41M is modest for frontier AI, the government subsidy is temporary, and the global competition is accelerating Indian language support faster than expected. Whether Sarvam can translate genuine technical leadership into a sustainable business model before those windows close is the story of the next three years.