Source: This post, “Large Language Models (LLMs) Training in India”, is based on the article “How are Indian firms training LLMs? | Explained”, published in The Hindu on 26th February 2026.
UPSC Syllabus: GS Paper 3 (Science and Technology)
Context: Bengaluru-based Sarvam AI released two Large Language Models, with 35 billion and 105 billion parameters respectively, at the AI Impact Summit in New Delhi. These models aim to improve performance in Indian languages while maintaining computational efficiency. The initiative aligns with India’s broader effort under the IndiaAI Mission to build indigenous AI capabilities.
Training of Large Language Models (LLMs)
- Large Language Models are trained using clusters of Graphics Processing Units (GPUs), which handle the intensive parallel computations required for deep learning (a toy training-step sketch follows this list).
- The training process requires enormous datasets, most of which are scraped from the internet and are dominated by English and other globally prominent languages.
- The combined cost of GPUs, electricity, cooling systems, and infrastructure often runs into millions of dollars for a single training run.
- Indian firms are focusing on building mid-sized models that balance performance and efficiency, rather than immediately attempting trillion-parameter frontier systems.
- For instance, Sarvam AI trained its models from scratch using subsidised compute access, while BharatGen developed a 17-billion-parameter multilingual model for sectors such as healthcare and education.
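To make the cost intuition concrete, here is a toy sketch, assuming PyTorch; the two-layer model and all values are illustrative inventions of this post, not Sarvam AI’s or BharatGen’s actual code. It shows a single next-token training step; production LLM training replaces the toy model with a transformer and repeats this step billions of times across thousands of GPUs, which is where the compute, power, and cooling costs arise.

```python
# A toy, illustrative training step (assumes PyTorch is installed).
# Real LLMs use transformer architectures and run this loop billions
# of times, sharded across thousands of GPUs.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
vocab_size, dim = 1000, 64

# Hypothetical tiny "language model": embed tokens, predict the next token.
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

batch = torch.randint(0, vocab_size, (8, 128), device=device)  # random stand-in for real text
logits = model(batch[:, :-1])                                  # predict token t+1 from token t
loss = F.cross_entropy(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))

loss.backward()        # gradients computed in parallel on the GPU
optimizer.step()       # one weight update; training is billions of these
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")
```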
Challenges in Training Large Language Models on Indian Soil
I. Data Scarcity
- Indian languages are underrepresented on the internet, resulting in limited high-quality datasets for training.
- Many LLMs rely on translating Indian-language inputs into English internally, which increases token usage and inference cost.
- Translation-based approaches may reduce efficiency and degrade performance in low-resource settings (see the token-count sketch below).
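The token-cost point can be demonstrated in a few lines. This minimal sketch assumes the `tiktoken` package; the specific tokenizer and sentences are illustrative choices of this post, not taken from the article. Tokenizers trained on English-dominated corpora split Devanagari and other Indic scripts into many more tokens for the same meaning, so every Indian-language query consumes more compute per query.

```python
# A minimal sketch (assumes `tiktoken` is installed; the tokenizer choice
# is illustrative). English-dominated tokenizers split Devanagari text
# into far more tokens for the same meaning.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

english = "India is building its own large language models."
hindi = "भारत अपने बड़े भाषा मॉडल स्वयं बना रहा है।"  # rough Hindi rendering of the same sentence

print(len(enc.encode(english)))  # on the order of 10 tokens
print(len(enc.encode(hindi)))    # typically several times more tokens
```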
II. Capital Constraints
- The acquisition and operation of high-end GPUs require substantial capital investment, which is often beyond the reach of startups.
- Indian firms do not yet possess the financial scale of global technology giants that can absorb massive upfront costs.
- Immediate commercial returns may be too limited to justify frontier-scale model training.
III. Infrastructure Limitations
- Access to advanced semiconductor hardware remains dependent on imports.
- Establishing and maintaining large-scale AI data centres requires long-term strategic planning and financial commitment.
Government Support under IndiaAI Mission
- The IndiaAI Mission has commissioned over 36,000 GPUs in domestic data centres to support AI development.
- The government has collaborated with firms such as Yotta to host this infrastructure.
- Startups and research institutions can access GPU clusters at subsidised rates for training and inference.
- Sarvam AI received access to 4,096 GPUs, with an estimated subsidy of nearly ₹100 crore.
- The total cost of the compute cluster is approximately ₹246 crore, and the infrastructure remains reusable for other developers.
- The initiative seeks to promote technological sovereignty, enhance AI talent, and ensure that Indian languages receive adequate representation in AI systems.
Cost Efficiency of the Mixture of Experts (MoE) Architecture
- In traditional dense LLMs, all parameters are activated during inference, making each query computationally expensive.
- The Mixture of Experts (MoE) architecture activates only a subset of parameters for any given query.
- This selective activation reduces computational load and lowers energy consumption.
- As a result, MoE models achieve faster inference speeds at lower operational costs.
- Therefore, a 105-billion-parameter MoE model can be significantly cheaper to operate than a dense model of the same size, since only a fraction of its parameters does work on each query (see the routing sketch below).
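A simplified MoE layer in PyTorch, assuming top-2 gating over eight experts; these values and the layer design are illustrative only, as the article does not describe Sarvam AI’s actual architecture. A small router scores the experts for each token and only the top-k experts run, so most parameters stay idle for any given input.

```python
# A simplified Mixture of Experts layer (illustrative sketch, not a
# production design). A router picks the top-k experts per token, so
# most parameters stay idle for each query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(dim, num_experts)  # router: scores every expert per token
        self.top_k = top_k

    def forward(self, x):                               # x: (num_tokens, dim)
        scores = self.gate(x)                           # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, -1)  # keep only the best-scoring experts
        weights = F.softmax(weights, dim=-1)            # normalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 experts ran per token
```

With 2 of 8 experts active per token, each query exercises roughly a quarter of the expert parameters, which is why per-query cost tracks active parameters rather than total parameter count.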
Way Forward
- The government should expand initiatives to create high-quality, publicly available datasets in Indian languages through collaboration with academic institutions and state agencies.
- Greater private sector participation and venture capital investment are required to sustain large-scale AI research and development.
- India should strengthen its semiconductor manufacturing and GPU supply chain capabilities to reduce import dependence.
- Transparent benchmarking and open evaluation of domestic models should be encouraged to build global credibility.
- Sector-specific LLMs tailored for governance, healthcare, education, and agriculture should be prioritised to generate immediate social and economic impact.
- Long-term public–private partnerships should be institutionalised to ensure continuity of compute infrastructure and research funding.
Conclusion: India’s efforts to develop indigenous Large Language Models represent a strategic move toward digital sovereignty and technological self-reliance. Although challenges related to data availability, capital investment, and infrastructure remain significant, targeted government support through the IndiaAI Mission and the adoption of efficient architectures such as MoE have laid a strong foundation. Sustained investment, ecosystem development, and policy stability will determine India’s ability to compete with global frontier AI systems.
Question: Examine how Indian firms are developing Large Language Models (LLMs). Discuss the challenges of training them in India, the role of the IndiaAI Mission, the cost advantage of Mixture of Experts architecture, and suggest the way forward.
Source: The Hindu