When data is scarce: How India is building military AI differently

by Incbusiness Team

In the global race to build military AI, data has become the most valuable weapon. Superiority in modern warfare is increasingly measured in algorithms, not just in missiles or manpower.

“Whoever owns the data has an edge in AI right now,” says Neeta Trivedi, Co-founder and CEO of defencetech startup Inferigence Quotient and a former scientist at Defence Research & Development Organisation (DRDO), where she spent 28 years.

But India faces a fundamental challenge: the country doesn’t have nearly enough data.

At a time when geopolitical tensions, from the ongoing Russia–Ukraine War to instability across West Asia, are accelerating the global push toward AI-enabled warfare, the ability to build intelligent defence systems has become a strategic priority for many nations.

Even as the country experiments with AI-led warfare, demonstrated during Operation Sindoor, the gap between India and global military powers remains stark. The US Pentagon has sought about $13.4 billion to advance defence AI capabilities. Estimates suggest China’s People’s Liberation Army is investing between $1 billion and $2 billion annually in similar technologies.

India’s allocation, by contrast, is far more modest: roughly $60 million spread across five years, per a 2023 report from the think tank Delhi Policy Group.

The disparity extends beyond funding. AI systems require enormous volumes of data to train effectively. The US hosts more than 5,000 AI data centres, while India has about 150. In practical terms, this means India is developing military AI with far smaller datasets than its geopolitical rivals.

Yet the question is not just how much data India has—but how it uses what already exists.

India’s military AI has a data gap

Trivedi says large volumes of potentially valuable military data already exist, but remain largely untapped. For decades, unmanned aerial vehicles have captured vast amounts of surveillance footage. Much of that data has remained archived at ground stations.

“The videos come to the ground stations and have just been sitting there for decades,” she says. “The data needs to be extracted and labelled for training. And it’s not just video; there are other sensors like radar. Private companies can’t access them because the data is classified, and the military hasn’t always had the bandwidth to process it.”


Even when datasets are available, they present another challenge. Much of the data generated in defence testing environments is relatively ‘clean’ compared to the messy, unpredictable conditions of real-world conflict.

One approach, Trivedi says, is to begin training AI models with available test data and then refine them using limited operational data once access is possible. At the same time, researchers are exploring whether effective AI systems can be trained with significantly smaller datasets.

“They say if you want to familiarise a child with an elephant, you show them a few pictures and the child can identify the animal,” she says. “You don’t need millions of images. Researchers are exploring whether something similar can work for AI. Of course, the human brain works differently from machine learning models, but there is interesting work happening in that area.”
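The research direction Trivedi describes is often called few-shot learning. As a loose illustration of the idea, not any specific system mentioned in this article, the toy "prototype" classifier below learns a class from just three labelled examples by averaging them into a single reference point (all feature vectors here are made-up stand-ins for image embeddings):

```python
import numpy as np

def prototypes(support, labels):
    """Average the few labelled examples per class into one prototype each."""
    classes = sorted(set(labels))
    return classes, np.stack([
        np.mean([x for x, y in zip(support, labels) if y == c], axis=0)
        for c in classes
    ])

def classify(query, classes, protos):
    """Assign the query to the class of the nearest prototype."""
    dists = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(dists))]

# Toy feature vectors standing in for image embeddings: two animals,
# each described by only three labelled examples.
rng = np.random.default_rng(0)
elephant = [rng.normal(loc=[5.0, 1.0], scale=0.3) for _ in range(3)]
horse    = [rng.normal(loc=[1.0, 4.0], scale=0.3) for _ in range(3)]
support = elephant + horse
labels  = ["elephant"] * 3 + ["horse"] * 3

classes, protos = prototypes(support, labels)
print(classify(np.array([4.8, 1.2]), classes, protos))  # prints "elephant"
```

Real few-shot systems operate on learned embeddings rather than raw pixels, but the principle is the same: a handful of examples per class, rather than millions, defines each category.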

The search for an indigenous solution

One suggestion that often comes up is whether India could train its AI systems using datasets from ongoing conflicts such as the Russia-Ukraine war. But experts say that approach is largely impractical.

Shashidhara BP, former managing director of the Aeronautical Development Establishment under DRDO, argues that such data is unlikely to be accessible.

“It is almost impossible to use data generated in other war scenarios. Such datasets are typically proprietary to the respective defence forces and are almost always encrypted to prevent misuse,” he says.

Instead, he believes India’s long-term strategy must focus on developing indigenous AI systems tailored to its own defence datasets.

“There are a number of initiatives under way in both the government and private sector. As these technologies evolve and integrate, they can support the needs of the armed forces in a rapidly changing warfare environment,” he says. “Once we develop our own language models and train them on datasets generated across multiple applications, we will have a proven AI platform that can be deployed not just in defence but across other sectors as well.”

Building sovereign military AI

For many Indian defencetech startups, that push toward indigenous AI development has already begun.

Jayant Khatri, Co-founder and CEO of Apollyon Dynamics, says the company deliberately avoids using international LLMs or APIs, including tools like ChatGPT, to minimise the risk of sensitive data exposure.

“We develop our own algorithms that run on edge computing systems,” he says, referring to computing that processes data close to the source rather than relying on cloud infrastructure. “We combine high-fidelity simulations with hardware-in-the-loop testing to create controlled training environments. Every field deployment feeds back into the system, creating a closed-loop learning process,” he says.
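The closed-loop pattern Khatri describes, where each deployment feeds labelled outcomes back into training, can be sketched in miniature. The nearest-centroid model below (a hypothetical illustration, not Apollyon Dynamics’ actual algorithm) buffers corrected field observations and periodically refits itself from them:

```python
import numpy as np

class ClosedLoopClassifier:
    """Minimal nearest-centroid model that refits itself from field feedback."""

    def __init__(self):
        self.samples, self.labels = [], []
        self.classes, self.centroids = [], None

    def feedback(self, x, true_label):
        # Each corrected observation from a deployment joins the training buffer.
        self.samples.append(np.asarray(x, dtype=float))
        self.labels.append(true_label)

    def retrain(self):
        # Periodically refit class centroids from everything gathered so far.
        self.classes = sorted(set(self.labels))
        self.centroids = np.stack([
            np.mean([s for s, l in zip(self.samples, self.labels) if l == c], axis=0)
            for c in self.classes
        ])

    def predict(self, x):
        d = np.linalg.norm(self.centroids - np.asarray(x, dtype=float), axis=1)
        return self.classes[int(np.argmin(d))]

model = ClosedLoopClassifier()
for x, y in [([0.0, 0.1], "clear"), ([0.1, 0.0], "clear"),
             ([0.9, 1.0], "threat"), ([1.0, 0.9], "threat")]:
    model.feedback(x, y)
model.retrain()
print(model.predict([0.95, 0.95]))  # prints "threat"
```

A production edge system would pair something like this with on-device inference hardware and far richer models, but the feedback-retrain-redeploy cycle is the core of the closed loop.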


The push toward sovereign AI platforms is also visible in initiatives such as Project Ekam, developed by the startup Neuralix. The platform, described as India’s first proprietary Defence AI-as-a-Service system, was inaugurated by Defence Minister Rajnath Singh in December 2025.

According to Neuralix CEO and Founder Vikram Jayaram, working with military datasets is fundamentally different from handling the curated data used in most commercial GenAI systems.

“Military data is extremely fragmented,” says Jayaram, who has spent nearly 27 years in the AI and machine learning industry. “These data sources were never designed for language models to ingest and process easily. The computing architecture is also limited for large-scale training permutations. It has taken years just to understand how to curate the data and determine whether building smaller, specialised models is more effective in most situations.”

No Ferraris on broken roads

Jayaram believes India should avoid joining the global race to build ever-larger language models.

Instead, he says the focus should be on solving specific decision support system problems with frameworks designed for India’s security constraints, particularly its limited access to large-scale datasets. Trying to replicate the approach taken by countries like the US could ultimately prove counterproductive.

“Building large language models is like building a Ferrari. We can try to build one. But if you put a Ferrari on a bad road, the whole thing will tear apart. If the underlying data, infrastructure, or purpose is poor, it will never perform,” he says. “Our priority should be to address the fragmented problems we actually face. That’s why we are building smaller language models, or SLMs. These are easier to manage and iterate, and over time multiple specialised models can come together to form a larger system.”

The Ekam AI has already been deployed within the Indian defence ecosystem while work continues in parallel on more sophisticated language models.

Jayaram also points to ongoing research aimed at training robust AI systems with smaller datasets, an approach that Trivedi had earlier highlighted.

“When training samples are scarce, we apply mathematical transformations to generate additional samples over time. Once enough of these training datasets are created, they can be used to train models more effectively,” he says.
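The technique Jayaram outlines is commonly known as data augmentation. As a simplified sketch of the general approach (not Neuralix’s actual pipeline), the snippet below grows a dataset from a single sensor frame by applying simple mathematical transformations, flips, rotations, and additive noise:

```python
import numpy as np

def augment(image, rng):
    """Generate additional training samples from one image via simple
    mathematical transformations: flips, 90-degree rotations, and noise."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    # Add low-amplitude Gaussian noise to each geometric variant.
    noisy = [v + rng.normal(scale=0.01, size=v.shape) for v in variants]
    return variants + noisy

rng = np.random.default_rng(42)
sample = rng.random((8, 8))   # stand-in for one scarce sensor frame
dataset = augment(sample, rng)
print(len(dataset))           # prints 12: a dozen samples from a single frame
```

For sensor data such as radar or aerial video, the transformations would be chosen to match physically plausible variation, so the model learns invariances it will actually encounter in the field.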

As warfare moves deeper into the era of intelligent automation, the ultimate goal is to build AI systems capable of operating with high precision and minimal collateral damage. Achieving that level of reliability, however, remains a major challenge.

“At the end of the day, it’s not a level playing field. A terrorist doesn’t care who gets harmed. They simply want to create chaos. But when we act against them, we cannot afford to hit the wrong target. Achieving that level of accuracy is vital,” Trivedi says.

Despite the constraints, experts say the defence sector is steadily building momentum.

“Our work reflects a thoughtful integration of language models and AI into operational workflows,” says a retired major general of the Indian Army, seeking anonymity. “The ecosystem is showing how sovereign technology can quietly yet powerfully enhance intelligence, inference, and situational awareness in modern military operations.”

Edited by Teja Lele

(Disclaimer – This post is auto-fetched from publicly available RSS feeds. Original source: Yourstory. All rights belong to the respective publisher.)

