How We, The Indians, Came to Be

A massive new study based on ancient DNA answers questions about the people who built the Indus Valley Civilisation


A breathtaking new study, with some of the most well-regarded names in population genetics, archaeology and anthropology as authors, peels back the layers of our prehistory concerning the Indus Valley, Vedic Aryans and Dravidian languages.





Before you begin to read this, take a chair and sit down comfortably. Because this is going to take some time, and it is going to address some of the most fundamental questions about how we, the Indians, or South Asians more generally, came to be.

The answers you are going to read are taken from an extensive new study that has just been released, titled ‘The Genomic Formation of Central and South Asia’. It is co-authored by 92 scientists from around the world and was co-directed by Prof David Reich of Harvard Medical School. Reich runs a lab at Harvard that has no equal in its ability to sequence and analyse ancient DNA at scale and speed, and he has co-authored multiple studies in recent years that have changed the way we understand the prehistory of much of the world. His just-released book, ‘Who We Are and How We Got Here’, is currently making waves.

Among those 92 co-authors are scientists who are stars of equal measure in their own disciplines, like James Mallory, archaeologist and author of the classic ‘In Search of the Indo-Europeans: Language, Archaeology and Myth’; and David Anthony, anthropologist and author of the ground-breaking ‘The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World’.

Archaeobotanist Dorian Fuller and archaeologist Nicole Boivin are familiar names in India for the work they have done in the country. Vasant Shinde is the vice-chancellor of Deccan College, India’s premier institution for archaeology. K Thangaraj, head of the Centre for Cellular and Molecular Biology, is a co-director of the study, while Niraj Rai of the Birbal Sahni Institute of Palaeosciences is a co-author, along with Priya Moorjani of University of California, Vagheesh Narasimhan and Swapan Mallik of Harvard Medical School and Ayushi Nayak of the Max Planck Institute for the Science of Human History, Germany.

The Indus Valley port city of Lothal in Gujarat. (Photo Courtesy: Tony Joseph)
This list of names is noteworthy not just because of the weight they carry, but also because of the variety of fields they come from. Thought was obviously given to the often-raised criticism that population geneticists do not sufficiently take into account archaeological and historical contexts in their studies.

As important as the names are the data that the study is based on: ancient DNA from 612 individuals, 362 of them reported for the first time.

These ancient individuals come from many regions and periods: Iran and “Turan”, which includes Turkmenistan, Uzbekistan and Tajikistan (5,600 to 1,200 BCE); the western Siberian forest zone (6,200 to 4,000 BCE); the Steppe east of the Ural mountains, including Kazakhstan (4,700 to 1,000 BCE); and the Swat Valley of Pakistan (1,200 BCE to 1 CE). These data were then compared and co-analysed with genome-wide data from present-day individuals – 1,789 of them from 246 ethnographically distinct groups in South Asia. It is this comparative analysis using both ancient DNA and present-day DNA across regions and periods that allows the study to arrive at clear conclusions about who moved from where and mixed with whom.

‘Dancing Girl’ figurine from Harappa, part of the Indus Valley Civilisation. (Photo Courtesy:

So what does the study say?

One doesn’t know if it was designed that way, but the study addresses the three fundamental questions that have bedevilled Indian archaeologists, anthropologists and historians for decades. These are also the questions that hold the key to understanding how the Indian population is put together, what its basic components are, and how migrations at different points of time may have shaped it.

Question One: Were the beginnings of agriculture in north-western India helped along by the spread of agriculturists from western Asia, or did western Asian crops such as barley and wheat spread to South Asia without any accompanying migration?

Question Two: Who built and populated the Indus Valley civilisation? Were they migrants from western Asia? Or were they indigenous hunter-gatherers who had transitioned to agriculture and then urban settlements? Or were they Vedic Aryans?

Question Three: Was there a significant migration of pastoralists from the central Asian Steppe to South Asia who brought with them Indo-European language and culture and who called themselves Aryans? If there was, when did that happen?


The ‘Aryan’ Migration

The study’s response to each question is reasoned and clear in a way that no earlier answers to these questions have been.

Let’s start with the last question, about the ‘Aryan’ migration.

According to the study, there was indeed southward migration of pastoralists from the south-eastern Steppe – first towards southern central Asian regions of today’s Turkmenistan, Uzbekistan and Tajikistan between 2,300 and 1,500 BCE, and then towards South Asia throughout the second millennium BCE (2,000 to 1,000 BCE). On their route, they impacted the Bactria-Margiana Archaeological Complex (BMAC) that thrived between 2,300 and 1,700 BCE, but mostly bypassed it to move further down towards South Asia. There they mixed with the existing people of the Indus Valley, thus creating one of the two main sources of population in India today: Ancestral North Indians or ANI, the other being Ancestral South Indians, or ASI.

The study arrived at these conclusions after detecting the signals of the migration in the ancient DNA. To quote: “Outlier analysis shows no evidence of Steppe pastoralist ancestry in groups surrounding BMAC sites prior to 2,100 BCE, but suggests that between 2,100-1,700 BCE, the BMAC communities were surrounded by peoples carrying such ancestry.”

The Indus Valley site of Dholavira in Gujarat. (Photo Courtesy: Tony Joseph)

Among the ancient DNA from BMAC sites – as well as among the DNA from the eastern Iranian site of Shahr-i-Sokhta – there were some surprising finds with major consequences: three outlier individuals dated to between 3,100 and 2,200 BCE, with an ancestry profile similar to ancient DNA samples from the Swat Prehistoric Grave Culture of Pakistan almost a thousand years later (1,200 to 800 BCE). The BMAC, Shahr-i-Sokhta and Swat Valley samples were all distinctive in having 14 to 42 percent ancestry from South Asian hunter-gatherers. The Indus Valley Civilisation was known to have had contacts with both BMAC and Shahr-i-Sokhta, so the authors of the study suggest that these outlier individuals were recent migrants from the Indus Valley Civilisation to BMAC and Shahr-i-Sokhta.

But the story is not over yet.

The scientists compared the Swat Valley samples from 1,200 BCE to 1 CE with the outliers from BMAC and Shahr-i-Sokhta, and what they found was revealing. While the Swat Valley samples were genetically very similar to the ancient outlier individuals, they also differed significantly in harbouring Steppe ancestry of about 22 percent. “This provides direct evidence for Steppe ancestry being integrated into South Asian groups in the 2nd millennium BCE, and is also consistent with the evidence of southward expansions of the Steppe groups through Turan at this time,” says the study.

Earlier genetic studies had already shown that Indian populations are a mixture of two statistically reconstructed ancient populations, ANI and ASI. But these studies were unable to provide a finer resolution of what went into making these two populations.

The newly available ancient DNA has now made it possible to deconstruct the ANI and ASI into their component parts.

ANI can now be seen as a mixture of Iranian agriculturists, South Asian hunter-gatherers (termed for the first time in this study as Ancient Ancestral South Indians or AASI) and pastoralists from the Steppe. ASI can be seen as a mixture of Iranian agriculturists and South Asian hunter-gatherers.
What one of the largest sites of Indus Valley civilisation, Rakhigarhi, in Haryana, looks like today. (Photo Courtesy: Tony Joseph)

Sanskrit, Vedic Aryans and The Steppe

There are also other tell-tale marks of the Steppe migration. For example, the Y chromosome haplogroup R1a (of subtype Z93), which is common in South Asia today, occurred at high frequency in the middle-to-late Bronze Age Steppe.

The study goes on to note:

“It is striking that the great majority of Indo-European speakers today living both in Europe and South Asia harbour large fractions of ancestry related to Yamnaya Steppe pastoralists, suggesting that the “late proto-Indo-European”, the language ancestral to all modern Indo-European languages, was the language of the Yamnaya. While ancient DNA studies have documented westward movements of peoples from the Steppe that plausibly spread this ancestry, there has not been ancient DNA evidence of the chain of transmission to South Asia. Our documentation of a large-scale genetic pressure from Steppe groups in the second millennium BCE provides a prime candidate, a finding that is consistent with archaeological evidence of connections between material culture in the Kazakh middle-to-late Bronze Age Steppe and early Vedic culture in India.”

But there’s more.

When the geneticists tested whether the ANI-ASI mixture model fits 140 present-day population groups in South Asia, 10 groups stood out – each of them had poor fits, and significantly elevated levels of Steppe ancestry.

The strongest signals of elevated Steppe ancestry were in two groups of traditionally priestly status, who were expected to be custodians of texts written in Sanskrit.

Says the study: “A possible explanation is that the influx of Steppe ancestry into South Asia in the mid-2nd millennium BCE created a meta-population of groups with different proportions of Steppe ancestry, with one having relatively more Steppe ancestry having a central role in spreading early Vedic culture. Due to the strong endogamy rules in South Asia, which have kept some groups isolated from their neighbours for thousands of years, some of this substructure within Indian population still persists...”

Indus Valley port city of Lothal in Gujarat. (Photo Courtesy: Tony Joseph)

Mixing It Up in The Indus Valley

That leads us to question two: Who built and populated the Indus Valley Civilisation?

The fact that migrants from the Steppe arrived only in the second millennium BCE rules out Vedic Aryans as a possibility because by then, the civilisation had already started declining. That leaves only two possibilities: Iranian agriculturists, and indigenous South Asian hunter-gatherers, or AASI. This study had no access to any ancient DNA from the Indus Valley directly, so the answers it gives are based on indirect evidence (it is good to bear in mind, though, that the scientists working on analysing the ancient DNA from the Indus Valley Civilisation site of Rakhigarhi in Haryana are also co-authors of this study, and that report is expected to be published soon). For example, the scientists found three outlier individuals from 3,100 to 2,200 BCE – one in the BMAC site of Gonur and two in the eastern Iran site of Shahr-i-Sokhta – who are very similar genetically to the ancient DNA samples from the Swat Valley around 1,200 to 800 BCE.

These three ancient individuals had 14 to 42 percent of their ancestry related to South Asian hunter-gatherers and the rest mainly to early Iranian agriculturists. The Indus Valley Civilisation is known to have had contacts with both BMAC and Shahr-i-Sokhta; these individuals, unlike those around them, carried the ancestry of South Asian hunter-gatherers; and they were genetically similar to the Swat Valley population. Taken together, this makes it likely that they were migrants from the Indus Valley to BMAC and eastern Iran.

If this is so, it suggests that the Indus Valley Civilisation was peopled by an admixed population of Iranian agriculturists and South Asian hunter-gatherers.

That takes us to Question One: Were the beginnings of agriculture in South Asia helped along by the migration of agriculturists from Iran, or did west Asian crops such as barley and wheat spread to South Asia without west Asian agriculturists coming along?

The only answer that genetics can give as of now is that the Iranian agriculturists must have been in the Indus Valley at least by 4,700 to 3,000 BCE. This date was arrived at by using the three outlier individuals from BMAC and Shahr-i-Sokhta – what the study calls the Indus periphery samples – to calculate the date of admixture between the Iranian agriculturist group and the South Asian hunter-gatherer group.

But there is evidence of the beginnings of agriculture in the north-western parts of the subcontinent much earlier. This could either mean that agriculture began locally without migrating agriculturists from Iran; or it could mean that Iranian agriculturists were in the region much earlier but the mixing between the two groups happened later.

This is a question, therefore, that will be definitively answered only when the study based on ancient DNA from the Indus Valley site of Rakhigarhi in Haryana is released – it was scheduled to be published more than a month ago, but has been delayed.

The ‘Priest King’ of Mohenjo Daro, of the Indus Valley Civilisation. (Photo Courtesy:

Indus Valley and Dravidian Languages

In the words of the study, which seems to use “Indus periphery people” as a stand-in for the Indus Valley Civilisation people in the absence of direct DNA evidence from there: “The Indus periphery-related people are the single most important source of ancestry in India.”

That is because by mixing with the incoming Steppe pastoralists, they formed the ANI, and by mixing with the South Asian hunter-gatherers, or AASI, in the South, they formed the ASI too.

The study doesn’t say it, but it might be useful to look at the Indus Valley people as the common genetic and cultural platform that unites most regions of India. Genetic data shows that both ANI and ASI were fully formed in the second millennium BCE (2,000 to 1,000 BCE), in what must have been among the most tumultuous periods in the history of the region. A civilisation was declining, there was a new influx of people from elsewhere, and everyone was on the move, causing populations that had long stayed separate to mix.

It is worth quoting the study fully on this: “A parsimonious hypothesis is that as the Steppe groups moved south and mixed with the Indus Periphery-related groups at the end of the Indus Valley Civilization to form the ANI, other Indus Periphery-related groups moved further south and east to mix with AASI groups in peninsular India to form the ASI. This is consistent with suggestions that the spread of the Indus Valley Civilization was responsible for dispersing Dravidian languages, although scenarios in which Dravidian languages derive from pre-Indus languages of peninsular India are also entirely plausible as ASI ancestry is mostly derived from the AASI.”

So there we are; all questions answered, more or less.

The Indus Valley Civilisation was likely built and populated by a mixed population of Iranian agriculturists and South Asian hunter-gatherers; pastoralists of the south-eastern Steppe moved into South Asia in the second millennium BCE, bringing with them Indo-European language and culture; the mixing between the Steppe people and people of the Indus Valley Civilisation caused the emergence of the Ancestral North Indian population; and the mixing between the Indus Valley people and the South Asian hunter-gatherers formed the Ancestral South Indian population.

Genetics is steadily making sure that we are no longer stuck in a rut asking the same questions and making the same arguments over and over again, with tempers rising and nostrils flaring. It is time to move on.


“We Are All Migrants”

To end on the same note as an earlier article by this author, written nine months ago and titled ‘How Genetics is Settling the Aryan Migration Debate’:

What is abundantly clear is that we are a multi-source civilisation, not a single-source one, drawing its cultural impulses, its tradition and practices from a variety of lineages and migration histories. The Out of Africa immigrants, the pioneering, fearless explorers who discovered this land originally and settled in it and whose lineages still form the bedrock of our population; those who arrived later with a package of farming techniques and built the Indus Valley Civilisation whose cultural ideas and practices perhaps enrich much of our traditions today; those who arrived from East Asia, probably bringing with them the practice of rice cultivation and all that goes with it; those who came later with a language closely related to Sanskrit and its associated beliefs and practices and reshaped our society in fundamental ways; and those who came even later for trade or for conquest and chose to stay, all have mingled and contributed to this civilisation we call Indian. We are all migrants.


Tony Joseph is a writer and is on Twitter @tjoseph0010.

(This is an opinion piece. The views expressed are the author’s own, The Quint neither endorses nor takes responsibility for them.)
