Supplementary Information from Genetic and linguistic histories in Central Asia inferred using approximate Bayesian computations

Linguistic and genetic data have been widely compared, but the histories underlying these descriptions are rarely jointly inferred. We developed a unique methodological framework for jointly analysing language diversity and genetic polymorphism data, to infer the past history of separation, exchange and admixture events among human populations. This method relies on approximate Bayesian computations, which make it possible to identify the most probable historical scenario underlying each type of data and to infer the parameters of these scenarios. For this purpose, we developed a new computer program, PopLingSim, that simulates the evolution of linguistic diversity, and coupled it with an existing coalescent-based genetic simulation program in order to simulate both linguistic and genetic data within a set of populations. Applying this new program to a large linguistic and genetic dataset from Central Asia, we found several differences between linguistic and genetic histories. In particular, we showed how genetic and linguistic exchanges differed in the past in this area: some cultural exchanges were maintained without genetic exchanges. The methodological framework and the linguistic simulation tool developed here can be used in future work to disentangle the complex linguistic and genetic evolutions underlying human biological and cultural histories.
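
As an illustration of the general idea behind approximate Bayesian computation, the following is a minimal rejection-sampling sketch in Python. It is not PopLingSim and does not reproduce the simulators or summary statistics used in the study: the two scenarios ("isolation" and "exchange"), the single `rate` parameter, the toy simulator and the summary statistic (the fraction of items shared between two populations) are all hypothetical choices made for this example. The sketch draws a scenario and a parameter from their priors, simulates a summary statistic, keeps the draws whose statistic falls close to the observed value, and then reads off approximate scenario probabilities and parameter estimates from the retained draws.

```python
import random


def simulate(scenario, rate, rng, n_items=200):
    """Toy simulator (hypothetical): fraction of items still shared by two populations.

    Under 'isolation' each item stays shared with probability 1 - rate.
    Under 'exchange' borrowing halves the effective loss, so the
    probability of staying shared is 1 - rate / 2.
    """
    if scenario == "isolation":
        p_shared = 1.0 - rate
    else:
        p_shared = 1.0 - rate * 0.5
    shared = sum(rng.random() < p_shared for _ in range(n_items))
    return shared / n_items


def abc_rejection(observed, n_sims=10000, tolerance=0.02, seed=1):
    """Keep the (scenario, rate) draws whose simulated statistic is near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        scenario = rng.choice(["isolation", "exchange"])  # uniform prior over scenarios
        rate = rng.uniform(0.0, 1.0)                      # uniform prior on the parameter
        stat = simulate(scenario, rate, rng)
        if abs(stat - observed) <= tolerance:
            accepted.append((scenario, rate))
    return accepted


def summarize(accepted):
    """Approximate scenario probabilities and posterior mean rate from accepted draws."""
    n = len(accepted)
    summary = {}
    for scenario in ("isolation", "exchange"):
        rates = [r for s, r in accepted if s == scenario]
        summary[scenario] = {
            "prob": len(rates) / n if n else float("nan"),
            "mean_rate": sum(rates) / len(rates) if rates else float("nan"),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical observation: 40% of items are shared between the two populations.
    accepted_draws = abc_rejection(observed=0.4)
    print(summarize(accepted_draws))
```

In this toy setting the share of accepted draws belonging to each scenario approximates its posterior probability, and the accepted `rate` values approximate the posterior distribution of the parameter. Applications to real data typically use many summary statistics and more refined acceptance or regression-adjustment steps than this plain rejection scheme.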