Clinal distribution of human genomic diversity across the Netherlands despite archaeological evidence for genetic discontinuities in Dutch population history.

Oscar Lao, Eveline Altena, Christian Becker, Silke Brauer, Thirsa Kraaijenbrink, Mannis van Oven, Peter Nürnberg, Peter de Knijff and Manfred Kayser.

Investigative Genetics 2013, 4:9.

Description of the Dataset:

The Netherlands dataset comprises cleaned genotype data for 969 of 999 individuals of Dutch ancestry sampled from 54 geographic regions across the Netherlands. Each individual was genotyped with the GeneChip Human Mapping 500K Array Set (Affymetrix), which comprises 443,816 markers, and genotypes were called with the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm. Individual-level cleaning removed individuals showing higher-than-average genetic relatedness, as well as those that were highly genetically differentiated from the rest of the sampled population. No remaining individual had more than 2% missing genotypes, so no further individuals were excluded.
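The missingness check described above can be sketched as a per-individual call-rate filter. This is a minimal illustration, not the authors' code: it assumes genotypes are held as one list per individual with `None` marking a missing call, and the function names are hypothetical. Only the 2% cutoff comes from the text.

```python
def missing_rate(row):
    """Fraction of missing genotype calls (None) for one individual."""
    return sum(g is None for g in row) / len(row)


def passes_missingness(genotypes, max_missing=0.02):
    """Indices of individuals whose missing rate does not exceed max_missing.

    genotypes: list of per-individual genotype lists (hypothetical layout).
    Individuals with more than max_missing missing calls are excluded.
    """
    return [i for i, row in enumerate(genotypes) if missing_rate(row) <= max_missing]
```

For example, an individual with 1 missing call out of 50 (2%) is kept, while one with 2 missing calls out of 50 (4%) would be excluded.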

SNP data cleaning included:

  1. SNPs that deviated from Hardy-Weinberg equilibrium (HWE) in at least one subpopulation, after correction for multiple testing, were excluded.
  2. LD pruning: only markers showing low linkage disequilibrium (LD) with markers within 500 kb, as estimated by Kendall’s tau-b statistic, were retained.
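The two SNP filters above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the source does not say which HWE test or multiple-testing correction was used, so a 1-df chi-square test with a Bonferroni correction is swapped in here, and the tau-b cutoff (`max_abs_tau=0.3`), the greedy pruning order, and all function names are hypothetical. Genotypes are assumed to be coded as 0/1/2 allele counts with `None` for missing calls.

```python
import math
from itertools import combinations

# --- Filter 1: Hardy-Weinberg equilibrium (chi-square test, 1 df) ---

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """HWE p-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation is possible
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_failures(counts_by_snp, alpha=0.05):
    """SNPs failing HWE in at least one subpopulation (Bonferroni-corrected).

    counts_by_snp: SNP id -> list of (AA, AB, BB) counts, one per subpopulation.
    """
    n_tests = sum(len(pops) for pops in counts_by_snp.values())
    if n_tests == 0:
        return set()
    threshold = alpha / n_tests
    return {snp for snp, pops in counts_by_snp.items()
            if any(hwe_chisq_pvalue(*c) < threshold for c in pops)}

# --- Filter 2: LD pruning with Kendall's tau-b ---

def kendall_tau_b(x, y):
    """Kendall's tau-b between two genotype vectors, skipping missing calls."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    s = 0         # concordant minus discordant pairs
    untied_x = 0  # pairs not tied in x
    untied_y = 0  # pairs not tied in y
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        dx = (x1 > x2) - (x1 < x2)
        dy = (y1 > y2) - (y1 < y2)
        s += dx * dy
        untied_x += dx != 0
        untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return 0.0  # undefined for a constant vector; treated as no LD here
    return s / math.sqrt(untied_x * untied_y)


def ld_prune(markers, genotypes, window_bp=500_000, max_abs_tau=0.3):
    """Greedy pruning: keep a marker only if |tau-b| with every already-kept
    marker on the same chromosome within window_bp is below max_abs_tau.

    markers: list of (snp_id, chromosome, position); genotypes: snp_id -> vector.
    """
    kept = []
    for snp, chrom, pos in sorted(markers, key=lambda m: (m[1], m[2])):
        nearby = [k for k, c, p in kept if c == chrom and pos - p < window_bp]
        if all(abs(kendall_tau_b(genotypes[snp], genotypes[k])) < max_abs_tau
               for k in nearby):
            kept.append((snp, chrom, pos))
    return [snp for snp, _, _ in kept]
```

The pairwise tau-b loop is quadratic in the number of individuals and is meant only to show the statistic; with roughly a thousand samples and hundreds of thousands of markers, a dedicated tool would be the practical choice. Kendall's tau-b is used here because the text names it; it handles the many ties that 0/1/2 genotype codes produce.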

The number of autosomal markers included in the dataset is 137,662.


The dataset is freely available in PLINK binary (.bed) format (Netherlands.rar; 18 MB).
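For readers who want to inspect the genotypes without PLINK, the SNP-major PLINK 1 .bed layout can be decoded directly: a 3-byte header (0x6C 0x1B 0x01), then one block of ceil(n/4) bytes per SNP, each byte packing four 2-bit genotype codes with the first sample in the lowest bits. This is a minimal sketch, not an official reader. It assumes the archive contains matching .bed/.bim/.fam files; the sample count would come from the number of lines in the .fam file and the SNP count from the .bim file, and `decode_bed` is a hypothetical name.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, SNP-major mode
# 2-bit code -> number of copies of the .bim A1 allele (None = missing call)
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int, n_snps: int):
    """Decode a SNP-major PLINK .bed payload into per-SNP genotype lists."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK .bed file")
    bytes_per_snp = (n_samples + 3) // 4  # 4 genotypes packed per byte
    expected = 3 + bytes_per_snp * n_snps
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    snps = []
    for s in range(n_snps):
        start = 3 + s * bytes_per_snp
        block = data[start:start + bytes_per_snp]
        calls = []
        for i in range(n_samples):
            # Sample i sits in byte i // 4, bit pair i % 4 (lowest bits first)
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            calls.append(CODE_TO_A1_COUNT[code])
        snps.append(calls)
    return snps
```

Usage would look like `decode_bed(open(bed_path, "rb").read(), n_samples, n_snps)`. For this dataset (969 individuals, 137,662 markers) the pure-Python lists are slow and memory-hungry, so PLINK itself or a dedicated reader library is the practical route. The sketch only makes the file layout explicit.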