
Genetic Algorithm for Genetic Ancestry (GAGA) Method

GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans.
Oscar Lao, Fan Liu, Andreas Wollstein, Manfred Kayser. PLoS Computational Biology, February 2014, Volume 10, Issue 2, e1003480

Genetic Algorithm for Genetic Ancestry (GAGA) is a program that partitions a matrix of pairwise distances between objects into K clusters so as to minimize the mean differences within each group (the SSD'(WP) statistic). GAGA is therefore an unsupervised clustering algorithm. It can be regarded as a dimensionality reduction technique analogous to well-known multivariate techniques such as Principal Component Analysis (PCA) or Multidimensional Scaling (MDS). When applied to genetic data, each object is one individual and the distance can be any defined genetic distance. The results depend on which distance is used and how informative that distance is. In our paper "GAGA: a new algorithm for genomic inference of geographic ancestry reveals fine level population substructure in Europeans" we propose a matrix transformation, which we call V, and show that it increases the number of closest genetic neighbours that belong to the same population.
Two further points are not controlled by GAGA: the number of markers used and how to correct for the presence of linkage disequilibrium.
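
To make the clustering objective described above more concrete, the following Python sketch scores a candidate partition of a distance matrix by summing, over clusters, the mean pairwise distance between members. This is only an illustration under stated assumptions: it is not the gaga.jar code, the function name within_group_cost is hypothetical, and the score is a simplified stand-in for the SSD'(WP) statistic, whose exact definition is given in the paper.

```python
from itertools import combinations


def within_group_cost(dist, labels):
    """Sum, over clusters, of the mean pairwise distance between members.

    Simplified stand-in for a within-group statistic (an assumption,
    not the exact SSD'(WP) definition from the GAGA paper).
    """
    # Group object indices by their cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    total = 0.0
    for members in clusters.values():
        pairs = list(combinations(members, 2))
        if pairs:  # clusters with a single member contribute nothing
            total += sum(dist[i][j] for i, j in pairs) / len(pairs)
    return total


# Toy symmetric distance matrix for four individuals (made-up values).
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0],
    [4.0, 4.0, 0.0, 2.0],
    [5.0, 5.0, 2.0, 0.0],
]

# Two candidate partitions into K = 2 clusters.
good = within_group_cost(dist, [0, 0, 1, 1])  # close individuals grouped together
bad = within_group_cost(dist, [0, 1, 0, 1])   # distant individuals grouped together
print(good, bad)
```

A partition that groups close individuals together receives a lower score than one that mixes distant individuals; a search procedure such as a genetic algorithm would look for the label assignment that minimizes a score of this kind.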

Before running GAGA, we strongly recommend reading the PDF in the doc folder.
The folder gaga contains:

  1. gaga.jar -> the Java program
  2. doc folder -> a folder with the PDF document
  3. example folder -> a folder with an example of the input format and of the output produced when gaga is run with the parameters described below.
  4. lib folder -> a folder with the Java libraries required to run gaga. DO NOT MODIFY THIS FOLDER!