NALgen

Non-negative matrix factorization (NMF), as implemented in the LEA package, is a form of unsupervised machine learning; LEA estimates ancestry coefficients by least squares.
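As a rough illustration of the least-squares idea, the pure-Python sketch below factors a genotype matrix X (individuals by loci, allele counts divided by 2) into Q (individuals by K, ancestry-like coefficients) times G (K by loci, cluster allele frequencies) by minimizing the squared error. This is an assumption-laden toy, not LEA's code: it uses Lee and Seung's multiplicative updates instead of the sparse, constrained alternating least squares in LEA's snmf(), and it rescales the rows of Q to sum to 1 only after fitting. The function names are hypothetical.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf_least_squares(X, k, iters=300, seed=0, eps=1e-9):
    """Approximate the non-negative n x L matrix X by Q (n x k) times G (k x L),
    minimizing ||X - QG||^2 with multiplicative updates (hypothetical helper)."""
    rng = random.Random(seed)
    n, L = len(X), len(X[0])
    Q = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    G = [[rng.random() + 0.1 for _ in range(L)] for _ in range(k)]
    for _ in range(iters):
        # G <- G * (Q^T X) / (Q^T Q G), element-wise
        Qt = transpose(Q)
        num, den = matmul(Qt, X), matmul(matmul(Qt, Q), G)
        G = [[g * a / (b + eps) for g, a, b in zip(gr, ar, br)]
             for gr, ar, br in zip(G, num, den)]
        # Q <- Q * (X G^T) / (Q G G^T), element-wise
        Gt = transpose(G)
        num, den = matmul(X, Gt), matmul(Q, matmul(G, Gt))
        Q = [[q * a / (b + eps) for q, a, b in zip(qr, ar, br)]
             for qr, ar, br in zip(Q, num, den)]
    return Q, G

def squared_error(X, Q, G):
    """Least-squares objective ||X - QG||^2."""
    R = matmul(Q, G)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))

def ancestry_proportions(Q):
    """Rescale each row of Q to sum to 1. Simplification: sNMF imposes this
    constraint during fitting rather than afterwards."""
    return [[q / sum(row) for q in row] for row in Q]

if __name__ == "__main__":
    # Toy data built from two made-up clusters (not real genotypes).
    Q_true = [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0], [0.2, 0.8]]
    G_true = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.7]]
    X = matmul(Q_true, G_true)
    Q, G = nmf_least_squares(X, 2)
    for row in ancestry_proportions(Q):
        print([round(v, 2) for v in row])
```

Because QG is unchanged when a column of Q is scaled up and the matching row of G is scaled down, the fitted factors are only identified up to scaling (and label switching); the sum-to-one constraint is what makes Q interpretable as ancestry coefficients.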

Using minimum cross-entropy (MinXEnt) to determine the number of genetic clusters (K) is not always straightforward.
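The sketch below shows, under stated assumptions, why picking K is a judgment call. Cross-entropy is computed on held-out (masked) genotypes, here using binomial (Hardy-Weinberg) genotype probabilities from a predicted allele frequency, which is a simplifying assumption rather than LEA's exact computation. When the curve flattens, the strict minimum and the smallest K within a tolerance of it can disagree. The function names, the tolerance, and the example values are all hypothetical.

```python
import math

def genotype_probs(p):
    """Probabilities of carrying 0, 1 or 2 copies of an allele with
    frequency p, assuming Hardy-Weinberg proportions."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)

def masked_cross_entropy(genotypes, freqs, eps=1e-12):
    """Mean negative log-probability of masked genotypes (0/1/2) given
    the allele frequencies the fitted model predicts at those entries."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        total -= math.log(max(genotype_probs(p)[g], eps))
    return total / len(genotypes)

def choose_k(ce_by_k, tol=0.0):
    """Return (K with minimum cross-entropy, smallest K whose
    cross-entropy is within tol of that minimum)."""
    best_k = min(ce_by_k, key=ce_by_k.get)
    ce_min = ce_by_k[best_k]
    parsimonious_k = min(k for k, ce in ce_by_k.items() if ce <= ce_min + tol)
    return best_k, parsimonious_k

if __name__ == "__main__":
    # Hypothetical cross-entropy values with a flat tail beyond K = 3.
    ce = {1: 0.52, 2: 0.47, 3: 0.455, 4: 0.452, 5: 0.454}
    print(choose_k(ce, tol=0.005))
```

With these made-up values the strict minimum is K = 4, while the tolerance rule prefers K = 3; neither rule is definitive, which is the practical difficulty with MinXEnt.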

The output of the multivariate analyses in the NALgen Analysis Pt.1 post was dense (935 lines of code, 121 figures). In this recap, I highlight some of the results and note that the population structure inference methods I compared are all machine learning (classification) algorithms.

Here, I present the prequel to the NALgen method: using multivariate analyses to infer population structure in the simulated data and to identify covariance between genetic and environmental data. In upcoming posts about modeling neutral and adaptive genetic variation across continuous landscapes, information about population structure will be used to create the response variable, while environmental data transformed according to the genetic-environmental covariance will be used as predictors.
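Multivariate methods that relate genotypes to environment (for example redundancy analysis or canonical correlation analysis) build on the covariance between the two data sets. The sketch below computes only that sample cross-covariance matrix between genotype columns and environmental variables; it is an illustration, not the specific analysis used in this series, and the function name is hypothetical.

```python
def cross_covariance(X, E):
    """Sample covariance between every genotype column of X (n x L) and
    every environmental column of E (n x m); returns an L x m matrix."""
    n = len(X)
    mean_x = [sum(col) / n for col in zip(*X)]
    mean_e = [sum(col) / n for col in zip(*E)]
    C = [[0.0] * len(mean_e) for _ in mean_x]
    for x_row, e_row in zip(X, E):
        for j, (x, mx) in enumerate(zip(x_row, mean_x)):
            for k, (e, me) in enumerate(zip(e_row, mean_e)):
                C[j][k] += (x - mx) * (e - me)
    # Divide by n - 1 for the unbiased sample covariance.
    return [[c / (n - 1) for c in row] for row in C]

if __name__ == "__main__":
    genotypes = [[0, 2], [1, 1], [2, 0]]        # 3 individuals, 2 loci
    environment = [[1.0], [2.0], [3.0]]         # 1 environmental variable
    print(cross_covariance(genotypes, environment))
```

Large positive or negative entries flag locus-environment pairs that vary together; methods such as RDA or CCA then summarize this structure in a few axes rather than inspecting each entry.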

To evaluate the NALgen method of modeling neutral and adaptive genetic variation across continuous landscapes, I first generated several landscapes and then simulated genotypes on them. In this post, I break down the simulation steps.
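The two-step workflow (generate a landscape, then simulate genotypes on it) can be illustrated with a deliberately simple toy. This is not the simulation used in the post: the landscape is smoothed Gaussian noise on a grid, and each locus has an allele frequency that is a logistic function of the local landscape value, with genotypes drawn as Binomial(2, p). A slope of 0 gives a locus unrelated to the environment; this ignores drift and isolation by distance, which a real neutral simulation would include. All names and parameters are hypothetical.

```python
import math
import random

def smooth_landscape(size, passes=3, seed=0):
    """Gaussian noise on a size x size grid, smoothed by repeated 3x3
    moving averages so that nearby cells have similar values."""
    rng = random.Random(seed)
    grid = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    for _ in range(passes):
        smoothed = [[0.0] * size for _ in range(size)]
        for i in range(size):
            for j in range(size):
                window = [grid[a][b]
                          for a in range(max(0, i - 1), min(size, i + 2))
                          for b in range(max(0, j - 1), min(size, j + 2))]
                smoothed[i][j] = sum(window) / len(window)
        grid = smoothed
    return grid

def simulate_genotypes(landscape, n_ind, betas, seed=1):
    """Place individuals on random cells and draw diploid genotypes.
    Locus l has allele frequency logistic(betas[l] * env) at that cell."""
    rng = random.Random(seed)
    size = len(landscape)
    coords, genotypes = [], []
    for _ in range(n_ind):
        i, j = rng.randrange(size), rng.randrange(size)
        env = landscape[i][j]
        row = []
        for beta in betas:
            p = 1.0 / (1.0 + math.exp(-beta * env))
            # Two independent allele draws give a genotype in {0, 1, 2}.
            row.append(int(rng.random() < p) + int(rng.random() < p))
        coords.append((i, j))
        genotypes.append(row)
    return coords, genotypes

if __name__ == "__main__":
    landscape = smooth_landscape(10)
    # Two environment-independent loci and one clinal (adaptive-like) locus.
    coords, genos = simulate_genotypes(landscape, 50, [0.0, 0.0, 3.0])
    print(coords[:3], genos[:3])
```

Separating landscape generation from genotype simulation makes it easy to reuse one landscape with several genetic scenarios, which mirrors the order of steps described above.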

Description of the NALgen method and summary of analyses.

NALgen Analysis Pt.0

Cross-Entropy and Clustering

NALgen Analysis Pt.1 - Recap

NALgen Analysis Pt.1

Landscape Genetic Simulations

NALgen Summary