Software

Below is a list of software I recommend for different tasks.

Here is a nice short article by Ignacy Misztal on software in animal breeding: http://nce.ads.uga.edu/~ignacy/numpub/oldpapers/wc94.PDF

One of the main tasks for animal breeders is to estimate the (co)variance components of the animal (mixed) model being used. These are required to plan breeding programs and run evaluations (calculate EBVs/EPDs/PTAs). We are also interested in population parameters such as heritability, as it helps predict the accuracy of EBVs.
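As a rough illustration of how estimated variance components turn into a population parameter, here is a minimal sketch that computes heritability from an additive and a residual variance. The function name and the numbers are made up for illustration; the accuracy line assumes the textbook case of selecting on a single own record.

```python
import math


def heritability(var_additive, var_residual, var_other=0.0):
    """Narrow-sense heritability: additive variance over phenotypic variance.

    var_other is a hypothetical catch-all for any extra random effects
    (e.g. permanent environment) that are part of the phenotypic variance.
    """
    var_phenotypic = var_additive + var_other + var_residual
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_additive / var_phenotypic


# Made-up variance components, as they might come out of a REML run.
h2 = heritability(var_additive=20.0, var_residual=80.0)

# Textbook assumption: with one own record, EBV accuracy is sqrt(h2).
accuracy_own_record = math.sqrt(h2)

print(f"h2 = {h2:.2f}, accuracy from one own record = {accuracy_own_record:.2f}")
```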

BLUPF90

BLUPF90 was developed by Ignacy Misztal at UGA many years ago and has since been developed further by a small team including Ignacio Aguilar, Daniela Lourenco, Andres Legarra, and Yutaka Masuda, among others.

Home: Link

Download: Link

Manual: Link

Ignacy Idea: Link

Ignacy PDF 1997: Link

REML Notes: Link

Genomic: Link

Computational Techniques Ignacy: Link

Legarra PDF Metafounders: Link

HELP: Link

Masuda

Masuda HTML Notes: Link

Masuda PDF Notes: Link

Masuda GitHub: Link

Masuda GitHub 2: Link

ASReml

ASReml was developed by Arthur Gilmour, starting in 1996.

PAID

Note that this software comes with a user license.

Standalone: Link

R Version: Link

YouTube: Link

Manual 4.2: Link

Manual ASReml-4 v4: Link

Manual ASReml-R: Link

ASReml-R Download Guide: Link

ASReml Cookbook: Link

EchidnaMMS

It looks like Arthur Gilmour started a new project to clone ASReml.

Home: Link

DMU

DMU was developed by Per Madsen and Just Jensen, who are both at Aarhus in Denmark.

Home: Link

Download: Link

Paper: Link

User Guide: Link

GCTA

GCTA is newer software developed by the Yang lab. I’m not very familiar with it, but it appears to have been around since 2011. It can also be used for GWAS; see the GWAS section.

Home: Link

Genetic evaluations are easily one of the most important aspects of a breeding program. Evaluation software differs from other software in its ability to handle very large datasets (e.g. the dairy industry has millions and millions of records in its national evaluation through CDCB). It often relies on alternative solving methods that may not be theoretically pleasing for research projects. For instance, most of these programs do not compute a direct inverse and therefore cannot calculate the PEV needed for individual accuracies/reliabilities.
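To show why PEV matters here, this is a minimal sketch of the standard conversion from an animal's PEV to its reliability and accuracy. It assumes the PEV is available (for example from the diagonal of the inverted coefficient matrix) and ignores inbreeding; the function name and numbers are hypothetical.

```python
import math


def reliability_from_pev(pev, var_additive):
    """Reliability of an EBV as 1 - PEV / additive variance (no inbreeding)."""
    if var_additive <= 0:
        raise ValueError("additive variance must be positive")
    return 1.0 - pev / var_additive


# Made-up values: PEV for one animal and the additive genetic variance.
rel = reliability_from_pev(pev=0.5, var_additive=2.0)
acc = math.sqrt(rel)  # accuracy is the square root of reliability

print(f"reliability = {rel:.2f}, accuracy = {acc:.3f}")
```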

PAID

Note that almost all software with the ability to run evaluations requires a paid license.

MiXBLUP

MiXBLUP was developed at Wageningen by several researchers, currently Jan ten Napel and Jeremie Vandenplas (to my knowledge). At its core, it was developed on top of Mix99. MiXBLUP tends to be more affordable than most other options (see below).

The main limitation is that it cannot be used for variance component estimation to my knowledge.

Home: Link

Download: Link

License: Link

Abstract: Link

BLUPF90

Mentioned above, but it is one of the main programs used in the USA (among other countries) by breeding companies to run evaluations. For a license fee, it can run iteration on data.

See all of the links for BLUPF90 in the Variance Components tab.

Home: Link

BOLT

Developed originally by Daniel and Dorian Garrick with Bruce Golden. Today Dorian’s son, Daniel Garrick, runs and maintains this software suite.

Home: Link

Mix99

Developed in Finland and the base for MiXBLUP (above).

Home: Link

Slides: Link

PEST

DEPRECATED

Paper: Link

Maintaining adequate levels of genetic variation in a population is critical to the long term survival of that breed or line.

There are few software packages today that are still around to deal with this optimization problem.

Matesel

Matesel was developed in Australia by Brian Kinghorn and his son.

This software is likely the most utilized by industry today because of its capabilities.

Home: Link

AlphaMate

Home: Link

EVA

Per Berg was with Brian Kinghorn at one point. He later developed the EVA software.

Home: Link

Paper: Link

optiSel R Package

Home: Link

Website: Link

Paper: Link

One of the main problems with genomic selection when it began was that we needed quality control and other processing of the genotypes before utilizing them in the evaluations. Just some of the QC needed would include:

Call rates for animals (rows) and SNPs (columns)
Minor allele frequency minimums (often 0.01 or 0.05)
Correlations between inbreeding values
Correlations between off-diagonal elements
Removing parent-offspring conflicts
Removing duplicates or twins (high off-diag correlations)
Many more
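The first two checks in the list above can be sketched in a few lines. This is only an illustration, assuming genotypes are coded 0/1/2 with None for a missing call; the function names and thresholds are my own and are not taken from any of the programs below.

```python
def call_rate(calls):
    """Fraction of non-missing calls in a list (None means missing)."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(snp_calls):
    """MAF of one SNP from 0/1/2 genotype codes, ignoring missing calls."""
    observed = [c for c in snp_calls if c is not None]
    if not observed:
        return 0.0
    p = sum(observed) / (2 * len(observed))
    return min(p, 1.0 - p)


def qc_filter(genotypes, min_animal_cr=0.90, min_snp_cr=0.90, min_maf=0.05):
    """Return indices of animals (rows) and SNPs (columns) that pass QC."""
    kept_animals = [i for i, row in enumerate(genotypes)
                    if call_rate(row) >= min_animal_cr]
    kept_snps = []
    for j in range(len(genotypes[0])):
        # SNP statistics are computed on the animals that passed.
        column = [genotypes[i][j] for i in kept_animals]
        if (call_rate(column) >= min_snp_cr
                and minor_allele_frequency(column) >= min_maf):
            kept_snps.append(j)
    return kept_animals, kept_snps


# Tiny made-up genotype matrix: 4 animals x 4 SNPs.
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [2, None, 2, 0],
    [None, None, None, 1],
]
animals, snps = qc_filter(genotypes, min_animal_cr=0.7, min_snp_cr=0.6)
print("animals kept:", animals)
print("SNPs kept:", snps)
```

The order matters in this sketch: animals are filtered first, so a poorly genotyped animal does not drag down the SNP call rates.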

calc_grm

calc_grm was developed within the MiXBLUP suite to process SNP chip genotypes. This software will do QC and calculate the A, G, and H matrices.

Home: Link

preGSf90

Very similar to calc_grm but developed with the BLUPF90 suite. Also works with postGSf90 to run GWAS.

Docs: Link

PLINK

Very popular software to process SNP panels.

Home: Link

Genome-wide association studies (GWAS) are a way for researchers to determine which SNPs may contribute more than others to the genetic variance of a trait. Often they are looking for SNPs that may explain a large percentage of the genetic variance (e.g. 10%). There are both frequentist and Bayesian methodologies and countless ways to summarize the results.

JWAS

Hao Cheng started this software with Rohan Fernando and Dorian Garrick while at Iowa State University as a PhD student. He then moved to UC Davis as an assistant professor and continued developing it.

Home: Link

GitHub: Link

Gensel

DEPRECATED

JWAS was originally based on this software, which was developed at Iowa State University by Dorian Garrick and Rohan Fernando. I do not think it is still maintained; please use JWAS instead.

Not available. See JWAS.

postGSf90

Developed within the BLUPF90 suite of programs. postGSf90 uses the so-called EMMAX method, but in a computationally efficient way: the backsolved SNP effects (from a GBLUP run) are divided by the SE of the SNP effects.
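The last step described above (SNP effect divided by its SE) can be sketched as follows. This is not postGSf90 code; the function name and inputs are hypothetical, and turning the ratio into a two-sided p-value under a standard normal distribution is my assumption for the illustration.

```python
import math


def snp_test_statistics(snp_effects, snp_effect_ses):
    """Return (z, p) per SNP, where z = effect / SE.

    The p-value is two-sided under a standard normal distribution,
    which is an assumption made for this sketch.
    """
    results = []
    for effect, se in zip(snp_effects, snp_effect_ses):
        z = effect / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
        results.append((z, p))
    return results


# Made-up backsolved SNP effects and their standard errors.
effects = [0.00, 0.196, -0.50]
ses = [0.10, 0.10, 0.10]

stats = snp_test_statistics(effects, ses)
for z, p in stats:
    print(f"z = {z:6.2f}, p = {p:.3g}")
```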

Docs: Link

BGLR

BGLR was developed by Gustavo de los Campos and Paulino Perez-Rodriguez at MSU. BGLR can do many of the Bayesian regressions for GWAS and genomic prediction.

GitHub: Github Link

2014 Paper: Paper Link

2022 Paper: Paper Link

GCTA

GCTA is newer software developed by the Yang lab. I’m not very familiar with it, but it appears to have been around since 2011. It can also estimate variance components; see the variance component section.

Home: Link

Imputation is the process of predicting missing genotype calls for SNPs, often from a lower density (e.g. 10k SNP chip) to a higher density (e.g. 60k SNP chip). However, it can also be used simply to fill in missing values in a genotype matrix. Most imputation programs first estimate the haplotypes in the population, then use the haplotypes observed on the smaller chip to fill in the SNPs on the larger chip.

Beagle

Home: Link

AlphaImpute

Originally developed under John Hickey.

GitHub: Link

AlphaPeel

Originally developed under John Hickey.

GitHub: Link

FImpute

PAID

NOTE: FImpute is paid for commercial use.

Home: Link

Breed composition is the process of computing what percent of each individual comes from each genetic line. Purebreds would ideally be 100% one breed; however, this is often not the case due to pedigree mistakes made over the years in breeding programs. These mistakes are unavoidable, and most companies will admit to 2-5% pedigree errors in swine; the rate can be much higher in other species.

I believe these estimates are also useful to fit in crossbred models in CCPS.

In my personal experience, the last method, regression on the allele frequencies, works very well if the lines are well known and you have good allele frequency estimates. Admixture allows you to fix the lines, but I didn’t see any advantage to it.

Structure

Home: Link

Admixture

Home: Link

Allele Freq Method

This method is very simple: you do a normal or constrained regression on the allele frequencies (divided by 2) for each line and see how the regression fits each line. The coefficients will tell you what percent of each line the animal is.
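Here is a minimal sketch of the normal (unconstrained) regression version. I am assuming the animal's 0/1/2 genotypes are divided by 2 and regressed, without an intercept, on the allele frequencies of each line; all names and numbers are made up, and this is not the code from the papers or repository linked below. The constrained version (coefficients non-negative and summing to one) needs a quadratic programming solver and is not shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: line frequencies are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def breed_composition(genotype, line_freqs):
    """Least-squares coefficients of genotype / 2 on per-line allele freqs."""
    y = [g / 2 for g in genotype]
    k = len(line_freqs)
    # Normal equations: (X'X) b = X'y, where column a of X is line_freqs[a].
    xtx = [[sum(fa * fb for fa, fb in zip(line_freqs[a], line_freqs[b]))
            for b in range(k)] for a in range(k)]
    xty = [sum(f * yi for f, yi in zip(line_freqs[a], y)) for a in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up allele frequencies for two lines at four SNPs.
line_a_freqs = [0.9, 0.1, 0.8, 0.2]
line_b_freqs = [0.2, 0.7, 0.3, 0.9]

animal = [2, 0, 1, 1]  # made-up 0/1/2 genotype codes
coefficients = breed_composition(animal, [line_a_freqs, line_b_freqs])
print("estimated line proportions:", [round(c, 3) for c in coefficients])
```

With real data you would use thousands of SNPs, and the unconstrained coefficients can fall slightly outside 0 to 1, which is what the constrained version avoids.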

Normal Regression:

Kuehn Paper: Link

Constrained Regression:

Funkhouser Paper: Link

Scott’s Github: Link

In the past, many questions were answered with deterministic equations from quantitative genetic and animal breeding theory. This was very useful and powerful; however, this work rests on many assumptions and has limitations, so it may or may not represent a real-world breeding program. For that, we need simulation to mimic real breeding programs in terms of structure, selection, matings, and evaluations.

AlphaSimR

AlphaSimR is from the Alpha Suite of tools developed by John Hickey’s group at Roslin. It had a head start on MoBPS. I believe Chris Gaynor is still developing this software at Bayer.

GitHub: Link

Paper: Link

MoBPS

MoBPS was written in R, starting in Henner’s lab; Torsten Pook did a lot of the work.

This is very much in development at Wageningen now, and Dr. Pook is making a lot of improvements.

GitHub: GitHub Link

Paper: Paper Link

QMSim

QMSim was developed by Mehdi Sargolzaei as an affiliate at Guelph.

NOTE: This software is still very good, but it’s fixed, meaning you cannot really change the breeding program that much besides what is programmed. Most people have now turned to AlphaSimR or MoBPS.

Home: Link

ADAM

Paper: Link

A selection index, also known as an economic selection index, is how we combine the EBVs from many traits into a single index. After calculating the economic values for each trait, we need to understand the accuracy of such an index as well as the weights for the selection criteria.
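To make the weights and accuracy concrete, here is a minimal two-trait sketch of the classic selection index equations, b = P^-1 G a, with accuracy sqrt(b'Pb / a'Ca). It assumes the selection criteria are phenotypes on the same two traits as the breeding goal, so G and C are the same matrix. All (co)variances and economic values are made up, and this is not how SelAction is implemented.

```python
import math


def solve_2x2(m, rhs):
    """Solve a 2x2 linear system m x = rhs with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    x0 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    x1 = (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det
    return [x0, x1]


def selection_index(p, g, econ_values):
    """Return index weights b and the accuracy of the index.

    p: phenotypic (co)variances among the two selection criteria
    g: genetic (co)variances between criteria and goal traits
       (assumed equal to the goal-trait genetic (co)variances here)
    econ_values: economic value per goal trait
    """
    g_a = [sum(g[i][j] * econ_values[j] for j in range(2)) for i in range(2)]
    b = solve_2x2(p, g_a)
    # Since P b = G a, the index variance b'Pb equals b'(G a).
    var_index = sum(b[i] * g_a[i] for i in range(2))
    var_goal = sum(econ_values[i] * g_a[i] for i in range(2))  # a'Ca with C = G
    return b, math.sqrt(var_index / var_goal)


# Made-up parameters for two traits.
P = [[100.0, 20.0], [20.0, 50.0]]
G = [[30.0, 5.0], [5.0, 10.0]]
a = [1.0, 2.0]

weights, accuracy = selection_index(P, G, a)
print("index weights:", [round(w, 4) for w in weights])
print("index accuracy:", round(accuracy, 4))
```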

SelAction

SelAction was developed at Wageningen by Rutten, Bijma, Woolliams, and van Arendonk. The goal of the software is to help design breeding programs and calculate things such as the accuracy of your index and other parameters related to selection indexes.

Paper: Link

SelAction 2

Jack Dekkers and some postdocs have worked on this software. It is still in development, but hopefully it will be done before too long.

Conference Abstract: Link

Here is a set of miscellaneous programs.

OpenMendel

Julia implementation of statistical genetics analysis.

Home: Home Link

Here are my thoughts on ABG software in general, not specific to any one software or software suite:

Most animal breeding software is bush league because almost none of us are trained in computer science; most of us are self-taught (e.g. I have never taken a programming class). The one exception I know of is Ignacy Misztal, whom Dan Gianola convinced to come to Illinois (I believe) to program the threshold models Dan worked on in the 1980s.
Many projects are free to use but not maintained, as they may have been a small part of someone’s PhD research or something similar. The authors then have little or no incentive to maintain the software or show others how to use it. I’m guilty of this: I mostly share on GitHub to allow others to look at my code and nothing more.
You are not paying, therefore they have no responsibility (or feel none) to explain how to use their software, except to commercial entities who are paying for it.
Some are just lazy af…
Often we start programming with little to no planning, unlike software companies that plan for all the future features they will need, and then we build on top of it, leading to 90% spaghetti code. Genomics was an exception: when most software started, there was no way to predict that this technology would be implemented, or how.
Many simply don’t know how to write documentation correctly and haven’t studied it. You can find different types of documentation online if you search (e.g. a function definition vs a ‘cookbook’ style). There are many articles out there to read up on this.
Out of habit, almost everyone I see writing code doesn’t even write comments for themselves to read later, and there is no way to tell what the code does by looking at it (just go to GitHub and start looking...). This is especially bad as you get to lower-level languages. It makes it impossible to contribute in an open-source way, as no one knows what anything does without an intricate knowledge of what you are doing. Code in higher-level languages is much easier to follow but can still be tricky without comments.
Many projects were started long ago, when Windows dominated, whereas today macOS and Linux are extremely popular (especially in academia). Windows is still common in companies for some reason, and IT loves it for some reason (because we all need printers in 2024, I guess).
Software is rarely tested or stress tested. The only exception I know of is CRAN, which forces authors to make sure their software runs on multiple systems. This is a pain, but it keeps the R CRAN network very robust.
Most of those developing software have a very strong conflict of interest (COI): they don’t want to actually teach others how the algorithms are implemented because this is their internal competitive edge. I know first-hand stories of this happening. So, many times it’s difficult or impossible to know how to speed up routines to get them to run at industry speeds.