SPS22-80UL

ped_sim - A flexible forward genetic simulator of complex family pedigrees

By: Kaela Syas

Department: Applied Mathematics

Faculty Advisor: Dr. Rori Rohlfs

Large-scale family pedigrees are essential in multiple disciplines of the medical and evolutionary sciences. While family trees provide valuable information about familial kinship, they frequently come without genetic information. Simulating genetic data onto family pedigrees will facilitate a better understanding of forensic identification tools and of the heritable basis of complex diseases within a family. Additionally, running multiple genetic simulations of the same family will help establish accurate confidence intervals for statistics of cryptic relatedness. Currently, no software takes in common pedigree file formats and simulates genetic information whose relatedness reflects the expected kinship. SLiM, a forward population genetic simulator, can simulate family pedigrees but requires information not found in standard .ped files. We created a Python command-line tool that simulates genetic information from pedigrees, providing a simple interface for performing complex pedigree genetic simulations with SLiM. In Python, we represent families as directed graphs, which we convert into the format SLiM needs to perform the genetic simulation. Our software accepts incomplete pedigrees and can initialize the founders with either empirical or simulated genetic data. Finally, our software can simulate the pedigree structures themselves, drawing the number of children per generation from a user-specified distribution. We validate our software by initializing the family founders with data from the 1000 Genomes Consortium and estimating the kinship of pairs of genetic relatives in our simulations. We show that the kinship estimated from the simulations fits expectations based on the genetic relationship.
Comparing genetic data across large-scale pedigrees will help address questions of fine-scale population structure, such as rare variant sharing, and will provide insight into identity-by-descent (IBD) sharing among distantly related individuals. Overall, ped_sim will provide an open-source solution for pedigree and genome simulation in medical, forensic, and evolutionary genetic analyses.
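
As an illustration of the two ideas the validation relies on, a pedigree held as a directed graph and the expected kinship between relatives, the following is a minimal sketch and not ped_sim's actual code. It assumes the common six-column .ped layout (family, individual, father, mother, sex, phenotype, with "0" for an unknown parent). The names PED_TEXT, parse_ped, child_edges, and make_kinship are hypothetical, the example family is invented, and the kinship values come from the standard recursive kinship coefficient rather than from any SLiM simulation.

```python
from functools import lru_cache

# Invented example family in an assumed six-column .ped layout:
# family, individual, father, mother, sex, phenotype ("0" = unknown parent).
PED_TEXT = """\
fam1 gf 0 0 1 0
fam1 gm 0 0 2 0
fam1 p1 gf gm 1 0
fam1 p2 gf gm 2 0
fam1 s1 0 0 2 0
fam1 s2 0 0 1 0
fam1 c1 p1 s1 1 0
fam1 c2 p2 s2 2 0
"""


def parse_ped(text):
    """Return {individual: (father, mother)}, with None for an unknown parent."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        _, ind, father, mother = fields[:4]
        parents[ind] = (
            father if father != "0" else None,
            mother if mother != "0" else None,
        )
    return parents


def child_edges(parents):
    """Directed graph as an adjacency list: parent -> list of children."""
    edges = {ind: [] for ind in parents}
    for child, pair in parents.items():
        for parent in pair:
            if parent is not None:
                # setdefault tolerates parents that have no row of their own.
                edges.setdefault(parent, []).append(child)
    return edges


def make_kinship(parents):
    """Build a function giving the expected kinship coefficient of a pair."""

    def lookup(ind):
        # Individuals without a row are treated as founders.
        return parents.get(ind, (None, None))

    @lru_cache(maxsize=None)
    def depth(ind):
        # Founders have depth 0; a child is one deeper than its deepest parent.
        if ind is None:
            return -1
        father, mother = lookup(ind)
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are assumed unrelated to everyone
        if a == b:
            father, mother = lookup(a)
            return 0.5 * (1.0 + kinship(father, mother))
        if depth(a) < depth(b):
            a, b = b, a  # recurse through the deeper individual's parents
        father, mother = lookup(a)
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship


parents = parse_ped(PED_TEXT)
graph = child_edges(parents)
kinship = make_kinship(parents)

print(graph["gf"])          # children of gf: ['p1', 'p2']
print(kinship("p1", "p2"))  # full siblings: 0.25
print(kinship("p1", "c1"))  # parent and child: 0.25
print(kinship("c1", "c2"))  # first cousins: 0.0625
```

Expected values of this kind (0.25 for full siblings, 0.0625 for first cousins) are the sort of benchmark against which kinship estimated from simulated genotypes could be compared.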