Inference of Evolutionary Dynamics from Static Data: Evolution of Genomes

Evolutionary processes leave complex signatures in genomes. This is especially true of adaptation driven by strong adaptive mutations arising de novo, which leads to higher rates of substitution between species, to distortions in the pattern of genetic variation known as selective sweeps, and to genetic draft. Our laboratory has pioneered a number of approaches that use these signatures to quantify adaptation, and has provided evidence that adaptation is both widespread and often driven by genetic variants of surprisingly large individual effect, in both Drosophila and humans (Macpherson et al, 2007; Cai et al, 2009; Messer and Petrov, 2013; Enard et al, 2014). We have also argued that adaptation, even by de novo mutations, should commonly lead to soft selective sweeps, in which multiple adaptive mutations sweep simultaneously, and have provided evidence that soft sweeps are indeed common in Drosophila. Finally, we have used these approaches to identify viruses specifically, and likely pathogens in general, as key drivers of adaptive change (Enard et al, 2016; Enard and Petrov, 2018; Ebel et al, 2017).

We continue to develop new statistics for the analysis of genomic data that incorporate both demography and selection. These statistical developments will use ABC (approximate Bayesian computation) and ML approaches built on forward simulations, in particular SLiM (initially developed in our lab by Philipp Messer). We are particularly excited by the power of methods that focus on the joint allele frequency distributions of polymorphisms at varying distances from each other, a source of information that is underutilized in molecular population genetics. We are also focused on generating high-resolution comparative population genomic datasets, such as the family-level population genomics dataset described in detail below.
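To give a flavor of how ABC combines with forward simulation, here is a minimal sketch of ABC rejection sampling. It is an illustration only and not the lab's pipeline: a toy haploid Wright-Fisher simulator stands in for SLiM, the summary statistic is a single final allele frequency, and every name and parameter value (simulate_final_freq, abc_rejection, the uniform prior on s, the tolerance) is a hypothetical choice made for this example.

```python
import random


def simulate_final_freq(s, n=100, p0=0.1, generations=20, rng=random):
    """Toy haploid Wright-Fisher forward simulation (a stand-in for SLiM).

    Returns the frequency of a mutation with selection coefficient s
    after the given number of generations (or earlier, on loss/fixation).
    """
    p = p0
    for _ in range(generations):
        # Selection: deterministic change in expected frequency.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: binomial sampling of n individuals for the next generation.
        count = sum(1 for _ in range(n) if rng.random() < p_sel)
        p = count / n
        if p in (0.0, 1.0):  # allele lost or fixed
            break
    return p


def abc_rejection(observed, n_draws=1000, tolerance=0.05, seed=1):
    """ABC rejection sampling for s given one observed final frequency.

    Assumptions for this sketch: uniform prior on s over [0, 0.5] and an
    absolute-difference distance on a single summary statistic.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        s = rng.uniform(0.0, 0.5)              # draw from the prior
        sim = simulate_final_freq(s, rng=rng)  # simulate forward in time
        if abs(sim - observed) <= tolerance:   # keep draws that match the data
            accepted.append(s)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=0.8)
    mean_s = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws; posterior mean s = {mean_s:.3f}")
```

The accepted values of s approximate the posterior distribution. In a realistic analysis the simulator would model demography as well as selection, and the single frequency would be replaced by richer summaries such as joint allele frequency distributions.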

High-resolution measurements of constraint and adaptation in the Family Drosophilidae

James, a graduate student in the lab, is setting up a fly trap during a field trip in Hawaii. Photo by Bernard Kim.

A major challenge in population genomics is that our current datasets of natural genetic variation do not sufficiently resolve the heterogeneity in natural selection that exists across all levels of biological organization. Today’s datasets are mostly limited to one genome per species, and population genomic data are biased towards key taxa even within model groups. We are addressing this problem by leading an effort to systematically generate and analyze population genomic data for hundreds of species at the scale of the insect family Drosophilidae.

In collaboration with a large number of Drosophila labs across the world, we developed an approach to assemble inexpensive yet high-quality genomes, even from as little material as a single fly. We have sequenced drosophilid samples freshly collected from the wild, laboratory lines including those at the National Drosophila Species Stock Center, and ethanol collections (Kim et al, 2021; Kim et al, 2023). So far, this project has produced hundreds of genomes and greatly improved the representation of lesser-studied drosophilid clades in genomic data. We intend to sequence as many of the >4,000 species in the family as we can get our hands on.

Dmitri caught a fly during a field trip in Hawaii. Photo by Bernard Kim.

Alongside this work, we have collected population samples for hundreds of drosophilid species and have sequenced or are in the process of sequencing them now. These approaches will reveal the signatures of evolution at an unparalleled resolution, at multiple scales of functional organization from amino acids to clades. Papers forthcoming very soon.

We are always open to collaboration — please reach out! We will sequence at no cost to you and the only stipulation is that data are publicly available in a reasonable timeframe. This is all managed with open science principles in mind.
