
Genetic linkage maps are an important tool for dissecting the molecular basis of complex traits. Once restricted to a few model organisms, genetic mapping has expanded explosively across a diverse array of organisms thanks to inexpensive molecular marker technologies. We have developed methods for the design of mapping experiments that allow molecular markers and quantitative trait loci to be mapped with greater resolution than is conventionally obtained. We have also developed methods to position high-density markers relative to one another faster and with greater precision than was previously possible. These methods, which have been applied to large-scale mapping projects in a variety of different organisms, are available as part of the MapPop software package.
As genetic and physical maps become available for an ever-increasing number of species, they collectively become more useful for addressing comparative questions about the macroevolution of chromosomal structure. They also become more useful for the prediction of genome structure in genetically intractable organisms, a category which includes many economically important crops. We have been developing new computational methods for map comparison. Our software program FISH helped to put the identification of ancestrally related regions within or between genomes on a firm statistical footing and to vastly speed up the identification of such regions, making large-scale comparisons possible.
We have been using comparative mapping to study the macroevolution of plant chromosomes. The first complete genome sequence of a plant, that of Arabidopsis thaliana, revealed how challenging this task would be. In the first large-scale genome sequence comparison between different families of plants, we found that at least four different regions in Arabidopsis shared common ancestry with a single ~100 kilobase fragment of the tomato genome. This finding led us to hypothesize two rounds of ancient genomic duplication in the lineage leading to Arabidopsis. Gene loss in the duplicated regions has been extensive, leading us to hypothesize that gene loss following genome duplication could be the major process driving gene order rearrangement in plants. Through subsequent analysis of the nearly complete Arabidopsis genome sequence, we discovered that even more ancient, large-scale duplication events could be detected. Three of these are now well established and their approximate ages are known. Such degenerated large-scale paleoduplications have also been found in other plant lineages. These paleoduplications serve as a springboard for us to study the patterns and processes of gene loss, functional divergence between retained duplicates, and gene order rearrangement.
We believe that comparative genomics holds great promise for plant biologists but that there are some serious obstacles to its adoption by bench scientists: the dispersed nature of the data, the bewildering array of software tools and data formats, the computational resources needed to run many of these tools, and the difficulty of digesting the output from large-scale comparative analyses. To overcome these difficulties and provide a glue that binds together the numerous taxon-specific genome databases in plants, we are developing a web-based plant comparative genomics database called Phytome. We are continually adding to the functionality of Phytome. It currently serves as a source for gene family information (alignments, phylogenies, protein domain identifications and functional assignments) from 39 species of plants (predominantly angiosperms) for which large-scale sequence data are available. We are currently integrating map data, together with novel software tools that will allow prediction of gene content in mapped but unsequenced regions of plant genomes.
This work is supported by the National Science Foundation Plant Genome Research Program. To see the fruits of our labors, go to the Phytome website.