This lab works on several projects related to the annotation and curation of the bovine genome sequence. These include:
- Development of the Bovine QTL viewer and database.
- Development and maintenance of the Bovine Composite Map and Bovine BAC map.
Genome Object Oriented Framework (GOOF).
- Using the equine genome as a target database and application, we have developed a novel Genome Object Oriented Framework (GOOF) to simplify the creation, maintenance and use of genome data. GOOF is fully Java-based and Chado-compliant and will be ready for beta testing in early 2008.
Repeat Identification and Annotation.
- As part of our ongoing participation in the Bovine Genome Sequencing Project and the Equine Genome Sequencing Consortium, we are identifying and annotating repeats within these two genomes. All repeats are identified de novo and subsequently annotated against known retroelement repeats.
- We have created a pipeline to identify and annotate DNA repeats in mammalian genomes using two pre-existing tools (PALS/PILER and RepeatScout) that had not previously been used on an entire mammalian genome. The pipeline breaks the genome into manageable chunks so that PALS can be run in parallel on a computer cluster. The chunks are then concatenated at the chromosome level and used as input for PILER, which generates clustered consensus sequences for the repeats on each chromosome. RepeatScout is run on individual chromosomes and its output is converted to be compatible with PILER output. To identify redundancy across chromosomes, the consensus sequences and the RepeatScout output are aligned to each other using WU-BLAST. Redundancy is then minimized by clustering the consensus sequences together with the RepeatScout output on the basis of the WU-BLAST results, generating globally alignable, non-redundant consensus sequences. In this fashion we have identified many previously known repeats and a number of previously unknown repeats, present at both low and high copy number.
- By analysing interspersed repeat data, we have found underlying correlations with respect to repeat numbers/insertions in mammalian genomes. An example of this type of correlation analysis is shown in the accompanying figure.
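
Two steps of the repeat pipeline described above, splitting sequence into chunks and collapsing redundant consensus sequences on the basis of pairwise alignment hits, can be illustrated with a minimal Python sketch. This is not the lab's actual code: the function names, the simplified hit tuple (query, subject, percent identity, alignment length), the single-linkage clustering rule, and the identity and coverage thresholds are all assumptions made for illustration, and real WU-BLAST output would need to be parsed into this form first.

```python
from collections import defaultdict


def chunk_sequence(seq, chunk_size):
    """Split a sequence into consecutive chunks, keeping each chunk's start offset."""
    return [(start, seq[start:start + chunk_size])
            for start in range(0, len(seq), chunk_size)]


def cluster_by_hits(seq_lengths, hits, min_identity=80.0, min_coverage=0.5):
    """Single-linkage clustering of consensus sequences from pairwise hits.

    seq_lengths: dict mapping sequence id -> sequence length
    hits: iterable of (query_id, subject_id, percent_identity, alignment_length)
    The thresholds are hypothetical defaults, not values used by the pipeline.
    Returns a dict mapping each cluster's longest member to its sorted members.
    """
    parent = {sid: sid for sid in seq_lengths}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, aln_len in hits:
        if query == subject:
            continue  # ignore self-hits
        shorter = min(seq_lengths[query], seq_lengths[subject])
        if identity >= min_identity and aln_len / shorter >= min_coverage:
            parent[find(query)] = find(subject)  # merge the two clusters

    clusters = defaultdict(list)
    for sid in seq_lengths:
        clusters[find(sid)].append(sid)

    # Assumption: the longest member stands in for the whole cluster.
    return {max(members, key=lambda s: (seq_lengths[s], s)): sorted(members)
            for members in clusters.values()}
```

Called with the lengths of all per-chromosome consensus sequences and the list of pairwise hits, `cluster_by_hits` returns one representative per cluster, which is one simple way to arrive at a non-redundant set. Single-linkage merging can chain loosely related sequences together, so a stricter clustering rule may be preferable in practice.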