The Genomics Repository allows the University to centralise and simplify genomic data storage, analysis and collaboration, with the assurance of regular data backup.
It gives researchers enhanced research capability and opportunities to collaborate in the analysis of large next-generation sequencing datasets.
From within the repository, researchers can store and share massive genomic research datasets. They can also add these datasets to their personal workspace and set up data processing pipelines and workflows that run directly on the Phoenix High Performance Computing (HPC) system. Once Phoenix has analysed the data, the output files are returned to the researcher's workspace for downstream analyses. The repository also provides pre-prepared, standardised workflows that non-computational researchers, who may not have specialist data processing knowledge, can access and reuse to perform their research.
By storing genomic data in a centralised, backed-up repository, researchers can re-use genomic data with University collaborators, enabling additional outcomes to be found from a single dataset.
How does it work?
The repository enables researchers to add large datasets to their personal workspace and run standardised workflows on the University's Phoenix High Performance Computing (HPC) system. Once Phoenix has analysed the data, the output files are returned to the researcher's workspace for downstream analyses.
Incremental backups of the Genomics Repository are performed daily with full backups being performed monthly.
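The practical effect of this schedule can be sketched in a few lines of Python. This is an illustration only, not the Repository's actual backup tooling: it assumes the monthly full backup runs on the 1st of each month (the page does not say which day), and the function names are hypothetical. It lists the backups that would be needed to recover data as of a given date, namely the most recent full backup followed by each daily incremental since.

```python
from datetime import date, timedelta

# Assumption for illustration only: the monthly full backup runs on the 1st.
FULL_BACKUP_DAY = 1


def backup_kind(day: date) -> str:
    """Return which kind of backup would run on the given day."""
    return "full" if day.day == FULL_BACKUP_DAY else "incremental"


def restore_chain(target: date) -> list[tuple[date, str]]:
    """List the backups needed to recover data as of `target`.

    The chain starts at the most recent full backup (the 1st of the
    target's month under the assumption above) and includes every
    daily incremental up to and including `target`.
    """
    start = target.replace(day=FULL_BACKUP_DAY)
    chain = []
    for offset in range((target - start).days + 1):
        day = start + timedelta(days=offset)
        chain.append((day, backup_kind(day)))
    return chain


if __name__ == "__main__":
    for day, kind in restore_chain(date(2024, 3, 4)):
        print(day.isoformat(), kind)
```

Under these assumptions, recovering data as of the 4th of a month would draw on one full backup and three incrementals.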
Who can use the Repository?
The Genomics Repository is used mainly by researchers for approved University of Adelaide research data processing.
Higher Degree Research students can use the Repository with the approval, and under the supervision, of their supervisor.
Who is responsible for the Repository?
There are multiple layers of responsibility for the Genomics Repository.
Users of the Repository
When you agree to use the Repository, you will need to ensure your use of it is for University of Adelaide purposes, unless you have prior approval from the Bioinformatics Hub team. All users of the Repository are required to clean and maintain the data they analyse. You can choose to share the data that is stored in the Repository, so that other researchers can access the same datasets rather than creating duplicates.
For governance, approvals, investment in or feedback on the Genomics Repository, please contact the Bioinformatics Hub.
Phone: +61 8 8313 1207 (ex. 31207)
If you are having issues with the technology infrastructure of the Genomics Repository, e.g. if it is not working, you should contact the Information Technology and Digital Services (ITDS) Service Desk (see the Additional support section below for details).
Features and benefits
The Genomics Repository has been designed to:
- Provide secure, backed-up storage for genomics data (daily incremental backups and monthly full backups)
- Enable researchers to safely store, analyse and share large genomic datasets
- Centralise the storage of large genomics datasets, reducing the need to store them in multiple locations
- Set reminders to trigger archiving of raw data (manual)
- Allow data to be exported
- Allow other researchers to re-use genomic data and workflows, enabling additional outcomes to be found from a single dataset
- Provide a facility for ‘processing genomics pipelines’ including:
  - An analysis Workflow Description Language
  - A workflow repository allowing workflows to be shared
  - Building workflows
  - Managing processing time/costs (for reporting purposes only)
  - Directly accessing and using Phoenix HPC for compute.
The Genomics Repository has extensive data storage capabilities (200 TB), enabling researchers to store and process data in a timely manner with the assurance of regular data backup. The ability to share datasets within the repository enhances collaboration across research groups. Bioinformaticians and non-computational researchers benefit from this critical platform capability, which can be used without specialist data processing knowledge. The repository facilitates the reproducibility of complex datasets within the University. It also supports completely reproducible data processing workflows that can be included in high-quality research outputs.
The Genomics Repository is one of a number of different research computing options available to researchers at the University. Guidance on selecting the most appropriate option for your research can be found here: High Performance Computing Options.
The following resources are available if you'd like to learn more:
- User guide (subject to change with system improvements)
- Presentation about the Genomics Repository
- Component diagram
- Workspace lifecycle diagram
- Cleaning your workspace (data lifecycle)
- Simple data movements diagram
Workflow Description Language (WDL)
What is high-throughput sequencing?
A genome is an organism's complete set of DNA, stored in long molecules called chromosomes. To study the genome of an organism, high-throughput genome sequencing technologies have been developed that produce a large amount of data. Other sequencing approaches, such as sequencing the genes that are expressed from the genome (RNA-seq or transcriptome sequencing), can also be carried out using these technologies, allowing researchers to study many elements of the organism's genetic system at low cost.
How can the Genomics Repository help researchers?
The Genomics Repository will enable the University to centralise and simplify genomic data storage, analysis and collaboration. The goal is to provide bioinformaticians and non-computational researchers with the tools they need to perform their research without specialist data processing knowledge, as pre-prepared processing commands are available in the repository. This will result in more efficient preparation and execution of data analysis, allowing for completely reproducible data processing workflows that can be included in high-quality research outputs. Additionally, because genomic data is centralised in a backed-up storage repository, researchers will be able to re-use genomic data with University collaborators, enabling additional outcomes to be found from a single dataset.
How much does it cost?
The Genomics Repository is free to use. You will not incur any costs or ongoing charges.
How do I access it?