Compute resources (HPC and cloud)
The Bioinformatics Hub has access to the following compute resources, as well as the expertise and knowledge to leverage the right infrastructure for the right job.
High Performance Computing (HPC) infrastructure
Most HPC facilities are built around clusters of computers that are physically co-located, e.g. in the same data centre or the same server room rack. Software is then used to "glue" these machines together into an aggregated pool of resources which is available to users of that system.
The infrastructure is centrally managed, so users need only concern themselves with running the software they require to perform their analyses. Users typically access the HPC resources via the command line.
The Bioinformatics Hub routinely uses the following local HPC facilities:
A cloud is simply a geographically distributed collection of computer hardware (typically commodity hardware) with some software "glue" to provide an aggregated pool of resources for the user. Users then create one or more virtual machines (VMs) on the cloud and are required to configure and administer the VMs themselves.
The Bioinformatics Hub routinely uses the following cloud facilities:
HPC vs cloud
Cloud computing is designed to be horizontally scalable. That is, data can be split into smaller chunks and processed on multiple computers across the cloud. As such, cloud computing is most suited to Embarrassingly Parallel Problems (EPP), where little to no communication is required between computers. Problems which require more communication between computers take a significant performance hit, because the potentially large geographical distances between computers slow down every exchange of data.
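To make the idea of an embarrassingly parallel problem concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (`split_into_chunks`, `count_gc`, `gc_fraction`) are hypothetical, the task (counting G/C bases) is chosen purely as an example, and a thread pool stands in for the separate computers or VMs that would do the work on a real cloud.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def count_gc(chunk):
    """Per-chunk step: count G/C bases and total bases in one chunk."""
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in chunk)
    total = sum(len(seq) for seq in chunk)
    return gc, total


def gc_fraction(sequences, n_chunks=4):
    """Process each chunk independently, then combine the partial counts."""
    chunks = split_into_chunks(sequences, n_chunks)
    # Assumption: threads stand in for separate machines.
    # No chunk needs any data from another chunk while it is being processed.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(count_gc, chunks))
    gc = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return gc / total if total else 0.0
```

The only communication happens at the very end, when the small partial counts are added together. This is why such a workload tolerates slow links between computers.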
HPC, on the other hand, is designed to be vertically scalable. That is, a single instance of an analysis tool can be designed to utilise the distributed compute resources available. As such, HPC is most suited to problems where communication between computers needs to be fast, i.e. problems which are not EPP.
Cloud providers, especially those in the commercial sector, have realised that to be useful for non-EPP problems and to be competitive with the vertical scalability of HPC, they must provide a mechanism for users to constrain their VMs to be geographically closer to each other. As such, some cloud providers now offer a service for "HPC in the cloud" which does this. However, HPC is still superior for these classes of problems, as this is exactly what it is designed to do.
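By contrast, a problem that is not EPP needs data from its neighbours at every step. The following Python sketch is a hedged illustration only: a simple smoothing calculation is split across two simulated "workers" inside one process, and the function names (`diffuse_serial`, `diffuse_two_workers`) are hypothetical. On a real cluster, each exchange counted here would be a message sent over the network between computers.

```python
def diffuse_serial(values, steps):
    """Reference: repeatedly average each interior cell with its neighbours."""
    cells = list(values)
    for _ in range(steps):
        new = cells[:]
        for i in range(1, len(cells) - 1):
            new[i] = (cells[i - 1] + cells[i] + cells[i + 1]) / 3
        cells = new
    return cells


def diffuse_two_workers(values, steps):
    """Same computation split across two simulated workers.

    Assumes at least two cells. Returns the combined result and the
    number of boundary values that had to be exchanged.
    """
    mid = len(values) // 2
    left, right = list(values[:mid]), list(values[mid:])
    messages = 0
    for _ in range(steps):
        # Each worker needs its neighbour's edge cell before it can
        # update its own edge cell, so an exchange happens every step.
        halo_for_left, halo_for_right = right[0], left[-1]
        messages += 2
        left_ext = left + [halo_for_left]
        right_ext = [halo_for_right] + right
        new_left = left[:]
        for i in range(1, len(left_ext) - 1):
            new_left[i] = (left_ext[i - 1] + left_ext[i] + left_ext[i + 1]) / 3
        new_right = right[:]
        for j in range(1, len(right_ext) - 1):
            new_right[j - 1] = (right_ext[j - 1] + right_ext[j] + right_ext[j + 1]) / 3
        left, right = new_left, new_right
    return left + right, messages
```

The number of exchanges grows with the number of steps, and neither worker can proceed until its exchange has completed. The delay of each exchange therefore adds up directly, which is where closely located computers have the advantage.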
For more information, see eRSA's From PC to Cloud or HPC self-paced training module.
Hot-desking and rooms
The Bioinformatics Hub provides an open workspace with hot desks for students and researchers to allow greater collaboration on complex problems. The open workspace is also ideal for collaborators to grab a coffee, meet & discuss their work, or for supervisors to talk through work with post-graduate students.
All meeting rooms are equipped with video facilities & are ideal for presentations to research groups, or for video conferencing with remote collaborators. Please book meeting rooms by contacting the Bioinformatics Hub.
Genomics Data Repository
An R Package for managing FastQC reports and other NGS related log files inside R.
- Github repository: https://github.com/UofABioinformaticsHub/ngsReports
- Citation: Ward, C.M., To, H. & Pederson, S.M. (2018) ngsReports: An R Package for managing FastQC reports and other NGS related log files. bioRxiv doi: 10.1101/313148
An R package for checking the strandedness of RNA-Seq alignments. It can also be used to remove genomic DNA sequences from libraries with incomplete DNA removal.
- Github repository: https://github.com/UofABioinformaticsHub/strandCheckR
- Bioconductor: https://bioconductor.org/packages/strandCheckR/