A high throughput gene, environment and epigenetics database and analysis system for international ALS research
published: July 21, 2017, recorded: May 2017
Genetic technology is advancing rapidly. We can now collect huge amounts of genetic information quickly and cheaply and, because research groups collaborate closely, we can do this for tens of thousands of people. The problem is how to store, handle and easily share this information. This project is a collaboration between researchers working on motor neuron disease and computer scientists working with biological information. We aim to develop a computerized system that will let researchers easily use genetic, clinical and lifestyle information that has already been collected, and add new information as it is produced. The system will make it easy to see patterns in the relationship between clinical features, lifestyle and gene variations, to compare genetic variations between groups of people, and to share the information between research groups.
We are implementing a solution that will enable the sharing of huge raw sequencing data files as well as small files of summary results. For raw and processed data we use iRODS, the Integrated Rule-Oriented Data System, which was developed for building distributed storage infrastructure. Through data virtualization, several iRODS servers in different locations can share and manipulate their data through automatic mechanisms based on internal rules. This will facilitate sharing, curating and analysing the huge amount of data that our genetic research is producing. An iRODS system able to host and handle petabytes of genetic data has been deployed on Rosalind, our BRC/King’s College London HPC cluster. Data are accessible both through a user-friendly web browser interface and from the command line. Results and summary statistics, along with the clinical data, will be loaded into the TranSMART platform.
The TranSMART system is a platform for translational medicine comprising a relational database back end and a web-based interface that integrates a large number of open-source bioinformatics tools for analysis and visualization. This platform will give general users access to processed data and will allow cohort selection and analyses on the fly. We are also implementing community-driven metadata standards, together with pipelines for metadata extraction and data analysis that can be automated using iRODS. All pipelines will be available on GitHub, together with iRODS Docker images, to allow any member of the ALS research community to deploy their own iRODS system on a timescale of hours.
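To make the idea of cohort selection and group comparison concrete, here is a minimal sketch in plain Python. It is not the TranSMART or iRODS API: the `Participant` record, its field names, the helper functions and the handful of records are all hypothetical and made up purely for illustration. It only shows the kind of query the abstract describes, selecting groups by clinical or lifestyle features and comparing how often a genetic variant occurs in each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical record combining clinical, lifestyle and genetic fields."""
    participant_id: str
    diagnosis: str          # e.g. "ALS" or "control"
    smoker: bool            # example lifestyle feature
    carries_variant: bool   # example genetic feature


def select_cohort(participants, **criteria):
    """Return the participants whose fields equal every given criterion."""
    return [
        p for p in participants
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


def carrier_frequency(cohort):
    """Fraction of the cohort carrying the variant (0.0 for an empty cohort)."""
    if not cohort:
        return 0.0
    return sum(p.carries_variant for p in cohort) / len(cohort)


# Made-up records, for illustration only (not real study data).
PARTICIPANTS = [
    Participant("P1", "ALS", True, True),
    Participant("P2", "ALS", False, True),
    Participant("P3", "ALS", False, False),
    Participant("P4", "control", True, False),
    Participant("P5", "control", False, False),
    Participant("P6", "control", False, True),
    Participant("P7", "control", True, False),
]

cases = select_cohort(PARTICIPANTS, diagnosis="ALS")
controls = select_cohort(PARTICIPANTS, diagnosis="control")

print(f"carrier frequency in cases:    {carrier_frequency(cases):.2f}")
print(f"carrier frequency in controls: {carrier_frequency(controls):.2f}")
```

Criteria can be combined, for example `select_cohort(PARTICIPANTS, diagnosis="ALS", smoker=True)`, which mirrors selecting a cohort on several clinical and lifestyle features at once. A real analysis would of course add a statistical test and operate on the database rather than an in-memory list.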
Download slides: encals2017_iacoangeli_epigenetics_database_01.pdf (1.9 MB)