The FAIRDOM Team are proud to announce the launch of the 2016 series of Webinars for Data and Model management Practice in Interdisciplinary Life Sciences. These webinars will interest students, researchers, project investigators, lab managers, institutions, publishers, data providers… anyone with data or models to manage.
Join the first webinar of the season:
“Future-proofing your data: working to ensure your work will survive the connected world”
Dr Steven Wiley,
Lead Scientist for Systems Biology, Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, USA.
25 January 2016, 14:00 to 14:45 GMT
Biology is increasingly becoming a data-driven science in which high-throughput analytical platforms are generating data at an ever-increasing rate. Traditionally, the primary mechanism for communicating biological information was the journal article, which is essentially a description of how scientific groups interpret their own data, with a few anecdotal data examples thrown in for support. In the future, however, the data itself will be the most useful output of primary scientific research. To enable this transformation, data must be available in a form that is easily discoverable, with sufficient metadata to permit quality assessment, normalization and integration. It is also highly likely that future biological data analysis systems will exploit NoSQL database systems for scalability. Thus, to ensure future use of currently generated biological data, there should be a clear migration path to these future systems. We have explored what is needed to ensure compatibility with these future systems by using the integration of high-throughput genomics, transcriptomics and proteomics data with a NoSQL system as a use case. From this work, we have been able to define minimal metadata standards that permitted data normalization and integration across different sample types and experimental conditions. We have also defined a flexible data framework using unique sample IDs as key values that is compatible with both relational and Hadoop/HBase systems. We have found that the types of metadata needed for data reuse and integration are highly dependent on the specific target user. We also found that essential metadata were distributed across a wide variety of primary data files, requiring multiple mechanisms, interfaces and processes for their capture.
Our experience suggests that, because of the distributed and multidisciplinary nature of biological data generation and analysis, multiple types of software systems and interfaces will be required for data capture and dissemination. Integrating and reusing that data, however, will require the adoption of a universal metadata framework that is linked to the associated primary data files and/or data repositories.
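As a rough illustration of the idea in the abstract (a data framework keyed on unique sample IDs that can map to both relational and wide-column stores), the sketch below shows one way such a record might be laid out. It is not the speaker's implementation: the class name, field names, the "family:qualifier" cell naming and the example values are all hypothetical, and no real database is contacted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SampleRecord:
    """One sample, identified by a unique sample ID (hypothetical layout)."""
    sample_id: str
    # Descriptive metadata, e.g. organism or growth condition.
    metadata: Dict[str, str] = field(default_factory=dict)
    # Measurements grouped by data type, e.g. "proteomics" -> {feature: value}.
    measurements: Dict[str, Dict[str, float]] = field(default_factory=dict)


def to_relational_rows(record: SampleRecord) -> List[Tuple[str, str, str, float]]:
    """Flatten measurements into (sample_id, data_type, feature, value) rows,
    the shape a single relational table keyed on sample_id could hold."""
    return [
        (record.sample_id, data_type, feature, value)
        for data_type, features in sorted(record.measurements.items())
        for feature, value in sorted(features.items())
    ]


def to_wide_column_cells(record: SampleRecord) -> Tuple[str, Dict[str, str]]:
    """Return (row_key, cells) where the row key is the sample ID and each
    cell is named "family:qualifier", as in wide-column stores such as HBase."""
    cells = {f"meta:{key}": value for key, value in record.metadata.items()}
    for data_type, features in record.measurements.items():
        for feature, value in features.items():
            cells[f"{data_type}:{feature}"] = str(value)
    return record.sample_id, cells


# Made-up example values, for illustration only.
record = SampleRecord(
    sample_id="S001",
    metadata={"organism": "E. coli"},
    measurements={
        "transcriptomics": {"geneA": 2.5},
        "proteomics": {"protA": 0.8},
    },
)
rows = to_relational_rows(record)
row_key, cells = to_wide_column_cells(record)
```

Because the sample ID is the only join key in either layout, the same record can be moved between the two representations without changing how samples are identified.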
FAIRDOM's primary mission is to support researchers, students, trainers, funders and publishers by enabling Systems Biology projects to make their Data, Operating procedures and Models Findable, Accessible, Interoperable and Reusable (FAIR).