Analytics & Data Science
Data Bridge for Neuroscience: A novel way of discovery for Neuroscience Data
Description
Neuroscience is at an inflection point: more and more data are being aggregated and shared through common repositories. The main challenge facing the neuroscience community in the Big Data era is discovering relevant datasets across these repositories. This difficulty of finding and identifying relevant data is the "last mile" problem for the long tail of science data. Solving it can increase the value of the data through reuse and repurposing, and can greatly benefit the NSF Brain Initiative by providing increased access to data in its various thematic areas.
This project will initiate studies on applying a novel data discovery system to neuroscience, based on a platform called DataBridge, which the project team developed under a grant from the NSF Big Data program. DataBridge applies "signature" and "similarity" algorithms to semantically bridge large numbers of diverse datasets into a "sociometric" network. The DataBridge for Neuroscience platform will attempt to harness complex analytics algorithms developed by neuroscience researchers to extract key signatures and find data associations in large, diverse collections of so-called "long-tail" neuroscience data. By providing a venue for defining complicated search criteria through pattern analysis, feature extraction and other relevance criteria, DataBridge provides a highly customizable search engine for scientific data.
This project will conduct a preliminary feasibility study on the applicability of DataBridge to neuroscience data, with two goals: (1) implement a pilot DataBridge system for neuroscience and demonstrate a proof of concept for semantically bridging a small collection of neuroscience datasets, and (2) conduct a workshop to develop a coalition of users from the neuroscience community in order to build a sustainable DataBridge-based infrastructure for neuroscience. This community-based activity will leverage the community infrastructure created by the NSF Big Data Hubs and Spokes program.
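The signature-and-similarity idea described above can be illustrated with a minimal sketch. This is not DataBridge code: the function names, the bag-of-words signature, the cosine similarity measure, the 0.3 threshold and the toy dataset descriptions are all assumptions chosen for illustration. In the project itself, signatures would come from domain-specific algorithms supplied by neuroscience researchers.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def signature(description):
    """Hypothetical signature: term counts of a dataset's text description."""
    return Counter(description.lower().split())


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two term-count signatures (0.0 if either is empty)."""
    dot = sum(sig_a[t] * sig_b[t] for t in sig_a.keys() & sig_b.keys())
    norm_a = sqrt(sum(v * v for v in sig_a.values()))
    norm_b = sqrt(sum(v * v for v in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(datasets, threshold=0.3):
    """Link every pair of datasets whose similarity meets the (assumed) threshold.

    Returns a dict mapping (name_a, name_b) to the similarity score,
    i.e. the weighted edges of a small similarity network.
    """
    sigs = {name: signature(desc) for name, desc in datasets.items()}
    edges = {}
    for a, b in combinations(sorted(sigs), 2):
        score = cosine_similarity(sigs[a], sigs[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Toy, made-up dataset descriptions used only to exercise the sketch.
datasets = {
    "ds_a": "mouse hippocampus calcium imaging",
    "ds_b": "mouse hippocampus electrophysiology recordings",
    "ds_c": "human survey responses sleep",
}
network = build_network(datasets)
```

In this toy example only ds_a and ds_b share terms, so they are the only pair linked in the resulting network. Swapping in a different signature or similarity function changes which datasets are bridged, which is the sense in which such a search is customizable.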
RENCI's Role
RENCI was responsible for working with members of the neuroscience team to develop, implement and run domain-specific semantic algorithms appropriate for the corpus of datasets used in the project. RENCI also chaired the Workshop for Advanced Data Discovery for Neuroscience, held in conjunction with the Neuroinformatics 2018 Conference.