The document discusses the challenges of managing and using the large amount of neuroscience data being generated. It notes that about half of researchers currently store data only in their own labs, and many lack funding for proper archiving. The Neuroscience Information Framework (NIF) is working to address these issues by creating a catalog and federation of neuroscience resources to facilitate discovery, access, analysis and integration of data. NIF has assembled the largest searchable collection of neuroscience data on the web using an ontology and technologies that can search the "hidden web" of resources.
A Deep Survey of the Digital Resource Landscape: Perspectives from the Neuros... (Maryann Martone)
The NIF Registry provides insight into the state of digital neuroscience resources on the web. It has cataloged over 6,000 resources, including more than 2,200 databases. While some resources disappear over time, many more grow stale as they are not updated regularly. Maintaining an up-to-date registry requires frequent updates. The NIF data federation can search over 200 databases containing over 1 billion records. This collection continues to grow as new databases are added. The NIF utilizes ontologies and semantic frameworks to integrate data across diverse sources and provide insights into the neuroscience landscape.
Data-knowledge transition zones within the biomedical research ecosystem (Maryann Martone)
Overview of the Neuroscience Information Framework and how it brings together data, in the form of distributed databases, and knowledge, in the form of ontologies to show the mapping of the dataspace and places where there are mismatches between data and knowledge.
The Neuroscience Information Framework has indexed over 100 big data databases, allowing us to ask questions about the big data landscape. Anita Bandrowski presents an overview of the NIF system and provides insights into the addiction data landscape to JAX laboratories.
Neuroscience research increasingly relies on large, heterogeneous datasets from various sources. Integrating these diverse data types and making them accessible presents challenges. The NIF (Neuroscience Information Framework) addresses this by creating a federated search engine and unified interface to access multiple neuroscience databases. NIF aims to make neuroscience data more discoverable, accessible, and usable through techniques like unique identifiers, metadata standards, and semantic integration. This will help researchers more effectively find and use relevant neuroscience information.
How do we know what we don’t know: Using the Neuroscience Information Framew... (Maryann Martone)
The document discusses using the Neuroscience Information Framework (NIF) to reveal knowledge gaps in neuroscience. NIF aims to maximize awareness, access, and utility of neuroscience research resources by uniting information from over 200 databases containing over 400 million records. However, certain domains may still be underrepresented because of biases in the available data, driven by factors such as funding priorities. The framework uses ontologies to integrate diverse data types and link them with defined concepts, although neuroanatomical structures in particular pose challenges due to inconsistent naming conventions across studies.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a portal for finding and utilizing web-based neuroscience resources. NIF provides a consistent framework for describing various resources like databases, literature, and images. It allows simultaneous searches across these different data types and is supported by neuroscience ontologies. NIF currently catalogs over 5,000 resources and is working to integrate these diverse data sources to help answer questions and discover gaps in our knowledge about the brain.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a consistent framework and portal for discovering and utilizing web-based neuroscience resources. It summarizes the goals of NIF in indexing over 2000 databases and making their content searchable through an expansive neuroscience ontology. The document outlines the history and development of NIF, describes its search capabilities and use of ontologies, and provides examples of tools and resources that integrate NIF services like the Whole Brain Catalog.
The Path to Enlightened Solutions for Biodiversity's Dark Data (vbrant)
Large amounts of scientific data remain uncurated, especially small datasets, which are currently invisible or "dark data". This dark data should be curated locally with the involvement of non-scientists at long-lived institutions like libraries and museums that have experience managing scholarly information over time. New roles and skills are needed for data scientists, digital curators, and biological information specialists to help address this problem by developing the necessary cyberinfrastructure, data standards, and educational programs to make more scientific dark data accessible.
How Portable Are the Metadata Standards for Scientific Data? (Jian Qin)
The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with ever-growing data. This paper reports the findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and grouped them into 9 categories. The highest element counts occurred in the descriptive category, and many of those elements overlapped with Dublin Core (DC) elements. The same pattern recurred among elements that co-occurred in different standards. A small number of semantically general elements appeared across the largest numbers of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discusses the implications of the findings for metadata portability and infrastructure and points out that large, complex standards and widely varied naming practices are the major hurdles to building a metadata infrastructure.
How do we know what we don't know? Exploring the data and knowledge space th... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF), an initiative that aims to catalog and integrate neuroscience resources and data. NIF surveys the neuroscience resource landscape, currently cataloging over 3000 databases and datasets. It provides semantic integration of these resources through the use of ontologies and allows deep search of aggregated data. However, significant amounts of neuroscience data and resources remain inaccessible in publications, databases, and file drawers. Barriers to data sharing include lack of incentives, standards, and resources. NIF and related efforts aim to develop solutions to make more neuroscience data FAIR - findable, accessible, interoperable, and reusable.
Next Steps for IMLS's National Digital Platform (Trevor Owens)
This keynote, at the Upper Midwest Digital Collections Conference, provides an update on the National Digital Platform and 20 projects supported to enhance it. The national digital platform is a way of thinking about and approaching the digital capability and capacity of libraries across the US. In this sense, it is the combination of software applications, social and technical infrastructure, and staff expertise that provide library content and services to all users in the US. As libraries increasingly use digital infrastructure to provide access to digital content and resources, there are more and more opportunities for collaboration around the tools and services that they use to meet their users’ needs. It is possible for each library in the country to leverage and benefit from the work of other libraries in shared digital services, systems, and infrastructure.
We need to bridge gaps between disparate pieces of the existing digital infrastructure, for increased efficiencies, cost savings, access, and services. To this end, IMLS is focusing on the national digital platform as an area of priority in the National Leadership Grants to Libraries program and the Laura Bush 21st Century Librarian program. We are eager to explore how this way of thinking and approaching infrastructure development can help states make the best use of the funds they receive through the Grants to States program. We’re also eager to work with other foundations and funders to maximize the impact of our federal investment.
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... (Amit Sheth)
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
A presentation focusing on the data analysis OCLC Research performed on 900K museum records, plus next steps for the nine project museums who now have the capacity to share standards-based records.
EiTESAL eHealth Conference 14&15 May 2017 (EITESANGO)
This document discusses bioinformatics and some of its key concepts and tools. It begins with definitions of bioinformatics as the intersection of biology, computer science, and information technology. It then discusses some of the data formats, tools, and skills used in bioinformatics, including working with nucleotide sequence data, translating sequences into amino acids, and analyzing large datasets. It also summarizes how ontologies are used to represent concepts and how various data types are organized and stored in databases for analysis.
The Seven Deadly Sins of Bioinformatics (Duncan Hull)
Keynote talk at Bioinformatics Open Source Conference (BOSC) Special Interest Group at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2007) in Vienna, July 2007 by Carole Goble, University of Manchester.
This document outlines a presentation on biological networks and the software Cytoscape. It begins with an introduction to biological networks and their taxonomy, as well as analytical approaches and visualization techniques. It then provides an overview of Cytoscape, covering core concepts like networks and tables, visual properties, and apps. The document demonstrates how to load networks and data, use visual style managers, and save and export networks. It concludes with tips and tricks for using Cytoscape and a link to a hands-on tutorial.
Exploring a world of networked information built from free-text metadata (Shenghui Wang)
This document summarizes a presentation about exploring topics through networked information extracted from free-text metadata. It describes challenges in exploring topics and related aspects. It then demonstrates an online interface called Ariadne that addresses these challenges by generating semantic representations of entities from a large dataset and identifying nearest neighbors and related entities through multidimensional scaling. Finally, it discusses potential applications of this approach and references related work.
Poster RDAP13: Data information literacy multiple paths to a single goal (ASIS&T)
Jake Carlson, Jon Jeffryes, Brian Westra and Sarah Wright
Data Information Literacy: Multiple Paths to a Single Goal
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Delivered by Peter Burnhill, Director of EDINA, at the PRELIDA Consolidation and Dissemination workshop on 17/18 October 2014 (http://prelida.eu/consolidation-workshop).
Summary: The web changes over time, and significant reference rot inevitably occurs. Web archiving delivers only a 50% chance of success. So in addition to the original URI, the link should be augmented with temporal context to increase robustness.
Towards collaboration at scale: Libraries, the social and the technical (lisld)
Libraries are now supporting research and learning behaviors in data rich network environments. This presentation looks at some examples focusing on how an emphasis on individual systems needs to give way to a broader view of process, workflow and behaviors.
It also discusses how this environment creates a demand for collaboration at scale among libraries.
g-Social - Enhancing e-Science Tools with Social Networking Functionality (Nicholas Loulloudes)
Presentation of "g-Social - Enhancing e-Science Tools with Social Networking Functionality" given at the Workshop on Analyzing and Improving Collaborative eScience with Social Networks, Chicago October 8th, 2012. Co-located with IEEE eScience 2012.
The document discusses how the nature of library collections and user needs have changed dramatically with the rise of digital resources and the web. It makes three key points:
1) The old model of large print collections housed in libraries that users had to visit has been replaced by digital collections that are available anytime, anywhere. Now over 50% of library budgets go to electronic resources.
2) User expectations and behaviors have changed as well, shaped by Google and other web search engines. Users want quick, self-sufficient searching across all library resources from a single search box.
3) In response, libraries are adopting "discovery services" that aim to provide a unified search experience for all library resources, similar to web search.
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte... (Amit Sheth)
Ora Lassila and Amit Sheth, "Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability", Invited Talk at ONC-HHS Invitational Workshop on Next Generation Interoperability for Health, Washington DC, January 19-20, 2011.
This document provides summaries of several upcoming conferences, training programs, videoconferences, and workshops related to metadata and digital libraries. It also summarizes two ongoing projects: the development of a MARC 21 XML schema by the Library of Congress to facilitate the communication and conversion of MARC records, and the Metadata Encoding and Transmission Standard (METS) being developed by the Library of Congress as a standard for encoding metadata about digital library objects.
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation (ekansa)
This presentation discusses how a model of “data sharing as publishing” can contribute to developing Linked Open Data resources in archaeology and the study of the ancient world. The paper gives examples from Open Context’s developing approach to data editing, documentation and quality improvement processes. The goal of these efforts is to better align the professional interests of individual researchers with the needs of the larger community to access and use high-quality data in Linked Data scenarios.
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework (ASIS&T)
The Neuroscience Information Framework (NIF) is an initiative of the NIH Blueprint to maximize access to and utility of worldwide neuroscience research resources. NIF catalogs over 10,000 resources including databases, literature, and materials. It provides search capabilities across these resources and develops ontologies and semantic frameworks to integrate diverse data types and scales. NIF aims to make dispersed neuroscience information more findable, accessible, interoperable, and reusable to enable new insights.
Data Landscapes: The Neuroscience Information Framework (Maryann Martone)
Overview of how to use the Neuroscience Information Framework for data discovery, presented at the Genetics of Addiction Workshop, held at Jackson Lab Aug 28 - Sept 1, 2014.
A description of software as infrastructure at NSF, and how Apache projects may be similar. What lessons can be shared from one organization to the other? How does science software compare with more general software?
The document discusses solutions to overcoming the tragedy of the data commons through shared metadata. It describes how large scientific projects can share data at low cost by starting from overlapping common metadata terms and having their metadata teams work together. Reusing shared metadata leads to increased reusability of data across projects. The document advocates for developing metadata as evolving, linked resources rather than predefined standards, and provides examples of how this approach has helped scientific collaborations and government data sharing initiatives succeed.
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove... (Spark Summit)
This document describes a project at Novartis to use Apache Spark for high-dimensional data analysis from drug screening. Large datasets from various screening technologies were analyzed using Spark pipelines for quality control, normalization, and classification. Visualizations were built using WebGL. The goals were to speed up multi-day batch jobs, create a unified analysis workflow, and build an application for scientists. Future work includes elastic infrastructure, supervised learning of cell phenotypes, and contributing methods to open source.
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...dkNET
dkNET provides a single portal for discovering over 3,500 biomedical research resources and datasets. It aims to make these resources findable, accessible, interoperable, and reusable in accordance with the FAIR principles. The portal contains three main sections for browsing community resources, additional resources, and literature. It utilizes faceted searching and provides analytics and notifications to help users track changes to resources over time.
Enabling knowledge management in the Agronomic DomainPierre Larmande
This talk focuses mainly on ongoing projects at the Institute of Computational Biology:
Agronomic Linked Data (AgroLD) is a Semantic Web knowledge base designed to integrate data from various publicly available plant-centric data sources.
GIGwA is a tool developed to manage large genomic, transcriptomic and genotyping datasets resulting from NGS analyses.
De-centralized but global: Redesigning biodiversity data aggregation for impr...taxonbytes
The document discusses redesigning biodiversity data aggregation to improve engagement and impact. It proposes a decentralized but global approach using a network of independent themed portal communities that maintain live collections. Portal-to-portal APIs would negotiate partial collection snapshot sharing between portals to attain global coverage while allowing bidirectional data flow. This would better engage experts by providing access through custom research portals and accommodate pluralism in taxonomic data through spatial reasoning tools for mapping relationships between conflicting views.
Vince smith-delivering biodiversity knowledge in the information age-notextVince Smith
Smith, V.S. 2013. Delivering biodiversity knowledge in the information age. Hellenic Botanical Society, Thessaloniki, Greece, 3-6 Oct. 2013. [Delivered via video link through Google Hangouts]
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012lljohnston
Big Data challenges in developing repositories include:
- Collections like web archives and historic newspapers contain billions of files and grow quickly, requiring constant processing and large-scale infrastructure.
- Researchers want to analyze entire collections using algorithms and computational methods rather than accessing individual items.
- Repository services need to support self-serve access, full-text search of entire collections, and APIs to enable computational research methods.
- Ingesting and providing access to collections measured in petabytes and containing highly diverse content and metadata requires normalization and standardization.
This document provides an introduction to big data, including:
- Big data is characterized by its volume, velocity, and variety, which makes it difficult to process using traditional databases and requires new technologies.
- Technologies like Hadoop, MongoDB, and cloud platforms from Google and Amazon can provide scalable storage and processing of big data.
- Examples of how big data is used include analyzing social media and search data to gain insights, enabling personalized experiences and targeted advertising.
- As data volumes continue growing exponentially from sources like sensors, simulations, and digital media, new tools and approaches are needed to effectively analyze and make sense of "big data".
The real world of ontologies and phenotype representation: perspectives from...Maryann Martone
The document discusses the Neuroscience Information Framework (NIF) and its role in facilitating discovery and use of neuroscience resources through a consistent semantic framework. NIF provides a portal for searching various types of neuroscience data and information organized by categories. It utilizes ontologies and advanced technologies to allow simultaneous searching of multiple sources. Challenges include the large number of databases and other resources, differing data types, and inconsistent naming of brain structures across sources.
California Ocean Science Trust " Building a Sustainable Knowledge Base for ...Tom Moritz
"Building a Sustainable Knowledge Base for the Marine Protected Areas Monitoring Enterprise" a presentation to the California Ocean Science Trust, Oakland, California March 16, 2010
This document discusses multiple content repositories at Johns Hopkins University: JShare for file sharing, JScholarship for an institutional repository, and a pending Data Conservancy project for data curation. It describes each system and challenges of integrating them. Future directions discussed include using the JCR Connect tool or JBoss DNA to integrate content across dissimilar repositories using the JCR API. The goal is flexible access to content from different sources for teaching and learning.
Panel: The Global Research Platform: An OverviewLarry Smarr
The document provides an overview of the Global Research Platform (GRP), an international collaborative partnership creating a distributed environment for data-intensive global science. The GRP facilitates high-performance data gathering, analytics, transport up to terabits per second, computing, and storage to support large-scale global science cyberinfrastructure ecosystems. It aims to orchestrate research across multiple domains using international testbeds for investigating new technologies related to data-intensive science. Examples of instruments generating exabytes of data that would benefit include the Korea Superconducting Tokamak, the High Luminosity LHC, genomics, the SKA radio telescope, and the Vera Rubin Observatory.
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsJon Voss
This document discusses practical applications of Linked Open Data (LOD) for libraries, archives, and museums. It describes how LOD allows these institutions to publish structured data on the web in ways that are interoperable and can be connected to other open datasets. Examples are given of how LOD is being used by various institutions to share metadata, images, and other cultural heritage assets on the web in open, machine-readable formats. The presenter argues that LOD represents a new paradigm that these cultural organizations should embrace to make their collections more accessible and useful on the web.
Ecsi: Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences
1. Maryann E. Martone, Ph.D., University of California, San Diego
2. “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. • In that same issue of Science
– Asked peer reviewers from last year about the availability and use of data
• About half of those polled store their data only in their laboratories, not an ideal long-term solution.
• Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving
• And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.
“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.”
Lead Science editorial, 2011
4. Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community
Whole brain data (20 µm microscopic MRI)
Mosaic LM images (1 GB+)
Conventional LM images
Individual cell morphologies
EM volumes & reconstructions
Solved molecular structures
No single technology serves these all equally well.
Multiple data types; multiple scales; multiple databases
6. • Current web is designed to share documents
– Documents are unstructured data
• Much of the content of digital resources is part of the “hidden web”
• Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
7. • NIF has developed a production technology platform for researchers to:
– Discover
– Share
– Analyze
– Integrate
neuroscience-relevant information
• Since 2008, NIF has assembled the largest searchable catalog of neuroscience data and resources on the web
• Cost-effective and innovative strategy for managing data assets
“This unique data depository serves as a model for other Web sites to provide research data.” - Choice Reviews Online
NIF is poised to capitalize on the new tools and emphasis on big data and open science
8. http://neuinfo.org
June 10, 2013, dkCOIN Investigator's Retreat
• A portal for finding and using neuroscience resources
A consistent framework for describing resources
Provides simultaneous search of multiple types of information, organized by category
Supported by an expansive ontology for neuroscience
Utilizes advanced technologies to search the “hidden web”
UCSD, Yale, Cal Tech, George Mason, Washington Univ
Literature; Database Federation; Registry
9. • NIF Registry: A catalog of neuroscience-relevant resources
• > 6000 currently listed
• > 2200 databases
• And we are finding more every day
“Of relevance to neuroscience” is very broad
10. • NIF curators
• Nomination by the community
• Semi-automated text mining pipelines
NIF Registry
Requires no special skills
Site map available for local hosting
• NIF Data Federation
• DISCO interop
• Requires some programming skill
Low barrier to entry
11. • Extended over time
– Parent resource
– Supporting agency
– Grant numbers
– Accessibility
– Related to
– Organism
– Disease or condition
– Last updated
Simple metadata model: Name, description, type, URL, other names, keywords, unique identifier
Timeline labels: First catalog: SFN Neuroscience Database Gateway; NIF 0.5; NIF 1.0+ (~2003, 2006, 2008)
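The metadata model on this slide can be pictured as a single record type. The following is only a minimal sketch: the class name, the field names, and the example values are assumptions made for illustration and are not taken from the NIF Registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Fields the slide lists under "Extended over time" (names are illustrative).
EXTENDED_FIELDS = (
    "parent_resource", "supporting_agency", "grant_numbers", "accessibility",
    "related_to", "organism", "disease_or_condition", "last_updated",
)


@dataclass
class RegistryRecord:
    # Simple metadata model: name, description, type, URL, identifier, ...
    name: str
    description: str
    resource_type: str
    url: str
    identifier: str
    other_names: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Extended fields start out empty and are filled in by curation.
    parent_resource: Optional[str] = None
    supporting_agency: Optional[str] = None
    grant_numbers: List[str] = field(default_factory=list)
    accessibility: Optional[str] = None
    related_to: List[str] = field(default_factory=list)
    organism: Optional[str] = None
    disease_or_condition: Optional[str] = None
    last_updated: Optional[str] = None

    def missing_extended_fields(self) -> List[str]:
        """Return the extended fields that are still empty (None or [])."""
        return [n for n in EXTENDED_FIELDS if not getattr(self, n)]


# Hypothetical record, for illustration only.
record = RegistryRecord(
    name="Example Atlas",
    description="A made-up resource used to exercise the model.",
    resource_type="database",
    url="http://example.org/atlas",
    identifier="example_0001",
    keywords=["atlas"],
    organism="mouse",
)
print(record.missing_extended_fields())
```

Keeping the core fields required and the extended fields optional mirrors the idea of a low barrier to entry followed by progressive refinement.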
12. • NIF Registry is hosted on Semantic Media Wiki platform Neurolex
– Community can add, review, edit without special privileges
– Searchable by Google
– Integrated with NIF ontologies
– Graph structure
Semantic wiki: A wiki with semantics; pages are linked through relationships
14. – NIF employs an automated link checker
– Last analysis: 478/6100 invalid URLs (~8%)
– 199 could not be located at another university or location and are out of service (~3%)
– Bigger issue: Many resources are no longer updated or maintained
[Chart: resources added and last updated, by year, 1996-2014]
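The link-check figures above can be reproduced with a few lines. This is a sketch under stated assumptions: `check_links`, `percent_of_total`, and `fake_probe` are hypothetical names, and the probe is a stand-in for a real HTTP request so that the example runs without network access. It does not describe NIF's actual link checker.

```python
from typing import Callable, Dict, Iterable


def check_links(urls: Iterable[str], probe: Callable[[str], bool]) -> Dict[str, bool]:
    """Map each URL to True (reachable) or False, using the supplied probe."""
    return {url: probe(url) for url in urls}


def percent_of_total(count: int, total: int) -> float:
    """Share of the registry, in percent."""
    return 100.0 * count / total


def fake_probe(url: str) -> bool:
    # Stand-in for an HTTP request: hosts ending in ".invalid" count as dead.
    return not url.endswith(".invalid")


results = check_links(["http://example.org", "http://gone.invalid"], fake_probe)
dead = [url for url, ok in results.items() if not ok]

# Counts quoted on the slide.
invalid_pct = percent_of_total(478, 6100)
out_of_service_pct = percent_of_total(199, 6100)
print(dead, round(invalid_pct), round(out_of_service_pct))
```

Passing the probe in as a parameter keeps the counting logic testable; a production checker would also need retries and timeouts before declaring a resource gone.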
15. Keeping content up to date
Connectome; Tractography; Epigenetics
• New tags come into existence
• New resource types come into existence, e.g., Mobile apps
• Resources add new types of content
• Change name
• Change scope
• > 7000 updates to the registry last year
It’s a challenge to keep the registry up to date; sitemaps, curation, ontologies, community review
16. • The NIF Registry has created a linked data graph of web-accessible resources
• Maintained on a community wiki platform
• Provides data on the fluidity of the resource landscape
– New resources continue to be created and found
– Relatively few disappear altogether
– Many more grow stale, although their value may still be significant
– Maintaining up to date curation requires frequent updating
NIF Registry provides insight into the state of digital resources on the web
17. • The NIF data federation performs deep search over the content of over 200 databases
• New databases are added at a rate of 25-40 per year
• Latest update: Open Source Brain; ingest completed in 2 hours
• Databases chosen on a variety of criteria:
• Early: testing different types of resources
• Thematic areas
• Volunteers
NIF provides access to the largest aggregation of neuroscience-relevant information on the web
18. • NIF was one of the first projects to attempt data integration in the neurosciences on a large scale
• NIF is supported by a contract that specified the number of resources to be added per year
– Designed to be populated rapidly; set up process for progressive refinement
– No budget was allocated to retrofit existing resources; had to work with them in their current state
– We designed a system that required little to no cooperation or work from providers
– Supports many formats: relational, XML, RDF
19. DISCO Dashboard Functions (Current / Planned)
• Ingest Script Manager
• Public Script Repository
• Data & Event Tracker
• Versioning System
• Curator Tool
• Data Transformer Manager
Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University
20. NIF searches the largest collation of neuroscience-relevant data on the web
[Chart: Number of Federated Databases (axis 0-250) and Number of Federated Records in millions (axis 0.01-1000), dates 6-12 to 5-17; DISCO]
22. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms and related concepts
Boolean queries
Data sources categorized by “data type” and level of nervous system
Common views across multiple sources
Tutorials for using full resource when getting there from NIF
Link back to record in original source
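The query expansion shown on this slide can be sketched as a synonym lookup that rewrites one term into a Boolean OR query. The synonym table and the function names below are assumptions made for illustration; NIF draws its synonyms from its ontologies, and this sketch does not reproduce that mechanism.

```python
from typing import Dict, List

# Illustrative synonym table keyed by lower-cased term.
SYNONYMS: Dict[str, List[str]] = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Join a term and its known synonyms into one Boolean OR query."""
    variants = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote_if_phrase(v) for v in variants)


print(expand_query("Hippocampus"))
```

A term with no entry in the table passes through unchanged, so the expansion never narrows a search.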
23. Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site
Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
24. • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
• Brain Architecture Management System (rodent)
• Temporal lobe.com (rodent)
• Connectome Wiki (human)
• Brain Maps (various)
• CoCoMac (primate cortex)
• UCLA Multimodal database (Human fMRI)
• Avian Brain Connectivity Database (Bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of 1st order partonomy matches: 385
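The three kinds of matches counted above (exact, synonym, first-order partonomy) can be illustrated on toy data. Everything in this sketch is an assumption: the database names, the term lists, and the small synonym and part-of tables are invented, and the functions are not the analysis NIF ran.

```python
from itertools import combinations, permutations
from typing import Dict, List, Set, Tuple


def normalize(term: str) -> str:
    return term.strip().lower()


def canonical(term: str, synonyms: Dict[str, str]) -> str:
    """Map a term to its preferred label, if one is known."""
    key = normalize(term)
    return synonyms.get(key, key)


def exact_shared_terms(db_terms: Dict[str, List[str]]) -> Set[str]:
    """Terms (case-insensitive) that appear in more than one database."""
    counts: Dict[str, int] = {}
    for terms in db_terms.values():
        for t in {normalize(x) for x in terms}:
            counts[t] = counts.get(t, 0) + 1
    return {t for t, n in counts.items() if n > 1}


def synonym_matches(db_terms, synonyms) -> Set[frozenset]:
    """Pairs of different strings, from different databases, with one preferred label."""
    found = set()
    for (_, a), (_, b) in combinations(db_terms.items(), 2):
        for ta in a:
            for tb in b:
                same_label = canonical(ta, synonyms) == canonical(tb, synonyms)
                if same_label and normalize(ta) != normalize(tb):
                    found.add(frozenset((normalize(ta), normalize(tb))))
    return found


def partonomy_matches(db_terms, part_of, synonyms) -> Set[Tuple[str, str]]:
    """(part, whole) pairs where the direct parent of a term is used in another database."""
    found = set()
    for (_, a), (_, b) in permutations(db_terms.items(), 2):
        for ta in a:
            parent = part_of.get(canonical(ta, synonyms))
            if parent is None:
                continue
            for tb in b:
                if canonical(tb, synonyms) == parent:
                    found.add((normalize(ta), normalize(tb)))
    return found


# Toy inputs, invented for illustration.
db_terms = {
    "db_a": ["Hippocampus", "CA1"],
    "db_b": ["hippocampus", "Ammon's horn"],
    "db_c": ["Cornu Ammonis"],
}
synonyms = {"ammon's horn": "cornu ammonis"}  # variant -> preferred label
part_of = {"ca1": "cornu ammonis"}            # part -> direct parent

print(exact_shared_terms(db_terms))
print(synonym_matches(db_terms, synonyms))
print(partonomy_matches(db_terms, part_of, synonyms))
```

Even on three tiny term lists, exact string matching finds the fewest links, which is the pattern the slide reports at scale.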
25. – You (and the machine) have to be able to find it
• Accessible through the web
• Annotations
– You have to be able to access and use it
• Data type specified and in a usable form
– You have to know what the data mean
• Some semantics: “1”
• Context: Experimental metadata
• Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously
26. Knowledge in space and spatial relationships (the “where”)
Knowledge in words, terminologies and logical relationships (the “what”)
27. • NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, Chebi, Protein Ontology
NIFSTD: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
NIF capitalizes on the growing set of community ontologies available in biomedical science
28. Purkinje Cell; Axon Terminal; Axon; Dendritic Tree; Dendritic Spine; Dendrite; Cell body; Cerebellar cortex
There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
29. Brain (has a) Cerebellum (has a) Purkinje Cell Layer (has a) Purkinje cell (is a) neuron
• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine readable form
– Branch of philosophy: a theory of what is
– e.g., Gene ontologies
• Provide universals for navigating across different data sources
– Semantic “index”
• Provide the basis for concept-based queries to probe and mine data
– Perform reasoning
– Link data through relationships, not just one-to-one mappings
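The has-a / is-a chain on this slide is enough to show how a concept-based query can follow relationships instead of one-to-one mappings. The triples below restate the slide's diagram; the predicate spellings and the function names are assumptions made for this sketch.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The chain from the slide, written as (subject, predicate, object) triples.
TRIPLES: List[Triple] = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje Cell Layer"),
    ("Purkinje Cell Layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "neuron"),
]


def parts_of(entity: str, triples: List[Triple]) -> Set[str]:
    """Everything reachable from entity by following has_a edges transitively."""
    found: Set[str] = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "has_a" and obj not in found:
                found.add(obj)
                stack.append(obj)
    return found


def types_of(entity: str, triples: List[Triple]) -> Set[str]:
    """Direct is_a parents of an entity."""
    return {o for s, p, o in triples if s == entity and p == "is_a"}


def contains_type(region: str, type_name: str, triples: List[Triple]) -> bool:
    """True if any part of the region is an instance of the given type."""
    return any(type_name in types_of(part, triples) for part in parts_of(region, triples))


print(parts_of("Cerebellum", TRIPLES))
print(contains_type("Brain", "neuron", TRIPLES))
```

A data set annotated only with "Purkinje cell" becomes findable from a query about the cerebellum because the traversal supplies the missing links.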
30. “Search computing”
What genes are upregulated by drugs of abuse in the adult mouse?
Morphine; Increased expression; Adult Mouse
Some concepts, e.g., age category, are quantitative but still must be interpreted in a global query system
33. http://neurolex.org
Stephen Larson
• Provide a simple interface for defining the concepts required
• Light weight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
• Community based:
• Anyone can contribute their terms, concepts, things
• Anyone can edit
• Anyone can link
• Accessible: searched by Google
• Growing into a significant knowledge base for neuroscience
Demo D03
200,000 edits
150 contributors
34. • NIF can be used to survey the data landscape
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
– Data is reinterpreted, reanalyzed or added to
• Is duplication good or bad?
35. Databases come in many shapes and sizes
• Primary data:
– Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data
– Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data
– Claims and assertions about the meaning of data
• E.g., gene upregulation/downregulation, brain activation as a function of task
• Registries:
– Metadata
– Pointers to data sets or materials stored elsewhere
• Data aggregators
– Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source
– Data acquired within a single context, e.g., Allen Brain Atlas
Researchers are producing a variety of information artifacts using a multitude of technologies
36. NIF Analytics: The Neuroscience Landscape
NIF is in a unique position to answer questions about the neuroscience landscape
Where are the data?
[Figure labels: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain; axes: Brain region, Data source]
Vadim Astakhov, Kepler Workflow Engine
37. Diseases of nervous system
Adding more semantics
The combination of ontologies, diverse data and analytics lets us look at the current landscape in interesting ways
[Figure labels: Neurodegenerative; Seizure disorders; Neoplastic disease of nervous system; NIH Reporter; NIF data federated sources]
38. • Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
• Analysis:
• 1370 statements from Gemma regarding gene expression as a function of chronic morphine
• 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
• Results for 1 gene were opposite in DRG and Gemma
• 45 did not have enough information provided in the paper to make a judgment
Relatively simple standards would make life easier
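The baseline mismatch described above can be handled by re-expressing every statement against one reference baseline before comparing. This sketch assumes a two-condition comparison (saline versus chronic morphine); the record layout, the function names, and the gene label "GeneX" are hypothetical and are not taken from Gemma or DRG.

```python
from typing import Dict

FLIP = {"up": "down", "down": "up"}


def normalize_direction(direction: str, baseline: str, reference: str = "saline") -> str:
    """Express a direction of change relative to the reference baseline.

    Assumes exactly two conditions, so a statement made against the other
    baseline is simply the mirror image.
    """
    if direction not in FLIP:
        raise ValueError(f"unknown direction: {direction}")
    return direction if baseline == reference else FLIP[direction]


def compare(a: Dict[str, str], b: Dict[str, str]) -> str:
    """Return 'consistent' or 'opposite' after putting both on one baseline."""
    da = normalize_direction(a["direction"], a["baseline"])
    db = normalize_direction(b["direction"], b["baseline"])
    return "consistent" if da == db else "opposite"


# Hypothetical statements about the same (made-up) gene.
gemma_like = {"gene": "GeneX", "direction": "up", "baseline": "chronic morphine"}
drg_like = {"gene": "GeneX", "direction": "down", "baseline": "saline"}
verdict = compare(gemma_like, drg_like)

# Share of the 1370 statements reported as consistent on the slide.
consistent_share = 617 / 1370
print(verdict, round(100 * consistent_share))
```

Recording the baseline as an explicit field is the kind of relatively simple standard the slide asks for: without it, the two statements above look contradictory.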
39. NIF favors a hybrid, tiered, federated system
• Domain knowledge
– Ontologies
• Claims, models and observations
– Virtuoso RDF triples
– Model repositories
• Data
– Data federation
– Spatial data
– Workflows
• Narrative
– Full text access
Neuron; Brain part; Disease; Organism; Gene; Technique; People
Caudate projects to Snpc
Grm1 is upregulated in chronic cocaine
Betz cells degenerate in ALS
NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
40. • 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
– NIF Registry vs NIF data federation
– Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
– Effective search across semantically diverse sources
• NIFSTD ontologies
• 2009-2011: Strategy for data integration
– Unified views across common sources
– Mapping of content to NIF vocabularies
• 2011-present: Data analytics
– Uniform external data references
• 2012-present: SciCrunch: unified biomedical resource services
NIF provides a strategy and set of tools applicable to all domains grappling with multiple sources of diverse data (i.e., pretty much everything)
41. • Search semantics
• Ranking
• Resources supported by NIH Blueprint Institutes are more thoroughly covered
• Data types, e.g., Brain activation foci
42. SciCrunch (not just a data catalog)
NIF; MONARCH; Community Services; dkCOIN; Shared Resources; Undiagnosed Disease Program; Phenotype RCN; 3D Virtual Cell; National Institute on Aging; One Mind for Research; BIRN; International Neuroinformatics Coordinating Facility; Model Organism Databases; Community Outreach; DELSA
43. • 3dVC: Focus on models and simulation
• Gene Ontology: Focus on bioinformatics tools
• National Institute on aging: Aging-related data sets
• Monarch: Phenotype-Genotype; deep semantic data integration
• One Mind for Research: Biospecimen repositories
• NeuroGateway: Computational resources
• FORCE11: Tools for next-gen publishing and e-scholarship
SciCrunch
SciCrunch is actively supporting multiple communities; multiple communities are enriching and improving SciCrunch
44. Community database: beginning
Community database: End
“How do I share my data/tool?”
“There is no database for my data”
[Diagram, steps 1-4: Institutional repositories; Cloud; INCF: Global infrastructure; Government; Education; Industry; Tool repositories]
NIF is designed to leverage existing investments in resources and infrastructure
45. • No one can be stopped from doing what they need to do
• Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
– If the market can support 11 MRI databases, fine
– Some consolidation, coordination is warranted though
• Big, broad and messy beats small, narrow and neat
– Without trying to integrate a lot of data, we will not know what needs to be done
– A lot can be done with messy data; neatness helps though
– Progressive refinement; addition of complexity through layers
• Be flexible and opportunistic
– A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
• Think globally; act locally:
– No source, not even NIF, is THE source; we are all a source
46. • Several powerful trends should change the way we think about our data: One → Many
– Many data
• Generation of data is getting easier; shared data
• Data space is getting richer: more –omes everyday
• But...compared to the biological space, still sparse
– Many eyes
• Wisdom of crowds
• More than one way to interpret data
– Many algorithms
• Not a single way to analyze data
– Many analytics
• “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting
Are you exposing or burying your work?
47. Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Svetlana Sulima
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer (retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11