The hippocampus receives input from the entorhinal cortex and sends projections to multiple targets in the brain. Its main outputs are to the subiculum, which projects to regions like the nucleus accumbens, amygdala, and medial prefrontal cortex. The hippocampus plays an important role in memory formation and spatial navigation.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a portal for finding and utilizing web-based neuroscience resources. NIF provides a consistent framework for describing various resources like databases, literature, and images. It allows simultaneous searches across these different data types and is supported by neuroscience ontologies. NIF currently catalogs over 5,000 resources and is working to integrate these diverse data sources to help answer questions and discover gaps in our knowledge about the brain.
How do we know what we don't know? Exploring the data and knowledge space th... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF), an initiative that aims to catalog and integrate neuroscience resources and data. NIF surveys the neuroscience resource landscape, currently cataloging over 3000 databases and datasets. It provides semantic integration of these resources through the use of ontologies and allows deep search of aggregated data. However, significant amounts of neuroscience data and resources remain inaccessible in publications, databases, and file drawers. Barriers to data sharing include lack of incentives, standards, and resources. NIF and related efforts aim to develop solutions to make more neuroscience data FAIR: findable, accessible, interoperable, and reusable.
How do we know what we don’t know: Using the Neuroscience Information Framew... (Maryann Martone)
The document discusses using the Neuroscience Information Framework (NIF) to reveal knowledge gaps in neuroscience. It explains that NIF aims to maximize awareness of, access to, and utility of neuroscience research resources by uniting information from over 200 databases containing over 400 million records. However, it notes that certain domains may still be underrepresented because the available data are biased by factors such as funding priorities. The framework uses ontologies to integrate diverse data types and link them to defined concepts, although neuroanatomical structures pose particular challenges because naming conventions are inconsistent across studies.
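The naming problem described above can be illustrated with a minimal sketch, assuming a toy ontology in which each concept has one preferred label and a set of synonyms. The concept IDs (`EX:0001`, `EX:0002`), the database names, and the functions `normalize` and `concept_search` are hypothetical and are not part of NIF or any real ontology; the point is only that mapping every source's label to a shared concept ID lets one query match records that use different names for the same structure.

```python
from typing import Optional

# Hypothetical mini-ontology: concept ID -> preferred label and synonyms.
# IDs use an invented "EX:" prefix; they are not real ontology identifiers.
ONTOLOGY: dict[str, dict] = {
    "EX:0001": {"label": "hippocampus proper", "synonyms": {"ammon's horn", "cornu ammonis"}},
    "EX:0002": {"label": "nucleus accumbens", "synonyms": {"accumbens nucleus"}},
}


def build_lookup(ontology: dict[str, dict]) -> dict[str, str]:
    """Index every label and synonym (case-insensitively) to its concept ID."""
    lookup = {}
    for concept_id, entry in ontology.items():
        for name in {entry["label"], *entry["synonyms"]}:
            lookup[name.casefold()] = concept_id
    return lookup


LOOKUP = build_lookup(ONTOLOGY)


def normalize(name: str) -> Optional[str]:
    """Map a free-text structure name to a concept ID, or None if unknown."""
    return LOOKUP.get(name.strip().casefold())


def concept_search(query: str, records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return records whose structure name resolves to the same concept as the query."""
    concept = normalize(query)
    if concept is None:
        return []
    return [r for r in records if normalize(r["structure"]) == concept]


# Records from three hypothetical databases that name structures differently.
RECORDS = [
    {"db": "DB-A", "structure": "Cornu Ammonis"},
    {"db": "DB-B", "structure": "hippocampus proper"},
    {"db": "DB-C", "structure": "accumbens nucleus"},
]
```

A query for "Ammon's horn" matches both the `DB-A` and `DB-B` records even though neither uses that exact string. A real ontology would also encode relations such as part-of, which this flat synonym table ignores.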
Semantic Technology empowering Real World outcomes in Biomedical Research and... (Amit Sheth)
Based on the context and background knowledge:
- Patient has leg swelling and stiffness, which is limiting her function
- She also has shortness of breath
- Shortness of breath can be a symptom of heart failure
- Heart failure can cause leg swelling
The NLP should annotate:
Problem: Heart failure
Symptom: Shortness of breath
Symptom: Leg swelling
By utilizing background knowledge, the inconsistency is resolved.
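The reasoning on this slide can be sketched as a small rule, assuming background knowledge is stored as a mapping from each problem to the symptoms it can explain. The `BACKGROUND` table and the `annotate` function are hypothetical illustrations, not the NLP system described in the talk; the rule proposes a problem only when all of its associated symptoms are observed, and then labels those findings as its symptoms.

```python
# Hypothetical background knowledge: problem -> symptoms it can explain.
# The two links below are the ones stated on the slide.
BACKGROUND: dict[str, set[str]] = {
    "heart failure": {"shortness of breath", "leg swelling"},
}


def annotate(findings: set[str]) -> list[tuple[str, str]]:
    """Return (label, text) annotations inferred from the observed findings."""
    annotations: list[tuple[str, str]] = []
    for problem, explained in BACKGROUND.items():
        # Propose the problem only if every symptom it explains was observed.
        if explained <= findings:
            annotations.append(("Problem", problem))
            annotations.extend(("Symptom", s) for s in sorted(explained))
    return annotations


if __name__ == "__main__":
    observed = {"leg swelling", "stiffness", "shortness of breath"}
    for label, text in annotate(observed):
        print(f"{label}: {text}")
```

Stiffness is left unannotated because no rule in the toy table explains it; a real system would draw on a much larger knowledge base and weigh competing explanations rather than require an exact subset match.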
Resolve Inconsistency
The patient reports intermittent chest pain on exertion for the past few months. On exam today, she denies any chest pain.
NLP: Patient has chest pain
The document discusses navigating the neuroscience data landscape. It notes that a grand challenge in neuroscience is to understand brain function across multiple scales of organization. Central to this effort is understanding "neural choreography": how neurons function together as integrated brain circuits. The Neuroscience Information Framework (NIF) aims to facilitate discovery and utilization of web-based neuroscience resources. However, the neuroscience community has not fully exploited currently available data or prepared for forthcoming data.
The Path to Enlightened Solutions for Biodiversity's Dark Data (vbrant)
Large amounts of scientific data remain uncurated, especially small datasets, which are currently invisible or "dark data". This dark data should be curated locally with the involvement of non-scientists at long-lived institutions like libraries and museums that have experience managing scholarly information over time. New roles and skills are needed for data scientists, digital curators, and biological information specialists to help address this problem by developing the necessary cyberinfrastructure, data standards, and educational programs to make more scientific dark data accessible.
This document summarizes a presentation about opportunities for data exchange and optimizing data sharing conditions. It discusses several projects by LIBER, including Europeana, which aims to make cultural content available online. It notes that with proper infrastructure, researchers can collaborate on shared data sets across locations. However, challenges include authentication, skills, and managing the large amounts of data being generated. Overall, the presentation argues that data sharing can advance scientific inquiry if barriers are addressed and key stakeholders work together.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a consistent framework and portal for discovering and utilizing web-based neuroscience resources. It summarizes the goals of NIF in indexing over 2000 databases and making their content searchable through an expansive neuroscience ontology. The document outlines the history and development of NIF, describes its search capabilities and use of ontologies, and provides examples of tools and resources that integrate NIF services like the Whole Brain Catalog.
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... (Amit Sheth)
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
The document discusses the Neuroscience Information Framework (NIF), which provides a portal for finding and utilizing web-based neuroscience resources. NIF allows simultaneous searching of multiple data sources through a concept-based interface organized by categories. It indexes over 35 million records from 65+ databases. NIF aims to address the challenges of dispersed and inconsistent neuroscience data by providing a common framework and tools to integrate data from various sources. Ontologies are discussed as a way to represent neuroscience concepts and relationships in a machine-readable way to facilitate data integration and querying across multiple scales and domains.
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte... (Amit Sheth)
Ora Lassila and Amit Sheth, "Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability", Invited Talk at ONC-HHS Invitational Workshop on Next Generation Interoperability for Health, Washington DC, January 19-20, 2011.
Data Landscapes: The Neuroscience Information Framework (Maryann Martone)
Overview of how to use the Neuroscience Information Framework for data discovery, presented at the Genetics of Addiction Workshop, held at Jackson Lab, Aug 28 - Sept 1, 2014.
Presentation Title: Grand Challenges and Big Data: Implications for Public Participation in Scientific Research
Presenter: William Michener, Professor and PI/Director of DataONE, University Libraries, University of New Mexico
Where is the opportunity for libraries in the collaborative data infrastructure? (LIBER Europe)
Presentation by Susan Reilly at Bibsys2013 on the opportunities for libraries and their role in the collaborative data infrastructure. Looks at data sharing, authentication, preservation, and advocacy.
Semantic Web for Health Care and Biomedical Informatics (Amit Sheth)
Amit Sheth, "Semantic Web for Health Care and Biomedical Informatics," Keynote at NSF Biomed Web Workshop, Corbett, Oregon, December 4-5, 2007.
http://www.biomedweb.info/2007/
This document discusses data management and curation in bioinformatics. It describes Susanna-Assunta Sansone as the principal investigator and team leader at the University of Oxford e-Research Centre, where her team works on data management, biocuration, software development, databases, and community standards and ontologies for various domains including toxicology, health, and agriculture. The document promotes the importance of data standards to enable data sharing and reproducibility in bioscience research.
IJCER (www.ijceronline.com) International Journal of Computational Engineerin... (ijceronline)
This document summarizes text mining techniques for information retrieval, extraction, and indexing. It discusses common information retrieval techniques like inverted indices and signature files. It also covers stemming, domain dictionaries, exclusion lists, and research directions in text mining like finding better representations for extracted information, enabling multilingual analysis, and integrating domain knowledge. The key techniques discussed are text indexing, query processing, and information extraction from text.
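An inverted index, one of the retrieval structures named above, can be sketched as follows, assuming simple lowercase alphanumeric tokenization and an exclusion (stop-word) list supplied by the caller. Stemming and signature files are not shown, and the function names are illustrative rather than taken from the surveyed paper.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(
    docs: dict[int, str], exclusion_list: frozenset[str] = frozenset()
) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term not in exclusion_list:  # skip excluded (stop) words
                index[term].add(doc_id)
    return dict(index)


def query_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

Query processing then reduces to intersecting posting sets, which is why inverted indices scale well: only documents that contain a query term are ever touched.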
EiTESAL eHealth Conference, 14 & 15 May 2017 (EITESANGO)
This document discusses bioinformatics and some of its key concepts and tools. It begins with definitions of bioinformatics as the intersection of biology, computer science, and information technology. It then discusses some of the data formats, tools, and skills used in bioinformatics, including working with nucleotide sequence data, translating sequences into amino acids, and analyzing large datasets. It also summarizes how ontologies are used to represent concepts and how various data types are organized and stored in databases for analysis.
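The translation of nucleotide sequences into amino acids mentioned above can be sketched with the standard genetic code, assuming a single forward reading frame starting at the first base. This is an illustration, not code from the summarized talk; libraries such as Biopython provide fuller implementations that also support alternative codon tables.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# the first codon position varies slowest and the third fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate a DNA or RNA sequence in reading frame 1; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA by mapping U to T
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons such as 'NNN'
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

Building the 64-entry table from a 64-character string keeps the code short while still encoding every codon explicitly.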
An Adaptive Filter-Framework for the Quality Improvement of Open-Source Softw... (Anna Glukhova)
This document describes an adaptive filter framework for improving the quality of open-source software analysis. It discusses how open-source software projects are community-driven and use web tools for communication and development. It also describes how analyzing these projects can provide insights but the results depend on the quality of the data. The adaptive filter framework allows filtering artifacts from communication and development repositories in a structured way to clean the data and improve analysis results. It features filters that can reduce datasets, clean content, and transform artifacts for cross-medium analysis. The framework was validated on biology-related open-source projects where it helped reduce spam and its distorting effects.
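The idea of chaining filters that reduce, clean, and screen repository artifacts can be sketched as a pipeline of plain functions, assuming each artifact is a dictionary with `source` and `text` fields. The filter names, the quoted-line rule, and the keyword-based spam check are hypothetical stand-ins; the adaptive framework described in the paper is not reproduced here.

```python
from typing import Callable

Artifact = dict[str, str]
Filter = Callable[[list[Artifact]], list[Artifact]]


def keep_sources(*sources: str) -> Filter:
    """Reduce the dataset to artifacts from the given repositories."""
    def apply(artifacts: list[Artifact]) -> list[Artifact]:
        return [a for a in artifacts if a["source"] in sources]
    return apply


def strip_quoted_lines(artifacts: list[Artifact]) -> list[Artifact]:
    """Clean content by removing e-mail style quoted lines (starting with '>')."""
    cleaned = []
    for a in artifacts:
        lines = [ln for ln in a["text"].splitlines() if not ln.lstrip().startswith(">")]
        cleaned.append({**a, "text": "\n".join(lines)})
    return cleaned


def drop_spam(keywords: set[str]) -> Filter:
    """Discard artifacts whose lowercased text contains any spam keyword."""
    def apply(artifacts: list[Artifact]) -> list[Artifact]:
        return [a for a in artifacts if not any(k in a["text"].lower() for k in keywords)]
    return apply


def run_pipeline(artifacts: list[Artifact], filters: list[Filter]) -> list[Artifact]:
    """Apply each filter in order, feeding its output to the next."""
    for f in filters:
        artifacts = f(artifacts)
    return artifacts
```

Because every filter has the same list-in, list-out shape, filters can be reordered or swapped without touching the others, which is the main appeal of a structured filter chain.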
The National Resource for Network Biology (NRNB) held its External Advisory Council meeting on December 12, 2012. The NRNB is focused on developing network biology tools and collaborating with investigators. It oversees various technology research and development projects, software releases including Cytoscape 3.0, collaboration projects, and outreach/training events. The meeting agenda covered progress updates and sought advice on future plans.
Knowledge management for integrative omics data analysis (COST action BM1006)
This document discusses knowledge management for integrative omics data analysis. It describes Biomax, a knowledge management platform that can flexibly interconnect isolated data silos in biomedical research. The platform addresses challenges like aggregating and analyzing multi-scale omics data from various sources and representing biological knowledge through semantic mapping and ontologies. Examples demonstrate how Biomax can integrate data from literature and databases, develop domain models, perform statistical analyses and network searches on integrated data, and support collaborative knowledge extraction.
This document discusses how natural computation techniques can be applied to web usage mining. It begins by introducing web usage mining and its importance. It then provides an overview of various natural computation approaches, including artificial neural networks, evolutionary algorithms, swarm intelligence, artificial immune systems, bacterial foraging, DNA computation, and hybrid approaches. The document explains how each of these natural computation techniques can inspire computational methods for analyzing web usage data.
This document outlines the big data landscape in 2016, including key components like data lakes, data warehouses, ingestion, processing, data science, analytics, and data sources. It also discusses related microservices, algorithms, data storage technologies, data workflows, stream processing systems, SQL and NoSQL databases, and specialized databases for time series, graphs, and other data types. The goal is to provide an overview of the different technologies and approaches for working with large and diverse datasets.
Big data from small data: A survey of the neuroscience landscape through the... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF), an initiative by the NIH Blueprint to provide a single access point for searching across multiple neuroscience databases and data types. NIF aims to maximize access to and utility of worldwide neuroscience resources by creating a consistent framework for describing resources and enabling simultaneous searches. It notes that neuroscience data exists in many forms, from raw data to processed data to claims, across multiple scales and data types. NIF is designed to rapidly integrate these diverse resources through a tiered system that has a low barrier for data providers to participate.
Beyond the Top 10 - Combining Profiling and Mobile Behavioral Data for Easy I... (Merlien Institute)
The document summarizes a presentation given at the MRMW Chicago conference from May 27-30, 2014. It discusses how mobile behavioral data can provide insights into web and app usage beyond just the top 10 sites/apps. It highlights differences between Android and iOS users demographically, attitudinally, and behaviorally. It also shows how engagement scoring and segment profiling of mobile data can provide customized profiles and targeting.
Big Transaction Data (BTD) provides a new approach to big data analytics by capturing entire business transactions and associated context such as users, locations, behaviors, and outcomes. This approach addresses the primary challenge with big data today: understanding and preparing disparate data sources. With BTD, data is stored and linked in a single, normalized stream, so business analysts can directly experiment with and gain insights from the data in hours rather than months. BTD can significantly reduce the time and resources spent understanding and preparing data for analytics.
This document discusses big data and analytics. It notes that digital data is growing exponentially and will reach 35 zettabytes by 2020, with 80% coming from enterprise systems. Big data is being driven by increased transaction data, interaction data from mobile and social media, and improved processing capabilities. Major players in big data include Google, Amazon, IBM and Microsoft. Traditional analytics struggle due to batch processing and lack of business context. The document introduces OpTier's approach of capturing real-time business context across interactions to enable insights with low costs and flexibility. Potential use cases for financial services are discussed.
We pitched this presentation on 24 June 2016 in K22 in Ghent. It describes what W4P has become, our process in making it, what you can do with this Open Source template and a brief overview of the first pilots, including their wins and fails.
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
This document provides an overview of big data. It defines big data as large volumes of diverse data that are growing rapidly and require new techniques to capture, store, distribute, manage, and analyze. The key characteristics of big data are volume, velocity, and variety. Common sources of big data include sensors, mobile devices, social media, and business transactions. Tools like Hadoop and MapReduce are used to store and process big data across distributed systems. Applications of big data include smarter healthcare, traffic control, and personalized marketing. The future of big data is promising with the market expected to grow substantially in the coming years.
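The MapReduce model mentioned above can be illustrated with the classic word-count example, written here as plain in-process Python. This sketch only mimics the map, shuffle, and reduce phases on one machine; it does not use Hadoop's API, and a real deployment would distribute each phase across a cluster.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The map and reduce functions never see more than one record or one key at a time, which is what allows a framework like Hadoop to run many copies of them in parallel.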
Neuroscience research increasingly relies on large, heterogeneous datasets from various sources. Integrating these diverse data types and making them accessible presents challenges. The NIF (Neuroscience Information Framework) addresses this by creating a federated search engine and unified interface to access multiple neuroscience databases. NIF aims to make neuroscience data more discoverable, accessible, and usable through techniques like unique identifiers, metadata standards, and semantic integration. This will help researchers more effectively find and use relevant neuroscience information.
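A federated search of the kind described above can be sketched as fanning one query out to several source adapters in parallel and merging the hits, each tagged with the source it came from. The adapter functions and database names are hypothetical placeholders, not NIF's actual services; one design choice worth noting is that a failing source contributes no results instead of aborting the whole search.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

Hit = dict[str, str]
SearchFn = Callable[[str], list[Hit]]


def federated_search(query: str, sources: dict[str, SearchFn]) -> list[Hit]:
    """Send the query to every source concurrently and merge the tagged hits."""

    def run(item: tuple[str, SearchFn]) -> list[Hit]:
        name, search = item
        try:
            # Tag each hit with its origin so results stay traceable.
            return [{"source": name, **hit} for hit in search(query)]
        except Exception:
            # Isolate failures: one unavailable source yields no hits.
            return []

    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map keeps results in the same order as the sources mapping.
        return [hit for hits in pool.map(run, sources.items()) for hit in hits]
```

Tagging every hit with its source is a small stand-in for the unique identifiers the paragraph mentions: it lets a unified interface show where each result originated.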
The real world of ontologies and phenotype representation: perspectives from... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF) and its role in facilitating discovery and use of neuroscience resources through a consistent semantic framework. NIF provides a portal for searching various types of neuroscience data and information organized by categories. It utilizes ontologies and advanced technologies to allow simultaneous searching of multiple sources. Challenges include the large number of databases and other resources, differing data types, and inconsistent naming of brain structures across sources.
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework (ASIS&T)
The Neuroscience Information Framework (NIF) is an initiative of the NIH Blueprint to maximize access to and utility of worldwide neuroscience research resources. NIF catalogs over 10,000 resources including databases, literature, and materials. It provides search capabilities across these resources and develops ontologies and semantic frameworks to integrate diverse data types and scales. NIF aims to make dispersed neuroscience information more findable, accessible, interoperable, and reusable to enable new insights.
Data-knowledge transition zones within the biomedical research ecosystem (Maryann Martone)
Overview of the Neuroscience Information Framework and how it brings together data, in the form of distributed databases, and knowledge, in the form of ontologies, to map the dataspace and show where data and knowledge are mismatched.
The Neuroscience Information Framework has indexed over 100 large databases, which lets us ask questions about the big data landscape. Anita Bandrowski presents an overview of the NIF system to JAX laboratories and provides insights into the addiction data landscape.
Presented during the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'12). Part of the workshop 'New Models and Modes for Data Sharing: Experiences from Neuroscience'. Presented by Jeffrey S. Grethe, Ph.D. from the Center for Research in Biological Systems at the University of California, San Diego.
This workshop featured several large-scale efforts to establish data sharing platforms, standards and tools that promote data-intensive analysis in the neurosciences. As we head into the second decade of the 21st century, many scientists realize that current methods for publishing and accessing data are outmoded and inefficient. Neuroscience, with its large, diverse and highly competitive community, has been slow to adopt more open sharing of data and has lacked effective tools to do so. There has been significant investment in databases and tools for biological science, and frequent calls for more of them, but few calls for the biological community to adopt practices and frameworks that make their resources more easily discoverable and their data more accessible. Data are contained in diverse sources, ranging from web pages, databases and the literature to personal lab systems, which makes data and tool discovery haphazard. Although these mechanisms are effective within small communities, they are parochial relative to the totality of available resources, and they fragment the resource ecosystem. Neuroscience, with its diverse subdisciplines, complex data types and broad domain, is a perfect exemplar of the current practices, bottlenecks and issues surrounding open access to data. This situation is changing, however, as groups have started to work together to define new models and tools for sharing and analyzing neuroscience data on an international scale. In this workshop, we bring together experts from national and international projects to discuss issues of data access and progress towards establishing platforms and best practices for effective sharing of neuroscience data in support of basic and clinical neuroscience.
Data analysis & integration challenges in genomics - mikaelhuss
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", and machine learning.
EcsiNeurosciences Information Framework (NIF): An example of community Cyberi...Maryann Martone
The document discusses the challenges of managing and utilizing the large amount of neuroscience data being generated. It notes that currently, about half of researchers store data only in their own labs, and many lack funding for proper archiving. The Neuroscience Information Framework (NIF) is working to address these issues by creating a catalog and federation of neuroscience resources to facilitate discovery, access, analysis and integration of data. NIF has assembled the largest searchable collection of neuroscience data on the web, using an ontology and technologies that can search the "hidden web" of resources.
The document discusses the challenges of managing and analyzing the large amounts of neuroscience data being generated. It notes that currently, about half of researchers store their data only locally in their labs instead of in shared databases or archives, which prevents other researchers from accessing and using the data. The Neuroscience Information Framework (NIF) is working to address these issues by creating a registry of neuroscience resources and developing technologies that let researchers discover, share, analyze and integrate data from various sources. NIF's registry currently catalogs over 6000 resources, including 2200 databases. The goal is for NIF to help the neuroscience community better exploit existing data and prepare for future increases in data.
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database - Hilmar Lapp
Jane's lab uses a freely available open source software package called PhyloDOM to create a phylogenetic database of their molecular data resolving the phylogeny of endangered frog species, allowing their results to be easily shared and integrated by other researchers through web interfaces, data aggregators, and visualization tools that take advantage of standardized metadata.
Phyloinformatics combines phylogenetics and informatics to systematically study and classify evolutionary relationships. It has progressed from closed private data to more open and linked data through standards like ontologies and semantic web technologies. This allows phylogenetic concepts and data to be formalized and connected across resources using unique identifiers and statements called triples. Querying linked phylogenetic data from integrated sources will enable new synthetic research, though challenges remain in deploying these technologies and unlocking legacy data currently locked in publications.
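You can sketch the idea of connecting data through identifiers and triples without an RDF toolkit. The following minimal Python sketch stores subject-predicate-object triples as tuples. It answers a chained question by matching patterns: which studies report a tree containing a given taxon? All `ex:` identifiers and predicate names are hypothetical placeholders. A real deployment would use RDF and a SPARQL endpoint rather than in-memory lists.

```python
# Triples are (subject, predicate, object); all "ex:" identifiers are hypothetical.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:study7", "ex:reportsTree", "ex:tree1"),
    ("ex:tree1", "ex:hasNode", "ex:nodeA"),
    ("ex:tree1", "ex:hasNode", "ex:nodeB"),
    ("ex:nodeA", "ex:representsTaxon", "ex:taxon_Rana"),
    ("ex:nodeB", "ex:representsTaxon", "ex:taxon_Bufo"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def studies_with_taxon(triples, taxon):
    """Chain three patterns: taxon -> tree nodes -> trees -> reporting studies."""
    nodes = {s for s, _, _ in match(triples, p="ex:representsTaxon", o=taxon)}
    trees = {s for s, _, o in match(triples, p="ex:hasNode") if o in nodes}
    return {s for s, _, o in match(triples, p="ex:reportsTree") if o in trees}


print(studies_with_taxon(TRIPLES, "ex:taxon_Rana"))  # {'ex:study7'}
```

Because every entity is an identifier, a second source that uses the same `ex:taxon_Rana` identifier could be appended to the list and queried without any schema changes.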
Next generation sequencing requires next generation publishing: the Biodivers...Vince Smith
Penev, L., Stoev, P., Komericki, A., Akkari, N., Li, S., Zhou, X., Edmunds, S., Hunter, C., Weigand, A., Porco, D., Zapparoli, M., Georgiev, T., Mietchen, D., Roberts, D., Smith, V. 2013. Next generation sequencing requires next generation publishing: the Biodiversity Data Journal published the first eukaryotic new species with a fully sequenced transcriptome, DNA barcode and microcomputed tomography. TDWG, Biodiversity Information Standards. Grand Hotel Mediterraneo Florence, Italy, 27 Oct - 1 Nov.
The document discusses the need for a uniform resource layer to allow researchers to easily find and access biomedical resources such as databases, software tools, biobanks, and services. It notes that currently, each resource implements different models and systems that are complex and difficult to learn. There must be a common platform that makes access to biological data uniform so researchers can understand data, not just compute it. While progress has been made in registering some resources, much work still needs to be done to fully register and provide deep metadata for existing resources to achieve this goal.
This document discusses the challenges of handling large-scale genomic and biological data and proposes potential solutions. It notes that data volumes are increasing rapidly due to advances in sequencing technology but dissemination and data handling methods have not kept pace. Several hurdles to data sharing are described including technical issues around data size, heterogeneity and longevity as well as economic and cultural barriers. Potential solutions discussed include providing incentives for data sharing through attribution and citation, adopting data citation practices using Digital Object Identifiers, establishing funding models for long-term curation, and launching new databases and journals focused on publishing and analyzing large-scale datasets.
The document discusses solutions to overcoming the tragedy of the data commons through shared metadata. It describes how large scientific projects can share data at low cost by starting from overlapping common metadata terms and having their metadata teams work together. Reusing shared metadata leads to increased reusability of data across projects. The document advocates for developing metadata as evolving, linked resources rather than predefined standards, and provides examples of how this approach has helped scientific collaborations and government data sharing initiatives succeed.
This document discusses biological databases. It begins by defining biological databases as large, organized bodies of persistent biological data that can be updated, queried and retrieved. It then provides examples of popular databases like GenBank, SwissProt and PIR. The document discusses the importance of databases and different types of biological databases, categorized by the content or nature of the data. Specifically, it describes primary and secondary nucleotide and protein sequence databases like GenBank, EMBL, DDBJ, SwissProt and PIR.
Ontology Based Information Extraction for Disease Intelligence IJORCS
Disease Intelligence (DI) is based on acquiring and aggregating fragmented knowledge about diseases from multiple sources around the world to provide valuable information to doctors, researchers and the information-seeking community. The characteristics of some diseases change rapidly in different parts of the world. These changes are reported in documents as unrelated, heterogeneous information that may go unnoticed and may not be quickly available. This research presents an ontology-based theoretical framework for medical intelligence in the context of country or region. The ontology is designed to store information about rapidly spreading and changing diseases, incorporating existing disease taxonomies and linking them to genetic information about both humans and infectious organisms. It further maps disease symptoms to diseases and drug effects to disease symptoms. The machine-understandable disease ontology, published as a website, therefore allows drug effects to be evaluated against disease symptoms and exposes genetic involvement in human diseases. Infectious agents that have no known place in an existing classification but do have genetic data would still be identified as organisms through the intelligence of this system. It will further help researchers in the field try out different solutions for curing diseases.
Bioinformatics databases: Current Trends and Future Perspectives - University of Malaya
Data is the most powerful resource in any field of study. In biology, data come from scientists and their actions, and any institution that makes sense of the data collected will be at the forefront of its research field. At the beginning of any data collection endeavour, it is critical to find proper management techniques to store data and maximise its utilisation. This presentation reflects upon current trends and techniques in data modeling and architecture, highlighting the uses of databases and focusing on bioinformatics examples and case studies. Finally, it looks ahead to the future of bioinformatics databases, giving an overview of the modeling techniques needed to accommodate the escalation of biological data in coming years.
The document discusses methodologies for sharing long-tail data and what has been learned. It notes that unique identifiers (PIDs) are important for identifying entities across contexts. Standards like MINI and common data elements (CDEs) help ensure data is findable, accessible, and reusable. The Neuroscience Information Framework (NIF) aggregates ontologies and searches over 200 data sources to organize information. What we have learned is that data should be in repositories, not personal servers; people are key to these efforts; and resources should be comprehensive and support each other to advance open data sharing.
This document discusses why journals should ask authors to include Research Resource Identifiers (RRIDs) in their manuscripts. RRIDs help answer questions about what antibodies, animals, cell lines, or software tools were used in a study and allow others to find papers that used the same resources. The document notes that RRIDs improve reproducibility by making materials and methods more transparent. It also discusses how RRIDs can help identify problematic resources like contaminated cell lines or antibodies that do not work or are no longer available. The document provides examples of journals that now require RRIDs and how compliance is implemented.
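RRIDs share a recognizable `RRID:` prefix, so software can extract them from manuscript text mechanically. That extraction is what makes it feasible to find papers that used the same resource. The following is a minimal Python sketch. It assumes identifiers shaped like `RRID:AB_...`, `RRID:SCR_...`, `RRID:CVCL_...` and `RRID:IMSR_...`. The prefix-to-type table is a partial, illustrative assumption, and the identifiers in the example are illustrative strings. Check both against the RRID documentation before relying on them.

```python
import re

# Assumed shape: "RRID:" + registry prefix + "_" or ":" + local ID. Requiring the
# match to end on a letter or digit keeps trailing punctuation out of the ID.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Z]+)([_:])([A-Za-z0-9_:\-]*[A-Za-z0-9])")

# Partial, illustrative prefix table (an assumption); unknown prefixes map to "other".
PREFIX_TYPES = {
    "AB": "antibody",
    "SCR": "software/tool",
    "CVCL": "cell line",
    "IMSR": "mouse strain",
}


def extract_rrids(text: str) -> list[tuple[str, str]]:
    """Return (RRID, resource type) pairs in order of appearance."""
    found = []
    for m in RRID_PATTERN.finditer(text):
        prefix, sep, local_id = m.groups()
        # Rebuild the ID without any space the author put after "RRID:".
        found.append((f"RRID:{prefix}{sep}{local_id}", PREFIX_TYPES.get(prefix, "other")))
    return found


methods = ("Cells (RRID:CVCL_0030) were stained with an antibody "
           "(RRID:AB_2298772) and images were analyzed with RRID:SCR_003070.")
for rrid, kind in extract_rrids(methods):
    print(rrid, kind)
```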
Anita Bandrowski explains how the uniform resource layer of the Neuroscience Information Framework allows several interesting questions about the state of scientific research to be answered.
The document describes a summer institute on discovering big data held in San Diego from August 5-9, 2013. It discusses several topics related to big data in neuroscience including available resources, how to find and connect relevant information, challenges around data integration from disparate sources, and using ontologies and machine learning for tasks like data tagging.
The document provides an overview of the NIF Services, which include a curated index of literature and data resources totaling approximately 200GB. The services allow users to search over literature and federated data, retrieve metadata and full text, and perform natural language processing tasks such as entity recognition and topic modeling. The services are implemented using open source technologies such as Java, Jersey, and Solr, and will be made openly available.
The document provides an overview of the NIF (Neuroscience Information Framework) services, including ontology, latent Dirichlet allocation, and annotation services. It describes lexical services that perform natural language processing tasks like sentence extraction, part-of-speech tagging, and chunking independently of any vocabulary. The document also lists URLs for accessing various NIF vocabulary, lexical, and annotation services.
The document describes various search and data retrieval functions across multiple biomedical data sources and literature. It provides examples of how to search across all data sources, filter search results, retrieve specific data rows from a source, search for papers in PubMed and PubMed Central, find related papers, and grab paper metadata from the concept mapper.
This document provides instructions for registering a resource in the NeuroLex database and generating a sitemap for it. It explains that you should search NeuroLex first before creating a new entry, and lists naming conventions to follow. It describes generating a sitemap to keep the resource description up-to-date, and options for maintaining the sitemap files yourself or having them automatically updated. Contact information is provided for the NIF Interoperability team as well.
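A sitemap lets a crawler pick up changes to a resource description without manual re-registration. The summary does not give NIF's exact file conventions. This sketch therefore assumes the generic sitemaps.org XML protocol (`urlset`, `url`, `loc`, `lastmod`) and uses hypothetical example.org URLs.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Generic sitemaps.org namespace (assumed; NIF-specific conventions are not given).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> str:
    """Serialize (URL, last-modified date) pairs as a sitemaps.org urlset."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages describing one registered resource.
print(build_sitemap([
    ("https://example.org/myatlas/about.html", date(2013, 8, 5)),
    ("https://example.org/myatlas/data.html", date(2013, 8, 9)),
]))
```

Regenerating this file whenever the pages change, by hand or from a scheduled job, is what keeps the registered description current.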
DISCO is a customizable Internet-based ETL system that extracts and populates data into relational databases, manages data workflows, integrates with version control systems, and allows ontological annotations. It uses XML templates to extract and transform data from different sources and formats into a DISCO database, with tools to assist in creating scripts and documentation available online.
The document discusses the NIF Data Federation and Concept Mapping Tool. The NIF Data Federation provides the ability to search across individually hosted neuroscience databases and datasets. It currently indexes over 232 databases containing over 358 million records. The Concept Mapping Tool was developed to manage federated resources by setting up database mappings and exporting data to Google Refine for concept mapping. The document also lists several integrated virtual databases created by NIF that combine related data from multiple sources into a single view.
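Concept mapping of a federated source means reconciling the free-text values in a database column against an ontology's labels and synonyms. Values that do not match are set aside for a curator. The following is a minimal Python sketch of that step. The lexicon, the `EX:` identifiers and the normalization rule are all hypothetical. The actual Concept Mapper works through its own interface and Google Refine rather than this code.

```python
def normalize(term: str) -> str:
    """Lowercase, unify curly apostrophes, and collapse whitespace."""
    return " ".join(term.lower().replace("\u2019", "'").split())


# Hypothetical ontology lexicon: identifier -> preferred label and synonyms.
LEXICON = {
    "EX:0001": {"label": "Hippocampus", "synonyms": ["Cornu Ammonis", "Ammon's horn"]},
    "EX:0002": {"label": "Cerebellum", "synonyms": []},
}


def build_lookup(lexicon):
    """Index every label and synonym (normalized) to its concept identifier."""
    lookup = {}
    for concept_id, entry in lexicon.items():
        for name in [entry["label"], *entry["synonyms"]]:
            lookup[normalize(name)] = concept_id
    return lookup


def map_column(values, lexicon):
    """Split a column's values into mapped {value: id} and unmatched values."""
    lookup = build_lookup(lexicon)
    mapped, unmatched = {}, []
    for value in values:
        concept_id = lookup.get(normalize(value))
        if concept_id is None:
            unmatched.append(value)  # left for a curator to review
        else:
            mapped[value] = concept_id
    return mapped, unmatched


mapped, unmatched = map_column(
    ["hippocampus", "Ammon\u2019s horn", "CEREBELLUM", "CA1 field"], LEXICON)
print(mapped)     # {'hippocampus': 'EX:0001', 'Ammon’s horn': 'EX:0001', 'CEREBELLUM': 'EX:0002'}
print(unmatched)  # ['CA1 field']
```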
The document discusses the data ingestion, ontology, and search architectures of the Neuroscience Information Framework (NIF). It describes the current functions of the DISCO Dashboard and Concept Mapper for ingesting and mapping data to the NIF ontology. It also lists the key components of the NIF search architecture using the SOLR platform, including generating indexes informed by the ontology and concept mapper, and providing query processing and REST services.
The document discusses the need for a common platform to access biological data and resources in a uniform way. It notes that currently, each resource implements a different model, making systems complex and difficult to learn. It states that the common platform should provide data access in a way that is understandable to biologists, rather than relying solely on standards like RDF, XML or NoSQL. The document also references the Neuroscience Information Framework registry of over 2200 databases and 800 tools, but notes that detailed metadata is only available for a small portion of these resources currently.
Maryann Martone
Making Sense of Biological Systems: Using Knowledge Mining to Improve and Validate Models of Living Systems; NIH COBRE Center for the Analysis of Cellular Mechanisms and Systems Biology, Montana State University, Bozeman, MT
August 24, 2012
This document discusses challenges and potential solutions for improving data sharing in neuroscience. It notes that while there is a large amount of neuroscience data, it is unevenly distributed across repositories and databases. The document proposes creating a distributed "data sharing ecosystem" where data and related metadata are systematically tracked, linked and made available. Key elements would include unique IDs for all data objects, logging all activities, and developing accountability scores and influence measures to promote better data citizenship. However, concerns are raised about monitoring researchers and potential biases, which would need to be addressed for such a system to work.
NIFSTD is a comprehensive ontology for neuroscience developed by the Neuroscience Information Framework (NIF) project. It consists of several modular ontologies covering neuroscience domains like brain regions, cells, molecules, and diseases. NIFSTD aims to provide consistent descriptions of neuroscience resources to enable concept-based search across multiple data types. It imports and maps to existing community ontologies and seeks to avoid duplication of efforts.
The document discusses neuroscience ontologies created by the Neuroscience Information Framework (NIF). It describes how NIF incorporates existing ontologies and extends them for neuroscience as needed. NIF includes modular ontologies covering multiple scales including molecules, cells, anatomy, and functions. Key ontologies discussed include NIFSTD, Neurolex, and bridging files that link related concepts across ontologies. Examples are provided of how neuron classes are defined based on attributes such as brain region, molecular constituents, and roles.
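When neuron classes are defined by attributes, software can compute class membership instead of a curator asserting it by hand. The following minimal sketch uses plain Python rather than an ontology language and a reasoner. It assumes each neuron record carries a soma location and a set of neurotransmitters. It also uses a small hypothetical partonomy, so that a soma in CA1 counts as being in the hippocampus.

```python
# Hypothetical partonomy: region -> the region it is directly part of.
PART_OF = {
    "CA1": "Hippocampus",
    "Hippocampus": "Brain",
    "Purkinje cell layer": "Cerebellum",
    "Cerebellum": "Brain",
}


def located_in(region, target):
    """True if region is target or is transitively part of it."""
    while region is not None:
        if region == target:
            return True
        region = PART_OF.get(region)
    return False


# Each defined class is a rule over a neuron's (assumed) attributes.
DEFINED_CLASSES = {
    "GABAergic neuron": lambda n: "GABA" in n["neurotransmitters"],
    "Hippocampal neuron": lambda n: located_in(n["soma_location"], "Hippocampus"),
    "Cerebellar neuron": lambda n: located_in(n["soma_location"], "Cerebellum"),
}


def infer_classes(neuron):
    """Return, sorted, every defined class whose rule the neuron satisfies."""
    return sorted(name for name, rule in DEFINED_CLASSES.items() if rule(neuron))


purkinje = {"soma_location": "Purkinje cell layer", "neurotransmitters": {"GABA"}}
print(infer_classes(purkinje))  # ['Cerebellar neuron', 'GABAergic neuron']
```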
Big data from small data: A deep survey of the neuroscience landscape data via
1. Big data from small data: A deep survey of the neuroscience landscape data via the Neuroscience Information Framework
Maryann Martone, Ph. D.
University of California, San Diego
2. “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits-- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior....
However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. “Data choreography”
In that same issue of Science, peer reviewers from the previous year were asked about the availability and use of data:
• About half of those polled store their data only in their laboratories, which is not an ideal long-term solution.
• Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving.
• Even where data are accessible, much data in many fields is too poorly organized to be used efficiently.
“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649)
4. A data federation problem
Multiple data types; multiple scales; multiple databases
[Figure: example data types across scales: whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures]
No single technology serves these all equally well.
Neuroscience is unlikely to be served by a few large databases in the way the genomics and proteomics community has been.
5. NIF is an initiative of the NIH Blueprint consortium of institutes
What types of resources (data, tools, materials, services) are available to the neuroscience community?
How many are there?
What domains do they cover? What domains do they not cover?
Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
• PDF files
• Desk drawers
Who uses them?
Who creates them?
How can we find them?
How can we make them better in the future?
http://neuinfo.org
6. We need more databases (?)
•NIF Registry: A catalog of neuroscience-relevant resources
•> 5000 currently listed
•> 2000 databases
•And we are finding more every day
7. But we have Google!
The current web is designed to share documents
Documents are unstructured data
Much of the content of digital resources is part of the “hidden web”
Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
8. NIF must work with the ecosystem as it is today
NIF has developed a production technology platform for researchers to discover, share, access, analyze, and integrate neuroscience-relevant information:
• Semantically-enabled search engine and interface that customizes results for neuroscience
• System that searches the “hidden web”, i.e., content not well served by search engines
• Data resources are predominantly relational, XML, text, RDF, OWL
• Automated data harvesting technologies that produce dynamic indices of data content including databases, web pages, text, XML, etc.
• Tools to make products and data available
• Designed to be populated rapidly; set up process for progressive refinement
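A toy inverted index can illustrate the automated harvesting idea: one dynamic index built from relational rows, XML and plain text. This is a minimal Python sketch under simplifying assumptions: lowercase alphanumeric tokens, no stemming, and AND semantics for multi-word queries. It is not NIF's harvester, and the record IDs and contents are hypothetical. Rebuilding the index whenever a source changes is what would keep it dynamic.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumerics only


def text_of_row(row: dict) -> str:
    """Flatten a relational row into searchable text."""
    return " ".join(str(v) for v in row.values())


def text_of_xml(xml: str) -> str:
    """Collect all text nodes of an XML record."""
    return " ".join(ET.fromstring(xml).itertext())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index, query: str) -> set[str]:
    """Return IDs of documents containing every query token."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


docs = {  # hypothetical harvested records from three kinds of source
    "db:row1": text_of_row({"region": "hippocampus", "method": "tract tracing"}),
    "xml:rec1": text_of_xml("<record><region>Hippocampus</region><type>image</type></record>"),
    "web:page1": "Atlas of cerebellum images",
}
index = build_index(docs)
print(sorted(search(index, "hippocampus")))        # ['db:row1', 'xml:rec1']
print(sorted(search(index, "hippocampus image")))  # ['xml:rec1']
```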
9. NIF accomplishments
UCSD, Yale, Caltech, George Mason, Washington Univ
Assembled the largest searchable collation of neuroscience data on the web
• Data federation
• Resource registry (materials, data, tools, services)
• PubMed literature
• Full text of open access
The largest ontology for neuroscience
NIF search portal: simultaneous search over data, NIF catalog and biomedical literature
Neurolex Wiki: a community wiki serving neuroscience concepts
NIF is poised to capitalize on the new tools
• A unique technology platform and emphasis on big data and open science
• A reservoir of cross-disciplinary biomedical data expertise
10. NIF data federation
> 180 sources; 350 M records: NIF was designed to be populated rapidly, with progressive refinement of data
[Figure: pie charts of the percentage of data records per data type, for all records and for everything but microarray; categories shown include brain activation foci, animals, images, pathways, drugs, connectivity, antibodies, microarray, and grants; one slice is labeled 98%]
11. What do you mean by data?
Databases come in many shapes and sizes
Primary data: Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
Secondary data: Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
Tertiary data: Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation
Registries: Metadata; pointers to data sets or materials stored elsewhere
Data aggregators: Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
Single source: Data acquired within a single context, e.g., Allen Brain Atlas
Researchers are producing a variety of information artifacts using a multitude of technologies
12. What types of questions can I ask?
We’d like to be able to find:
What is known:
• What is the average diameter of a Purkinje neuron?
• Is GRM1 expressed in cerebral cortex?
• What are the projections of the hippocampus?
• What genes have been found to be upregulated in chronic drug abuse in adults?
• Is there a database of fMRI studies?
• What studies used my polyclonal antibody against GABA in humans?
• What rat strains have been used most extensively in research during the last 20 years?
What is not known:
• Connections among data
• Gaps in knowledge
Without some sort of framework, this is very difficult to do.
13. What are the connections of the hippocampus?
Query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
• Query expansion: Synonyms and related concepts
• Boolean queries
• Data sources categorized by “data type” and level of nervous system
• Tutorials for using full resource when getting there from NIF
• Common views across multiple sources
• Link back to record in original source
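A synonym table can reproduce the expanded hippocampus query from this slide. Each concept becomes an OR group of its variants, multi-word variants are quoted as phrases, and the groups are joined with AND. The following is a minimal Python sketch. The synonym table is hypothetical and hand-written, whereas NIF draws synonyms from its ontology.

```python
# Hypothetical synonym table keyed by lowercase concept name.
SYNONYMS = {"hippocampus": ["Cornu Ammonis", "Ammon's horn"]}


def quote(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand(term: str, synonyms: dict) -> str:
    """OR together a term and its synonyms."""
    variants = [term, *synonyms.get(term.lower(), [])]
    return " OR ".join(quote(v) for v in variants)


def build_query(concepts: list[str], synonyms: dict) -> str:
    """AND together the expanded concepts, parenthesizing OR groups."""
    clauses = []
    for concept in concepts:
        clause = expand(concept, synonyms)
        clauses.append(f"({clause})" if " OR " in clause else clause)
    return " AND ".join(clauses)


print(expand("Hippocampus", SYNONYMS))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
print(build_query(["Hippocampus", "connectivity"], SYNONYMS))
# (Hippocampus OR "Cornu Ammonis" OR "Ammon's horn") AND connectivity
```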
14. Results are organized within a common framework
[Figure: connectivity relations used by different sources, drawn between a source site and a target site: synapsed by; innervates; connects to; input region; synapsed with; cellular contact; projects to; axon innervates; subcellular contact]
Each resource implements a different, though related, model; in many cases, the systems are complex and difficult to learn.
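Results from different databases line up only after each source's relation vocabulary is mapped onto one source-site/target-site model. The following is a minimal Python sketch using the relation labels on this slide with a hypothetical direction table. Which labels are directional, and in which direction, is an assumption here. Labels without a mapping raise an error rather than being guessed.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    source: str     # region where the connection originates
    target: str     # region where it terminates
    origin_db: str  # database the statement came from, for linking back


# Hypothetical direction table: True means the statement's subject is the
# source site; False means the subject is the target site.
SUBJECT_IS_SOURCE = {
    "projects to": True,
    "innervates": True,
    "axon innervates": True,
    "connects to": True,
    "synapsed by": False,
    "input region": False,
}


def to_common_model(subject: str, relation: str, obj: str, origin_db: str) -> Connection:
    """Rewrite a source-specific statement as a source-site/target-site connection."""
    key = relation.lower()
    if key not in SUBJECT_IS_SOURCE:
        # e.g. "synapsed with" or "cellular contact" have no direction mapping here
        raise ValueError(f"no direction mapping for relation {relation!r}")
    if SUBJECT_IS_SOURCE[key]:
        return Connection(subject, obj, origin_db)
    return Connection(obj, subject, origin_db)


a = to_common_model("Hippocampus", "Projects to", "Subiculum", "db1")
b = to_common_model("Subiculum", "Synapsed by", "Hippocampus", "db2")
print(a[:2] == b[:2])  # True: both describe hippocampus to subiculum
```

Keeping `origin_db` on every normalized connection preserves the link back to the record in the original source.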
15. The scourge of neuroanatomical nomenclature:
Importance of NIF semantic framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
•Brain Architecture Management System (rodent)
•Temporal lobe.com (rodent)
•Connectome Wiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
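Once synonyms and part-of relations are available, software can assign the three kinds of match counted on this slide. The following is a minimal Python sketch with tiny hypothetical lexicons. It shows how the categories differ, not how NIF computed its counts.

```python
# Hypothetical lexicons (lowercase): synonyms and direct part-of parents.
SYNONYMS = {"hippocampus": {"cornu ammonis", "ammon's horn"}}
PARENT = {"ca1": "hippocampus", "hippocampus": "hippocampal formation"}


def classify_match(a: str, b: str) -> str:
    """Classify how two brain-region terms from different databases relate."""
    x, y = a.lower(), b.lower()
    if x == y:
        return "exact"
    if y in SYNONYMS.get(x, set()) or x in SYNONYMS.get(y, set()):
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PARENT.get(x) == y or PARENT.get(y) == x:
        return "1st order partonomy"
    return "no match"


pairs = [("Hippocampus", "hippocampus"), ("Ammon's horn", "Hippocampus"),
         ("CA1", "Hippocampus"), ("CA1", "Cerebellum")]
for a, b in pairs:
    print(a, "|", b, "->", classify_match(a, b))
```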
16. NIF’s minimum requirements for effective data sharing
You (and the machine) have to be able to find it
• Accessible through the web
• Annotations
You have to be able to use it
• Data type specified and in a usable form
You have to know what the data mean
• Some semantics
• Context: Experimental metadata
• Provenance: Where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously
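A repository could turn these requirements into a checklist and run it over each submitted record. The following is a minimal Python sketch. The field names (`url`, `annotations`, `data_type`, and so on) are hypothetical stand-ins for whatever metadata schema a repository actually uses.

```python
# Hypothetical field names grouped under the slide's three requirements.
REQUIREMENTS = {
    "find it": ["url", "annotations"],
    "use it": ["data_type", "format"],
    "know what the data mean": ["semantics", "experimental_metadata", "provenance"],
}


def missing_requirements(record: dict) -> dict[str, list[str]]:
    """Return, per requirement, the fields that are absent or empty in the record."""
    gaps = {}
    for requirement, fields in REQUIREMENTS.items():
        missing = [f for f in fields if not record.get(f)]
        if missing:
            gaps[requirement] = missing
    return gaps


record = {  # hypothetical submission
    "url": "https://example.org/dataset/42",
    "annotations": ["hippocampus", "electron microscopy"],
    "data_type": "image volume",
    "format": "TIFF",
    "provenance": "Lab X, 2012",
}
print(missing_requirements(record))
# {'know what the data mean': ['semantics', 'experimental_metadata']}
```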
17. What is an ontology?
Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
Branch of philosophy: a theory of what is
e.g., Gene Ontology
Diagram: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron
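The diagram's "has a" and "is a" links can be stored as subject-relation-object triples and followed by a program. This minimal sketch treats both relations as transitive, which is an assumption made for illustration. Real ontologies declare the properties of each relation explicitly.

```python
# Diagram from the slide written as (subject, relation, object) triples.
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "Neuron"),
]


def reachable(subject, relation, triples):
    """Collect every object reachable from subject by repeatedly following one relation.

    Treating the relation as transitive is an assumption made for this sketch.
    """
    found, frontier = [], [subject]
    while frontier:
        current = frontier.pop()
        for s, r, o in triples:
            if s == current and r == relation and o not in found:
                found.append(o)
                frontier.append(o)
    return found


print(reachable("Brain", "has_a", TRIPLES))
# ['Cerebellum', 'Purkinje cell layer', 'Purkinje cell']
print(reachable("Purkinje cell", "is_a", TRIPLES))
# ['Neuron']
```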
18. You need to use ontology identifiers instead of strings
Blah, blah, ontology blah
“Ontology as mathematics, computer science or esperanto” - Andrey Rzhetsky and James A. Evans
19. What can ontology do for us?
“Esperanto!”
Express neuroscience concepts in a way that is machine readable
Classes are identified by unique identifiers
Synonyms, lexical variants
Definitions
Provide means of disambiguation of strings
Nucleus part of cell; nucleus part of brain; nucleus part of atom
Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter
Properties
Provide universals for navigating across different data sources
Semantic “index”
Perform reasoning
Link data through relationships not just one-to-one mappings
“Concept-based queries”
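The "nucleus" example shows why a string alone cannot serve as an identifier. This minimal sketch gives each sense its own identifier. The `EX:` identifiers are hypothetical placeholders, not real ontology IDs, and the single `part_of` field stands in for the richer context an ontology provides.

```python
from typing import NamedTuple


class OntologyClass(NamedTuple):
    identifier: str  # hypothetical identifier, not a real ontology ID
    label: str       # the human-readable string
    part_of: str     # the larger structure this class belongs to


CLASSES = [
    OntologyClass("EX:0001", "nucleus", "cell"),
    OntologyClass("EX:0002", "nucleus", "brain"),
    OntologyClass("EX:0003", "nucleus", "atom"),
]


def lookup(label, context=None):
    """Return identifiers for a label, optionally restricted to one containing structure."""
    return [
        c.identifier
        for c in CLASSES
        if c.label == label and (context is None or c.part_of == context)
    ]


print(lookup("nucleus"))           # the string alone is ambiguous: three classes
print(lookup("nucleus", "brain"))  # ['EX:0002']
```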
20. Power of unique identifiers: Are you the M Martone who...
The Gene Wiki: community intelligence applied to human gene annotation. Huss JW 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. Nucleic Acids Res. 2010 Jan;38(Database issue):D633-9.
Ontologies for Neuroscience: What are they and What are they Good for? Larson SD, Martone ME. Front Neurosci. 2009 May;3(1):60-7. Epub 2009 May 1.
Three-dimensional electron microscopy reveals new details of membrane systems for Ca2+ signaling in the heart. Hayashi T, Martone ME, Yu Z, Thor A, Doi M, Holst MJ, Ellisman MH, Hoshijima M. J Cell Sci. 2009 Apr 1;122(Pt 7):1005-13.
Some analyses of forgetting of pictorial material in amnesic and demented patients. Martone M, Butters N, Trauner D. J Clin Exp Neuropsychol. 1986 Jun;8(3):161-78.
Traumatic brain injury and the goals of care. Martone M. Hastings Cent Rep. 2006 Mar-Apr;36(2):3.
Three-dimensional pattern of enkephalin-like immunoreactivity in the caudate nucleus of the cat. Groves PM, Martone M, Young SJ, Armstrong DM. J Neurosci. 1988 Mar;8(3):892-900.
21. I am not a number (but I should be)
Full URI (Uniform Resource Identifier): http://orcid.org/1234567
Label: Maryann Elizabeth Martone
Synonyms: ME Martone, M M Martone, Martone, Maryann
Abbreviation: MEM
Is a: Female
Has a: Publications
Linked to: Dept of Psychiatry, UCSD; Boston VA Hospital; Nelson Butters
Is that entity which has these properties
Text mining algorithms can discover a lot of things about me
ORCID project: Author IDs
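The value of an author identifier can be shown by looking up papers by name string and then by identifier. In this minimal sketch the publication records and the `author:00N` identifiers are hypothetical placeholders. They are not real ORCID iDs, and they are not the papers listed on the previous slide.

```python
from typing import NamedTuple


class Publication(NamedTuple):
    title: str
    author_string: str  # the name as printed on the paper
    author_id: str      # hypothetical identifier, in the spirit of an ORCID iD


# Placeholder records: two different people publish as "Martone M",
# and one of them also publishes as "Martone ME".
PUBLICATIONS = [
    Publication("Paper A", "Martone M", "author:001"),
    Publication("Paper B", "Martone M", "author:002"),
    Publication("Paper C", "Martone ME", "author:001"),
]


def by_name(name):
    return [p.title for p in PUBLICATIONS if p.author_string == name]


def by_id(author_id):
    return [p.title for p in PUBLICATIONS if p.author_id == author_id]


print(by_name("Martone M"))  # ['Paper A', 'Paper B']: two people conflated
print(by_id("author:001"))   # ['Paper A', 'Paper C']: one person, two name variants
```

A name string fails in two ways: it conflates different people and splits one person's name variants. The identifier fixes both.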
22. NIF Semantic Framework: NIFSTD ontology
NIFSTD modules shown: Organism; Anatomical Structure; Cell; Subcellular structure; Molecule; Macromolecule; Gene; Molecule Descriptors; NS Function; Dysfunction; Quality; Investigation; Techniques; Reagent; Protocols; Resource; Instrument
NIF covers multiple structural scales and domains of relevance to neuroscience
Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
Simple, basic “is a” hierarchies that can be used “as is” or to form the building blocks for more complex representations
23. “We studied the behavior of CA2-binding proteins in Ca2 neurons under high and low Ca2 conditions”
NIF queries across 170+ independent databases
Example sources shown: BioGrid, Allen Brain Atlas, Brain Info
24. But you don’t have what I need!
•Provide a simple framework for defining the concepts required
 •Cell, Part of brain, subcellular structure, molecule
•Community based:
 •Communities contribute their vocabularies
 •Reconcile and align concepts used by different domains
 •Each concept gets its own unique identifier
•Creating a computable index for neuroscience data
•INCF Demo D03
http://neurolex.org Stephen Larson/INCF
25. Concept-based search: search by meaning
Search Google: GABAergic neuron
Search NIF: GABAergic neuron
NIF automatically searches for types of GABAergic neurons
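Concept-based search can be sketched as two steps. First, apply the defining rule from slide 19 ("a GABAergic neuron is a neuron that releases GABA") to infer subtypes. Second, search for all of them. The small `NEURON_TRANSMITTERS` table is an illustrative assumption. NIF draws this knowledge from its ontologies, and its real query expansion may work differently.

```python
# Toy knowledge base of neuron types and the neurotransmitters they release.
# NIF derives this kind of knowledge from its ontologies; this table is illustrative.
NEURON_TRANSMITTERS = {
    "Purkinje cell": {"GABA"},
    "Cerebellar basket cell": {"GABA"},
    "Cerebellar granule cell": {"glutamate"},
    "CA1 pyramidal cell": {"glutamate"},
}


def gabaergic_types(kb):
    """Apply the defining rule: a GABAergic neuron is a neuron that releases GABA."""
    return sorted(name for name, transmitters in kb.items() if "GABA" in transmitters)


def concept_query(concept, subtypes):
    """Search for the concept label and every inferred subtype, not just the literal string."""
    return " OR ".join(f'"{term}"' for term in [concept] + subtypes)


print(concept_query("GABAergic neuron", gabaergic_types(NEURON_TRANSMITTERS)))
# "GABAergic neuron" OR "Cerebellar basket cell" OR "Purkinje cell"
```

A keyword engine sees only the literal phrase. Here, records that mention only "Purkinje cell" are still retrieved.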
26. Esperanto!
“The trouble is that if I make up all of my own URIs, my [data]
has no meaning to anyone else unless I explain what each URI is
intended to denote or mean. Two [data sets] with no URIs in
common have no information that can be interrelated.”
NIF favors reuse of identifiers rather than mapping
NIF imports many ontologies
Creating ontologies to be used as common building blocks: modularity and low semantic overhead are important
Many community ontologies available covering multiple domains
NIFSTD available via web services
BioPortal (http://bioportal.bioontology.org/)
http://www.rdfabout.com/intro/#Introducing%20RDF
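The quoted point, that data sets with no URIs in common cannot be interrelated, can be demonstrated with an exact-match join. In this minimal sketch all URIs use example domains and are hypothetical. No mapping step is attempted, which mirrors the preference for reusing identifiers over mapping them.

```python
# Hypothetical records from two labs. Both reuse one shared identifier for the
# hippocampus; each also coined a private URI for the same structure.
SHARED = "http://example.org/shared#Hippocampus"

lab_a = [
    (SHARED, "expression record A1"),
    ("http://lab-a.example/terms#hippo", "expression record A2"),
]
lab_b = [
    (SHARED, "connectivity record B1"),
    ("http://lab-b.example/terms#HPC", "connectivity record B2"),
]


def join_on_uri(left, right):
    """Pair records that use exactly the same URI; no mapping step is attempted."""
    index = {}
    for uri, record in right:
        index.setdefault(uri, []).append(record)
    return [(uri, a, b) for uri, a in left for b in index.get(uri, [])]


print(join_on_uri(lab_a, lab_b))
# [('http://example.org/shared#Hippocampus', 'expression record A1', 'connectivity record B1')]
```

The two private URIs denote the same structure, but without a shared identifier, those records never meet.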
27. NIF Analytics: The Neuroscience Ecosystem
Where are the data?
Figure: data source vs. brain region; labeled regions include Brain, Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex
NIF is in a unique position to answer questions about the neuroscience ecosystem
Vadim Astakhov, Kepler Workflow Engine
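A "where are the data?" view reduces to counting records per data source and brain region. The annotations in this minimal sketch are placeholders, not NIF results. The slide credits a Kepler workflow for the actual analysis, and that workflow is not reproduced here.

```python
from collections import Counter

# Placeholder (data source, brain region) annotations, not NIF results.
ANNOTATIONS = [
    ("Source A", "Striatum"),
    ("Source A", "Hypothalamus"),
    ("Source B", "Striatum"),
    ("Source B", "Striatum"),
    ("Source C", "Olfactory bulb"),
]


def coverage_matrix(annotations):
    """Count records per (source, region), giving a source-by-region table."""
    counts = Counter(annotations)
    sources = sorted({s for s, _ in annotations})
    regions = sorted({r for _, r in annotations})
    return sources, regions, [[counts[(s, r)] for r in regions] for s in sources]


sources, regions, matrix = coverage_matrix(ANNOTATIONS)
print(regions)
for source, row in zip(sources, matrix):
    print(source, row)
```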
28. Whither neuroscience information?
What is potentially knowable (∞)
What is known: literature, images, human knowledge
Unstructured: natural language processing, entity recognition, image processing and analysis; communication
What is easily machine processable and accessible
29. Open world meets closed world
But... NIF has >900,000 antibodies, 250,000 model organisms, and 3 million microarray records
Query for “reference” brain structures and their parts in NIF Connectivity database
30. Gender bias
NIF can start to answer interesting questions about neuroscience research, not just about neuroscience
NIF Reports:
Male vs Female
31. What have we learned: Grabbing the long tail of small data
Analysis of NIF shows multiple databases with similar scope and content
Many contain partially overlapping data
Data “flows” from one resource to the next
Data are reinterpreted, reanalyzed, or added to
Is duplication good or bad?
32. Embracing duplication: Data Mashups
•NIF queries across 3 of approximately 10 fMRI databases
•~300 PMIDs were common between Brede and SUMSdb
•PMID serves as a unique identifier for an article
•Same information; value added
Same data; different aspects
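Because a PMID uniquely identifies an article, two databases can be combined on it directly. In this minimal sketch the IDs (`P1` to `P3`) and annotation strings are placeholders. They do not come from Brede or SUMSdb.

```python
# Placeholder records keyed by PubMed ID; neither the IDs nor the
# annotations come from Brede or SUMSdb.
db_one = {"P1": "annotation from database 1", "P2": "annotation from database 1"}
db_two = {"P2": "annotation from database 2", "P3": "annotation from database 2"}


def mashup(left, right):
    """Combine the two views of every article that both databases describe."""
    return {pmid: (left[pmid], right[pmid]) for pmid in sorted(left.keys() & right.keys())}


print(mashup(db_one, db_two))
# {'P2': ('annotation from database 1', 'annotation from database 2')}
```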
33. Same data: different analysis
Chronic vs acute morphine in striatum
Gemma: Gene ID + Gene Symbol
DRG: Gene name + Probe ID
Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so the direction of change is opposite in the 2 databases
Analysis:
1370 statements from Gemma regarding gene expression as a function of chronic morphine
617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
Results for 1 gene were opposite in DRG and Gemma
45 did not have enough information provided in the paper to make a judgment
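The comparison only works after the two baselines are reconciled. This minimal sketch flips one sign and then tallies consistent, opposite and undetermined genes. The gene names and directions are placeholders, not the Gemma or DRG data. Reducing each statement to +1/-1 is also a simplification of what the analysis compared.

```python
from collections import Counter

# Placeholder directions of change (+1 up, -1 down, None = not reported).
# Following the slide, Gemma is taken as relative to chronic morphine and DRG
# as relative to saline, so one sign must be flipped before comparing.
gemma = {"Gene1": +1, "Gene2": -1, "Gene3": +1, "Gene4": None}
drg = {"Gene1": -1, "Gene2": +1, "Gene3": +1}


def compare_directions(gemma, drg):
    """Tally genes as consistent, opposite or undetermined after aligning baselines."""
    tally = Counter()
    for gene, direction in gemma.items():
        other = drg.get(gene)
        if direction is None or other is None:
            tally["undetermined"] += 1
        elif -direction == other:  # flip Gemma's sign onto DRG's saline baseline
            tally["consistent"] += 1
        else:
            tally["opposite"] += 1
    return tally


print(compare_directions(gemma, drg))
```

Skipping the sign flip would make every genuinely consistent result look contradictory.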
34. Taking a global view on data: microculture to ecosystem
Several powerful trends should change the way we think about our data: One → Many
Many data, shared data
 Generation of data is getting easier
 Data space is getting richer: more -omes every day
 But... compared to the biological space, still sparse
Many eyes
 Wisdom of crowds
 More than one way to interpret data
Many algorithms
 Not a single way to analyze data
Many analytics
 “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting
Are you exposing or burying your work?
35. The future of scientific communication
We have learned over the years how to write a scientific paper for other humans to read and for other agents to index
We now have to learn how to write papers for automated agents (and their humans) to mine
We have learned over the years to report data in papers for humans to read
We now have to learn how to publish data in a form and on a suitable platform for automated agents (and their humans) to mine
Images: printing press; linked data cloud; Watson
36. Why does it matter?
47/53 major preclinical published cancer studies could not be replicated
“The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
“There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
Begley and Ellis, Nature 483, 531 (29 March 2012)
Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
Data, not just stories about them!
37. Register your resource to NIF!
1 “How do I share my data?”
2 “There is no database for my data”
3 Community database: beginning
4 Community database: end
Diagram elements: Institutional repositories; Cloud; INCF: Global infrastructure; Education; Industry; Government
NIF is designed to leverage existing investments in resources and infrastructure
38. It’s a messy ecosystem (and that’s OK)
NIF favors a hybrid, tiered, federated system
Domain knowledge: Ontologies (Gene, Organism, Neuron, Brain part, Disease)
Claims about results: Virtuoso RDF triples (e.g., Caudate projects to SNpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS)
Data: Data federation
Workflows
Narrative
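The "claims about results" tier stores statements as triples. This plain-Python stand-in queries them with wildcard patterns, in the spirit of a single SPARQL triple pattern. The predicate names are invented for this sketch. NIF itself stores such claims in Virtuoso, and this code does not use Virtuoso or any RDF library.

```python
from typing import Optional

# Claims from the slide written as (subject, predicate, object) triples.
# Predicate names are invented for this sketch.
CLAIMS = [
    ("Caudate", "projects_to", "SNpc"),
    ("Grm1", "upregulated_in", "chronic cocaine"),
    ("Betz cell", "degenerates_in", "ALS"),
]


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(CLAIMS, predicate="projects_to"))
# [('Caudate', 'projects_to', 'SNpc')]
print(match(CLAIMS, obj="ALS"))
# [('Betz cell', 'degenerates_in', 'ALS')]
```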
39. Future of Research Communications and e-Scholarship
FORCE11: http://force11.org
Founded by Phil Bourne, Tim Clark, Ed Hovy, Anita de Waard and Ivan Herman
Bring together stakeholders with an interest in moving scholarly communication beyond reliance on papers and traditional impact metrics
Beyond the PDF 2: Spring 2013
40. NIF team (past and present)
Jeff Grethe, UCSD, Co-Investigator, Interim PI
Amarnath Gupta, UCSD, Co-Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Karen Skinner, NIH, Program Officer
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
41. Why do we create so many overlapping products?
“That which I cannot build, I cannot understand”
Don’t trust any data you haven’t generated
Oh, now I see what you are saying
Scientists know the domain, not informatics
Yes, we are planning to do that...
We are all time and resource constrained
We extend projects in time
Science is incremental; we build on the results of others
It’s ingrained in our culture
“Build a better mousetrap and the world will beat down our doors”
Little credit for making someone else’s product better
There’s more than one way to skin a cat...
We are still mastering the medium
Technology is developing fast
42. You need to use ontology identifiers instead of strings
Blah, blah, ontology blah
When I talk to resource providers, neuroscientists (and journal editors)...
Editor's Notes
Doesn’t do it well; doesn’t organize the results in a domain-specific way; doesn’t search across it. For use as content goal: dynamic inventory for deep coverage of neuroscience data: Genes -> Systems