The document discusses the Neuroscience Information Framework (NIF) and its role in facilitating discovery and use of neuroscience resources through a consistent semantic framework. NIF provides a portal for searching various types of neuroscience data and information organized by categories. It utilizes ontologies and advanced technologies to allow simultaneous searching of multiple sources. Challenges include the large number of databases and other resources, differing data types, and inconsistent naming of brain structures across sources.
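The idea of using an ontology to search several sources at once, even when they name the same brain structure differently, can be sketched as below. This is a minimal illustration, not NIF's implementation. The concept IDs, synonym sets, source names and records are all hypothetical. A query term is expanded to every synonym of its concept, and each source is then matched against the expanded set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mini-ontology: concept ID -> all known names (lowercased).
# IDs and labels are illustrative only, not taken from any real NIF ontology.
ONTOLOGY: Dict[str, Set[str]] = {
    "BRAIN:0001": {"hippocampus", "ammon's horn", "cornu ammonis"},
    "BRAIN:0002": {"cerebellum", "little brain"},
}

# Hypothetical data sources that label the same structures inconsistently.
SOURCES: Dict[str, List[dict]] = {
    "source_a": [{"id": "a1", "region": "Hippocampus"}],
    "source_b": [{"id": "b7", "region": "Ammon's horn"},
                 {"id": "b8", "region": "Cerebellum"}],
}

def expand(term: str) -> Set[str]:
    """Return every synonym of the concept containing `term`, or the term itself."""
    t = term.lower()
    for labels in ONTOLOGY.values():
        if t in labels:
            return set(labels)
    return {t}  # unknown term: fall back to a literal search

def federated_search(term: str) -> List[Tuple[str, str]]:
    """Query every source at once using the ontology-expanded term set."""
    names = expand(term)
    hits = []
    for source, records in SOURCES.items():
        for rec in records:
            if rec["region"].lower() in names:
                hits.append((source, rec["id"]))
    return hits
```

With this sketch, `federated_search("hippocampus")` also returns the record labelled "Ammon's horn", because both names resolve to the same concept.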
How do we know what we don't know? Exploring the data and knowledge space th... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF), an initiative that aims to catalog and integrate neuroscience resources and data. NIF surveys the neuroscience resource landscape, currently cataloging over 3000 databases and datasets. It provides semantic integration of these resources through the use of ontologies and allows deep search of aggregated data. However, significant amounts of neuroscience data and resources remain inaccessible in publications, databases, and file drawers. Barriers to data sharing include lack of incentives, standards, and resources. NIF and related efforts aim to develop solutions to make more neuroscience data FAIR - findable, accessible, interoperable, and reusable.
The document discusses navigating the neuroscience data landscape. It notes that a grand challenge in neuroscience is to understand brain function across multiple scales of organization. Central to this effort is understanding "neural choreography" - the integrated functioning of neurons into brain circuits. The Neuroscience Information Framework (NIF) aims to facilitate discovery and utilization of web-based neuroscience resources. However, the neuroscience community has not fully exploited currently available data or prepared for forthcoming data.
How do we know what we don’t know: Using the Neuroscience Information Framew... (Maryann Martone)
The document discusses using the Neuroscience Information Framework (NIF) to reveal knowledge gaps in neuroscience. NIF aims to maximize awareness of, access to, and utility of neuroscience research resources by uniting information from over 200 databases containing over 400 million records. However, the document notes that certain domains may still be underrepresented because the available data are biased by factors such as funding priorities. The framework uses ontologies to integrate diverse data types and link them to defined concepts, but neuroanatomical structures pose particular challenges because naming conventions are inconsistent across studies.
Big data from small data: A survey of the neuroscience landscape through the... (Maryann Martone)
The document discusses the Neuroscience Information Framework (NIF), an initiative by the NIH Blueprint to provide a single access point for searching across multiple neuroscience databases and data types. NIF aims to maximize access to and utility of worldwide neuroscience resources by creating a consistent framework for describing resources and enabling simultaneous searches. It notes that neuroscience data exists in many forms, from raw data to processed data to claims, across multiple scales and data types. NIF is designed to rapidly integrate these diverse resources through a tiered system that has a low barrier for data providers to participate.
The document discusses the Neuroscience Information Framework (NIF), which provides a portal for finding and utilizing web-based neuroscience resources. NIF allows simultaneous searching of multiple data sources through a concept-based interface organized by categories. It indexes over 35 million records from 65+ databases. NIF aims to address the challenges of dispersed and inconsistent neuroscience data by providing a common framework and tools to integrate data from various sources. Ontologies are discussed as a way to represent neuroscience concepts and relationships in a machine-readable way to facilitate data integration and querying across multiple scales and domains.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a portal for finding and utilizing web-based neuroscience resources. NIF provides a consistent framework for describing various resources like databases, literature, and images. It allows simultaneous searches across these different data types and is supported by neuroscience ontologies. NIF currently catalogs over 5,000 resources and is working to integrate these diverse data sources to help answer questions and discover gaps in our knowledge about the brain.
The hippocampus receives input from the entorhinal cortex and sends projections to multiple targets in the brain. Its main outputs are to the subiculum, which projects to regions like the nucleus accumbens, amygdala, and medial prefrontal cortex. The hippocampus plays an important role in memory formation and spatial navigation.
Data-knowledge transition zones within the biomedical research ecosystem (Maryann Martone)
Overview of the Neuroscience Information Framework and how it brings together data, in the form of distributed databases, and knowledge, in the form of ontologies, to map the dataspace and show where data and knowledge do not match.
The Neuroscience Information Framework has indexed over 100 big data databases, which allows us to ask big data landscape questions. In this presentation to the Jackson Laboratory (JAX), Anita Bandrowski gives an overview of the NIF system and offers insights into the addiction data landscape.
The document discusses the Neuroscience Information Framework (NIF), which aims to provide a consistent framework and portal for discovering and utilizing web-based neuroscience resources. It summarizes the goals of NIF in indexing over 2000 databases and making their content searchable through an expansive neuroscience ontology. The document outlines the history and development of NIF, describes its search capabilities and use of ontologies, and provides examples of tools and resources that integrate NIF services like the Whole Brain Catalog.
Neuroscience research increasingly relies on large, heterogeneous datasets from various sources. Integrating these diverse data types and making them accessible presents challenges. The NIF (Neuroscience Information Framework) addresses this by creating a federated search engine and unified interface to access multiple neuroscience databases. NIF aims to make neuroscience data more discoverable, accessible, and usable through techniques like unique identifiers, metadata standards, and semantic integration. This will help researchers more effectively find and use relevant neuroscience information.
Semantic Technology empowering Real World outcomes in Biomedical Research and... (Amit Sheth)
Based on the context and background knowledge:
- Patient has leg swelling and stiffness, which is limiting her function
- She also has shortness of breath
- Shortness of breath can be a symptom of heart failure
- Heart failure can cause leg swelling
The NLP system should annotate:
- Problem: Heart failure
- Symptom: Shortness of breath
- Symptom: Leg swelling
By utilizing background knowledge, the inconsistency is resolved.
Resolve Inconsistency
"The patient reports intermittent chest pain on exertion for the past few months. On exam today, she denies any chest pain."
NLP: Patient has chest pain
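The way background knowledge turns a list of symptoms into a "Problem" annotation can be sketched as a small rule. This assumes a simple, hypothetical rule: a condition is promoted to a problem when it explains at least two observed symptoms. Only the heart-failure links come from the slide above. The asthma entry and the `min_support` parameter are illustrative assumptions, not part of the described system.

```python
from typing import Dict, List, Set

# Hypothetical background knowledge: condition -> symptoms it can explain.
# The heart-failure entry mirrors the slide; the asthma entry is illustrative.
BACKGROUND: Dict[str, Set[str]] = {
    "heart failure": {"shortness of breath", "leg swelling"},
    "asthma": {"shortness of breath", "wheezing"},
}

def annotate(symptoms: Set[str], min_support: int = 2) -> Dict[str, List[str]]:
    """Promote a condition to 'Problem' when it explains at least
    `min_support` of the observed symptoms (an assumed, simplistic rule)."""
    problems = sorted(
        cond for cond, explained in BACKGROUND.items()
        if len(explained & symptoms) >= min_support
    )
    return {"Problem": problems, "Symptom": sorted(symptoms)}
```

For example, `annotate({"shortness of breath", "leg swelling"})` yields heart failure as the problem, because it accounts for both symptoms.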
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... (Amit Sheth)
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
Data Landscapes: The Neuroscience Information Framework (Maryann Martone)
Overview of how to use the Neuroscience Information Framework for data discovery, presented at the Genetics of Addiction Workshop, held at Jackson Lab, Aug 28 to Sept 1, 2014.
The document describes a framework for biological relation extraction using biomedical ontologies and text mining. It introduces biomedical text mining and outlines the problem, motivation, and challenges. It then presents the overall system components and architecture, including searching/browsing, Swanson's algorithm, protein-protein interactions, and gene clustering applications. The framework's conceptual issues, design issues, sequence diagram, and database are also covered at a high level.
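Swanson's algorithm, mentioned above, is usually described as the "ABC" model of literature-based discovery. If term A co-occurs with intermediate terms B, and those B terms co-occur with a term C that never appears alongside A, then C is a candidate hidden link to A. The sketch below implements that open-discovery step over documents represented as term sets. The toy documents loosely echo Swanson's well-known fish oil / Raynaud's syndrome case. They are illustrative only and are not data from the framework described.

```python
from collections import defaultdict
from typing import Dict, List, Set

def cooccurrence(docs: List[Set[str]]) -> Dict[str, Set[str]]:
    """Map each term to every term it co-occurs with in at least one document."""
    co: Dict[str, Set[str]] = defaultdict(set)
    for terms in docs:
        for t in terms:
            co[t] |= terms - {t}
    return co

def swanson_abc(docs: List[Set[str]], a: str) -> Dict[str, Set[str]]:
    """Open discovery: return C terms reachable from A only via intermediate
    B terms, each mapped to the B terms that link it to A."""
    co = cooccurrence(docs)
    b_terms = co[a]
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for b in b_terms:
        for c in co[b]:
            if c != a and c not in b_terms:  # C must not co-occur with A directly
                candidates[c].add(b)
    return dict(candidates)

# Toy corpus (illustrative only): each document is a set of extracted terms.
DOCS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "raynaud's syndrome"},
    {"platelet aggregation", "raynaud's syndrome"},
]
```

Here `swanson_abc(DOCS, "fish oil")` proposes Raynaud's syndrome, linked through blood viscosity and platelet aggregation.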
Ecsi Neurosciences Information Framework (NIF): An example of community Cyberi... (Maryann Martone)
The document discusses the challenges of managing and utilizing the large amount of neuroscience data being generated. It notes that currently, about half of researchers only store data in their own labs and many lack funding for proper archiving. The Neuroscience Information Framework (NIF) is working to address these issues by creating a catalog and federation of neuroscience resources to facilitate discovery, access, analysis and integration of data. NIF has assembled the largest searchable collection of neuroscience data on the web using an ontology and technologies that can search the "hidden web" of resources.
Data Provenance and Scientific Workflow Management (NeuroMat)
Introductory class on techniques and tools to manage scientific data, focusing on sources of information and data analysis. Lecturer: Prof. Kelly Rosa Braghetto, a NeuroMat associate investigator and a professor at the University of São Paulo's Department of Computer Science.
The document discusses methodologies for sharing long-tail data and what has been learned. It notes that persistent unique identifiers (PIDs) are important for identifying entities across contexts. Standards like MINI and common data elements (CDEs) help make data findable, accessible, and reusable. The Neuroscience Information Framework (NIF) aggregates ontologies and searches over 200 data sources to organize information. The lessons learned are that data belong in repositories, not on personal servers; that people are key to these efforts; and that resources should be comprehensive and support each other to advance open data sharing.
A Deep Survey of the Digital Resource Landscape: Perspectives from the Neuros... (Maryann Martone)
The NIF Registry provides insight into the state of digital neuroscience resources on the web. It has cataloged over 6,000 resources, including more than 2,200 databases. While some resources disappear over time, many more grow stale as they are not updated regularly. Maintaining an up-to-date registry requires frequent updates. The NIF data federation can search over 200 databases containing over 1 billion records. This collection continues to grow as new databases are added. The NIF utilizes ontologies and semantic frameworks to integrate data across diverse sources and provide insights into the neuroscience landscape.
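Keeping a registry current means periodically flagging entries that have gone stale. The sketch below shows one simple way to do that: compare each resource's last-update date against a cutoff. The `Resource` record, the resource names and the 365-day default threshold are hypothetical assumptions. The source does not say how the NIF Registry defines staleness.

```python
from datetime import date, timedelta
from typing import List, NamedTuple

class Resource(NamedTuple):
    name: str            # hypothetical registry entry name
    last_updated: date   # date the resource's content last changed

def stale_resources(registry: List[Resource], today: date,
                    max_age_days: int = 365) -> List[str]:
    """Return names of resources not updated within `max_age_days`
    (the threshold is an assumed policy, not NIF's)."""
    cutoff = today - timedelta(days=max_age_days)
    return [r.name for r in registry if r.last_updated < cutoff]

REGISTRY = [
    Resource("db_a", date(2014, 1, 1)),
    Resource("db_b", date(2015, 6, 1)),
]
```

Tightening `max_age_days` flags more entries, which reflects the trade-off noted above: the fresher you want the registry, the more often it has to be updated.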
Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Inte... (Amit Sheth)
Ora Lassila and Amit Sheth, "Semantic Web for 360-degree Health: State-of-the-Art & Vision for Better Interoperability", Invited Talk at ONC-HHS Invitational Workshop on Next Generation Interoperability for Health, Washington DC, January 19-20, 2011.
EiTESAL eHealth Conference, 14 & 15 May 2017 (EITESANGO)
This document discusses bioinformatics and some of its key concepts and tools. It begins with definitions of bioinformatics as the intersection of biology, computer science, and information technology. It then discusses some of the data formats, tools, and skills used in bioinformatics, including working with nucleotide sequence data, translating sequences into amino acids, and analyzing large datasets. It also summarizes how ontologies are used to represent concepts and how various data types are organized and stored in databases for analysis.
National Resource for Network Biology's TR&D Theme 3: Although networks have been very useful for representing molecular interactions and mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale hierarchy of components: proteins are subunits of protein complexes which, in turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this technology research project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale views of cell structure and function.
The National Resource for Network Biology (NRNB) held its External Advisory Council meeting on December 12, 2012. The NRNB is focused on developing network biology tools and collaborating with investigators. It oversees various technology research and development projects, software releases including Cytoscape 3.0, collaboration projects, and outreach/training events. The meeting agenda covered progress updates and sought advice on future plans.
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework (ASIS&T)
The Neuroscience Information Framework (NIF) is an initiative of the NIH Blueprint to maximize access to and utility of worldwide neuroscience research resources. NIF catalogs over 10,000 resources including databases, literature, and materials. It provides search capabilities across these resources and develops ontologies and semantic frameworks to integrate diverse data types and scales. NIF aims to make dispersed neuroscience information more findable, accessible, interoperable, and reusable to enable new insights.
Data analysis & integration challenges in genomics (mikaelhuss)
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", and machine learning.
Maryann Martone
Making Sense of Biological Systems: Using Knowledge Mining to Improve and Validate Models of Living Systems; NIH COBRE Center for the Analysis of Cellular Mechanisms and Systems Biology, Montana State University, Bozeman, MT
August 24, 2012
An expert knowledge base on human performance and cognition was created by extracting information from scientific literature using natural language processing and pattern-based techniques. Over 3 million facts were extracted from abstracts and mapped to a hierarchical structure derived from Wikipedia. The knowledge base was deployed through a browsing tool called Scooner that allows users to navigate relationships between concepts. Further work is focused on improving knowledge base quality by normalizing entities, filtering assertions, and integrating related ontologies and vocabularies.
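Pattern-based extraction of facts from abstracts can be illustrated with a single regular expression that captures subject, relation and object triples. This is a toy sketch under stated assumptions. The four relation verbs and the regex are hypothetical, and the source does not describe the actual patterns used to build the knowledge base. Real systems would also normalize entities and filter assertions, as the summary notes.

```python
import re
from typing import List, Tuple

# Hypothetical relation pattern: "<subject> <verb> <object>" with a fixed verb list.
PATTERN = re.compile(
    r"(?P<subj>[A-Za-z][\w\- ]*?)\s+"
    r"(?P<rel>increases|decreases|improves|impairs)\s+"
    r"(?P<obj>[\w\- ]+)"
)

def extract_facts(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples matched by the pattern."""
    return [
        (m["subj"].strip().lower(), m["rel"], m["obj"].strip().lower())
        for m in PATTERN.finditer(sentence)
    ]
```

For instance, "Sleep deprivation impairs working memory." yields one triple. A sentence with no listed verb yields none.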
A knowledge capture framework for domain specific search systems (ramakanz)
This is the product rollout presentation, given at AFRL, on creating a focused knowledge base, search, and retrieval system for the domain of human performance and cognition.
Knowledge graph construction for research & medicine (Paul Groth)
1) Elsevier aims to build knowledge graphs to help address challenges in research and medicine like high drug development costs and medical errors.
2) Knowledge graphs link entities like people, concepts, and events to provide answers by going beyond traditional bibliographic descriptions.
3) Elsevier constructs knowledge graphs using techniques like information extraction from text, integrating data sources, and predictive modeling of large patient datasets to identify statistical correlations.
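Linking entities lets a knowledge graph answer questions that a bibliographic record alone cannot. One example question is how an author's work connects to a disease. The sketch below finds such a connection with a breadth-first search over subject-predicate-object edges. The entities and predicates are hypothetical placeholders, not drawn from any Elsevier graph. Edges are followed in their stated direction only.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical (subject, predicate, object) edges; names are placeholders.
EDGES: List[Tuple[str, str, str]] = [
    ("drug_x", "inhibits", "protein_p"),
    ("protein_p", "implicated_in", "disease_d"),
    ("author_a", "wrote", "paper_1"),
    ("paper_1", "mentions", "drug_x"),
]

def find_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search following edges forward; returns alternating
    nodes and predicates from start to goal, or None if unreachable."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in EDGES:
        adj.setdefault(s, []).append((p, o))
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for pred, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [pred, nxt])
    return None
```

Because BFS explores shorter chains first, the returned path is the fewest-hop explanation linking the two entities.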
The document discusses the need for a uniform resource layer to allow researchers to easily find and access biomedical resources such as databases, software tools, biobanks, and services. It notes that currently, each resource implements different models and systems that are complex and difficult to learn. There must be a common platform that makes access to biological data uniform so researchers can understand data, not just compute it. While progress has been made in registering some resources, much work still needs to be done to fully register and provide deep metadata for existing resources to achieve this goal.
Elsevier aims to construct knowledge graphs to help address challenges in research and medicine. Knowledge graphs link entities like people, concepts, and events to provide answers. Elsevier analyzes text and data to build knowledge graphs using techniques like information extraction, machine learning, and predictive modeling. Their knowledge graph integrates data from publications, clinical records, and other sources to power applications that help researchers, medical professionals, and patients. Knowledge graphs are a critical component for delivering value, especially as data volumes and needs accelerate.
NIFSTD is a comprehensive ontology for neuroscience developed by the Neuroscience Information Framework (NIF) project. It consists of several modular ontologies covering neuroscience domains like brain regions, cells, molecules, and diseases. NIFSTD aims to provide consistent descriptions of neuroscience resources to enable concept-based search across multiple data types. It imports and maps to existing community ontologies and seeks to avoid duplication of efforts.
The Neuroscience Information Framework (NIF) uses ontologies like the NIF Standard Ontology (NIFSTD) to enable concept-based search across multiple neuroscience resources. NIFSTD integrates over 60,000 concepts from various domains of neuroscience and reuses terms from existing ontologies. It allows for classification of neuroscience entities and logical inferences to find related concepts. The NIF framework aims to build a rich knowledgebase integrating neuroscience data from various sources.
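The "logical inferences to find related concepts" described above rest largely on subclass reasoning. If A is-a B and B is-a C, then A is also a C, so a search for C can return A. The sketch below computes that transitive closure over a small, hypothetical is-a hierarchy. The cell-type labels are illustrative and are not copied from NIFSTD. A real ontology would use an OWL reasoner rather than this hand-rolled traversal.

```python
from typing import Dict, Set

# Hypothetical asserted is-a links (child -> parents); labels are illustrative.
IS_A: Dict[str, Set[str]] = {
    "CA1 pyramidal cell": {"pyramidal cell"},
    "pyramidal cell": {"principal neuron"},
    "principal neuron": {"neuron"},
    "Purkinje cell": {"principal neuron"},
}

def ancestors(concept: str) -> Set[str]:
    """All superclasses reachable through is-a links (transitive closure)."""
    result: Set[str] = set()
    stack = list(IS_A.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return result

def descendants(concept: str) -> Set[str]:
    """All concepts inferred to be subclasses of `concept`."""
    return {c for c in IS_A if concept in ancestors(c)}
```

A concept-based search for "principal neuron" can then be expanded to `descendants("principal neuron")`, which includes "CA1 pyramidal cell" even though that link was never asserted directly.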
This document discusses the potential for open source artificial intelligence to help understand molecular biology data. It argues that capturing common sense knowledge computationally has been challenging, but knowledge about molecular biology exists explicitly. An open source AI focused on molecular biology could help explain genomic data by developing a comprehensive knowledge base and using abductive inference. However, explaining biological phenomena is difficult and requires judgment. The document advocates for open source development to gain productivity advantages and build trust through transparency. It outlines challenges and opportunities for facilitating an open source AI community focused on understanding life.
The document discusses neuroscience ontologies created by the Neuroscience Information Framework (NIF). It describes how NIF incorporates existing ontologies and extends them for neuroscience as needed. NIF includes modular ontologies covering multiple scales including molecules, cells, anatomy, and functions. Key ontologies discussed include NIFSTD, Neurolex, and bridging files that link related concepts across ontologies. Examples are provided of how neuron classes are defined based on attributes such as brain region, molecular constituents, and roles.
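Defining neuron classes by attributes such as brain region and molecular constituents means an individual neuron can be classified automatically. It belongs to every class whose required attribute values it satisfies. The sketch below assumes a flat dictionary of attributes per neuron and per class. The class names, attribute keys and values are hypothetical, and this is far simpler than an OWL-defined class in the NIF ontologies.

```python
from typing import Dict, List

# Hypothetical defined classes: required attribute values for membership.
DEFINED_CLASSES: Dict[str, Dict[str, str]] = {
    "hippocampal GABAergic neuron": {"soma_location": "hippocampus",
                                     "neurotransmitter": "GABA"},
    "cerebellar neuron": {"soma_location": "cerebellum"},
}

def classify(neuron: Dict[str, str]) -> List[str]:
    """Return every defined class whose required attributes the neuron has."""
    return sorted(
        name for name, required in DEFINED_CLASSES.items()
        if all(neuron.get(key) == value for key, value in required.items())
    )
```

Extra attributes, such as a morphology field, are simply ignored. A neuron missing any required attribute is not placed in that class.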
This document provides a review and summary of major scientific events, trends, and publications in translational bioinformatics in 2008 by Russ B. Altman from Stanford University. Some of the key topics covered include the sequencing and analysis of an individual's diploid genome, next-generation sequencing technologies, genome-wide association studies, pharmacogenomics, analysis of high-throughput molecular data, neuroscience datasets, and using molecular information to improve disease detection and treatment. The review highlights over 25 seminal papers from 2008 and provides insights on emerging trends in the field.
Talk from OHBM education day 2018, an overview of data sharing and other resources for neuroimaging research. Also a brief discussion of the impact that openly shared data has had on publications.
Phyloinformatics combines phylogenetics and informatics to systematically study and classify evolutionary relationships. It has progressed from closed private data to more open and linked data through standards like ontologies and semantic web technologies. This allows phylogenetic concepts and data to be formalized and connected across resources using unique identifiers and statements called triples. Querying linked phylogenetic data from integrated sources will enable new synthetic research, though challenges remain in deploying these technologies and unlocking legacy data currently locked in publications.
The document discusses the challenges of managing and analyzing the large amounts of neuroscience data being generated. It notes that currently, about half of researchers only store their data locally in their labs instead of in shared databases or archives. This prevents other researchers from accessing and using the data. The National Information Forum (NIF) is working to address these issues by creating a registry of neuroscience resources and developing technologies to allow researchers to discover, share, analyze and integrate data from various sources. NIF's registry currently catalogs over 6000 resources, including 2200 databases. The goal is for NIF to help the neuroscience community better exploit existing data and prepare for future increases in data.
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
Presentation at the IJCAI 2018 Industry Day
Elsevier serves researchers, doctors, and nurses. They have come to expect the same AI based services that they use in everyday life in their work environment, e.g.: recommendations, answer driven search, and summarized information. However, providing these sorts of services over the plethora of low resource domains that characterize science and medicine is a challenging proposition. (For example, most of the shelf NLP components are trained on newspaper corpora and exhibit much worse performance on scientific text). Furthermore, the level of precision expected in these domains is quite high. In this talk, we overview our efforts to overcome this challenge through the application of four techniques: 1) unsupervised learning; 2) leveraging of highly skilled but low volume expert annotators; 2) designing annotation tasks for non-experts in expert domains; and 4) transfer learning. We conclude with a series of open issues for the AI community stemming from our experience.
Presented during the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'12). Part of the workshop 'New Models and Modes for Data Sharing: Experiences from Neuroscience'. Presented by Jeffrey S. Grethe, Ph.D. from the Center for Research in Biological Systems at the University of California, San Diego.
This workshop featured several large scale efforts to establish data sharing platforms, standards and tools to promote data intensive analysis in the neurosciences. As we head into the second decade of the 21st century, many scientists realize that current methods for publishing and accessing data are outmoded and inefficient. Neuroscience, with its large diverse and highly competitive community, has been slow to adopt more open sharing of data and has lacked effective tools to do so. There has been a significant investment in databases and tools for biological science, and frequent calls for more of them, but few calls to the biological community to adopt practices and frameworks for making their resources more easily discoverable and data more accessible. Data are contained within diverse sources, from web pages, databases, literature to personal lab systems, making for a haphazard mechanism for data and tool discovery. Although these mechanisms are effective for small communities, they are parochial for the totality of resources available, leading to fragmentation in the resource ecosystem. Neuroscience, with its diverse subdisciplines, complex data types and broad domain, presents the perfect exemplar of the current practices, bottlenecks and issues surrounding open access to data. This situation is changing, however, as groups have started to work together to define new models and tools for sharing and analyzing neuroscience data on an international scale. In this workshop, we bring together experts from national and international projects to discuss issues of data access and progress towards establishing platforms and best practices for effective sharing of neuroscience data in support of basic and clinical neuroscience.
The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework
1. The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework
Maryann Martone, Ph.D.
University of California, San Diego
2. “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits... Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior....
However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. “Data choreography”
In that same issue of Science, the previous year's peer reviewers were asked about the availability and use of data:
About half of those polled store their data only in their laboratories, which is not an ideal long-term solution.
Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving.
And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.
“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649)
4. NIF is an initiative of the NIH Blueprint consortium of institutes
What types of resources (data, tools, materials, services) are available to the neuroscience community?
How many are there?
What domains do they cover? What domains do they not cover?
Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
• PDF files
• Desk drawers
Who uses them?
Who creates them?
How can we find them?
How can we make them better in the future?
http://neuinfo.org
5. In an ideal world...
We’d like to be able to find:
What is known:
What is the average diameter of a Purkinje neuron?
Is GRM1 expressed in cerebral cortex?
What are the projections of the hippocampus?
What genes have been found to be upregulated in chronic drug abuse in adults?
Is alpha synuclein in the striatum?
What studies used my polyclonal antibody against GABA in humans?
What rat strains have been used most extensively in research during the last 20 years?
What is not known:
Connections among data
Gaps in knowledge
Without some sort of framework, very difficult to
Required components:
– Query interface
– Search strategies
– Data sources
– Infrastructure
– Results display
– Why did I get this result?
– Analysis tools
6. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
A portal for finding and using neuroscience resources
A consistent framework for describing resources
Provides simultaneous search of multiple types of information, organized by category
Supported by an expansive ontology for neuroscience
Utilizes advanced technologies to search the “hidden web”
http://neuinfo.org
UCSD, Yale, CalTech, George Mason, Washington Univ
Supported by NIH Blueprint
[Figure labels: Literature; Database Federation; Registry]
7. We need more databases?!
• NIF Registry: A catalog of neuroscience-relevant resources
• > 5000 currently listed
• > 2000 databases
• And we are finding more every day
8. NIF must work with the ecosystem as it is today
NIF was one of the first projects to attempt data integration in the neurosciences on a large scale
NIF is supported by a contract that specified the number of resources to be added per year
Designed to be populated rapidly; set up a process for progressive refinement
No budget was allocated to retrofit existing resources; had to work with them in their current state
We designed a system that required little to no cooperation or work from providers
NIF was required to assemble (not create) ontologies very fast and to provide a platform through which the community could view, comment and add
NIF is enriched by ontologies but does not depend on them
Took advantage of community ontologies
But needed to take a very pragmatic and aggressive approach to incorporating and using them
Neurolex semantic wiki
9. What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms and related concepts
Boolean queries
Data sources categorized by “data type” and level of nervous system
Common views across multiple sources
Tutorials for using full resource when getting there from NIF
Link back to record in original source
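The synonym part of the query expansion on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming a hand-written synonym table (`SYNONYMS`, hypothetical) in place of ontology lookups; it covers synonyms only, not related concepts, and `expand_query` is an illustrative name, not NIF's API.

```python
# Hypothetical synonym table; in NIF the synonyms would come from the ontology.
SYNONYMS: dict[str, list[str]] = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Quote multi-word terms so a search engine treats them as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str, synonyms: dict[str, list[str]]) -> str:
    """Expand a term into a Boolean OR query over the term and its synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    unique = list(dict.fromkeys(variants))  # drop duplicates, keep order
    return " OR ".join(quote(v) for v in unique)


print(expand_query("Hippocampus", SYNONYMS))
# Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"
```

Terms without an entry fall through unchanged, so the expansion never makes a query narrower than what the user typed.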
10. Imminent: NIF 5.0
NIF 5.0 about to be released
New design
New query features
New analytics
11. What do you mean by data?
Databases come in many shapes and sizes
Primary data: Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
Secondary data: Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
Tertiary data: Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation
Registries: Metadata; pointers to data sets or materials stored elsewhere
Data aggregators: Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
Single source: Data acquired within a single context, e.g., Allen Brain Atlas
Researchers are producing a variety of information resources using a multitude of technologies
12. Exploration: Where is alpha synuclein?
• Spatially: gene, protein, subcellular, cellular, regional, organism
• Semantically: gene regulation networks, protein pathways, cellular local connectivity, regional connectivity
• Who is studying it?
• Who is funding its study?
Networks exist across scales; all are important in the nervous system
13. Set of modular ontologies
86,000+ distinct concepts + synonyms
Bridge files between modules
Expressed in OWL-DL language
Currently supports OWL 2
Tries to follow OBO community best practices
Standardized to the same upper level ontologies, e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO)
Imports existing community ontologies, e.g., ChEBI, GO, PRO, DOID, OBI, etc.
Retains identifiers in most recent additions but reflects history
Covers major domains of neuroscience: Organisms, Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, Techniques
NIFSTD Ontologies
Fahim Imam, William Bug
14. “Search computing”: Query by concept
What genes are upregulated by drugs of abuse in the adult mouse? (show me the data!)
[Figure labels: Morphine; Increased expression; Adult Mouse]
Reasonable standards make it easy to search for and compare results
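One way to picture query by concept: once results from each source are annotated with shared concepts, the question on this slide reduces to filtering on those concepts. A minimal sketch, assuming hypothetical, already-normalized records; the `ExpressionClaim` fields, source names and placeholder gene names are invented for illustration and are not NIF data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionClaim:
    source: str    # database the claim was harvested from
    gene: str
    drug: str      # drug concept
    change: str    # "increased expression" or "decreased expression"
    organism: str
    stage: str     # developmental stage concept


# Hypothetical records already mapped to shared concepts; genes are placeholders.
CLAIMS = [
    ExpressionClaim("source1", "geneA", "morphine", "increased expression", "mouse", "adult"),
    ExpressionClaim("source2", "geneB", "morphine", "decreased expression", "mouse", "adult"),
    ExpressionClaim("source2", "geneC", "morphine", "increased expression", "mouse", "juvenile"),
    ExpressionClaim("source3", "geneD", "morphine", "increased expression", "mouse", "adult"),
]


def query_by_concept(claims, **concepts):
    """Return the claims whose fields equal every requested concept."""
    return [c for c in claims
            if all(getattr(c, field) == value for field, value in concepts.items())]


hits = query_by_concept(CLAIMS, drug="morphine", change="increased expression",
                        organism="mouse", stage="adult")
print(sorted(c.gene for c in hits))  # ['geneA', 'geneD']
```

The filter is trivial; the hard part the slide points to is getting every source onto the same concept labels in the first place.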
15. New: Data analytics
NIF is in a unique position to answer questions about the neuroscience ecosystem using new analytics tools
[Figure: Diseases of nervous system (Neurodegenerative; Seizure disorders; Neoplastic disease of nervous system); sources: NIH Reporter, NIF data federated sources]
16. Results are organized within a common framework
[Figure labels: Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site]
Each resource implements a different, though related, model; in many cases, the systems are complex and difficult to learn
18. The scourge of neuroanatomical nomenclature: Importance of NIF semantic framework
• NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
• Brain Architecture Management System (rodent)
• Temporal lobe.com (rodent)
• ConnectomeWiki (human)
• Brain Maps (various)
• CoCoMac (primate cortex)
• UCLA Multimodal database (Human fMRI)
• Avian Brain Connectivity Database (Bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of 1st order partonomy matches: 385
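The three kinds of match counted on this slide can be illustrated on toy data. This sketch assumes simple definitions (identical string in more than one database; match through a synonym table; match through a direct part_of parent) and hypothetical inputs (`DB_TERMS`, `CANONICAL`, `PART_OF`). NIF's actual matching procedure may differ, and the toy counts say nothing about the real 42/99/385 figures.

```python
from collections import defaultdict

# Hypothetical toy inputs (not NIF data): terms used by each database,
# a synonym table mapping variants to a canonical label, and direct
# (first-order) part_of links between canonical labels.
DB_TERMS = {
    "db1": {"hippocampus", "CA1", "striatum"},
    "db2": {"hippocampus", "Ammon's horn", "caudate nucleus"},
    "db3": {"CA1 field", "striatum"},
}
CANONICAL = {"Ammon's horn": "hippocampus", "CA1 field": "CA1"}
PART_OF = {"CA1": "hippocampus", "caudate nucleus": "striatum"}


def canon(term: str) -> str:
    return CANONICAL.get(term, term)


def match_counts(db_terms: dict[str, set[str]]) -> tuple[int, int, int]:
    """Count exact, synonym and first-order partonomy matches across databases."""
    dbs_by_term = defaultdict(set)   # exact label -> databases using it
    dbs_by_canon = defaultdict(set)  # canonical label -> databases using any variant
    for db, terms in db_terms.items():
        for t in terms:
            dbs_by_term[t].add(db)
            dbs_by_canon[canon(t)].add(db)

    # Exact: the identical string appears in more than one database.
    exact = {t for t, dbs in dbs_by_term.items() if len(dbs) > 1}
    # Synonym: no exact match, but another database uses a variant of the same concept.
    synonym = {t for t, dbs in dbs_by_term.items()
               if t not in exact and dbs_by_canon[canon(t)] - dbs}
    # Partonomy: the term's direct parent is used by some other database.
    partonomy = set()
    for db, terms in db_terms.items():
        for t in terms:
            parent = PART_OF.get(canon(t))
            if parent and dbs_by_canon[parent] - {db}:
                partonomy.add(t)
    return len(exact), len(synonym), len(partonomy)


print(match_counts(DB_TERMS))  # (2, 3, 3)
```

Even in this toy case, most cross-database links only appear once synonyms and partonomy are consulted, which is the slide's argument for a shared semantic framework.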
19. Why so many names?
The brain is perhaps unique among major organ systems in the multiplicity of naming schemes for its major and minor regions.
The brain has been divided based on topology of major features, cyto- and myelo-architecture, developmental boundaries, supposed evolutionary origins, histochemistry, gene expression and functional criteria.
The gross anatomy of the brain reflects the underlying networks only superficially, and thus any parcellation reflects a somewhat arbitrary division based on one or more of these criteria.
The “activation map” images that commonly accompany brain imaging papers can be misleading to inexperienced readers, by seeming to suggest that the boundaries between “activated” and “unactivated” patches of cortex are unambiguous and sharp. Instead, as most researchers are aware, the apparent sharp boundaries are subject to the choice of threshold applied to the statistical tests that generate the image. What, then, justifies dividing the cortex into regions with boundaries based on this fuzzy, mutable measure of functional profile?
(Saxe et al., 2010, p. 39).
Brainmaps.org
20. Program on Ontologies for Neural Structures
International Neuroinformatics Coordinating Facility
Structural Lexicon Task Force
Defining brain structures
Translate among terminologies
Neuronal Registry Task Force
Consistent naming scheme for neurons
Knowledge base of neuron properties
Representation and Deployment Task Force
Formal representation
Also interacts with Digital Atlasing Task Force
http://incf.org
21. NeuroLex Wiki
http://neurolex.org
Stephen Larson
• Provide a simple framework for defining the concepts required
• Light weight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
• Community based:
• Anyone can contribute their terms, concepts, things
• Anyone can edit
• Anyone can link
• Accessible: searched by Google
• Building an extensive cross-disciplinary knowledge base for neuroscience
Demo D03
22. Defining nervous system structures
Parcellation scheme: Set of parcels occupying part or all of an anatomical entity that has been delineated using a common approach or set of criteria, often in a single study. A parcellation scheme for any given individual entity may include gaps, transitional zones, or regions of uncertainty. A parcellation scheme derived from a set of individuals registered to a common target (atlas) may be probabilistic and include overlap of parcels in regions that reflect individual variability or imperfections in alignment.
14 parcellation schemes currently represented in Neurolex
Documentation available
INCF task force on ontologies
23. Basic model: do not conflate conceptual structures with parcels
[Figure labels: Regional part of nervous system; Functional part of nervous system; Parcel (three instances); overlaps]
Neuroscientists use many different parcellation schemes because they classify brain structures in many different ways, and the techniques for matching one scheme to another are imperfect
24. Linking semantics to space: INCF Atlasing
www.neurolex.org
Link to spatial representation in Scalable Brain Atlas
Waxholm space
Seth Ruffins, Alan Ruttenberg, Rembrandt Bakker
25. Neurons in Neurolex
International Neuroinformatics Coordinating Facility (INCF) building a knowledge base of neurons and their properties via the NeuroLex Wiki
Led by Dr. Gordon Shepherd
Consistent and parseable naming scheme
Knowledge is readily accessible, editable and computable
While structure is imposed, don’t worry too much about the upper level classes of the ontology
Stephen Larson
26. A KNOWLEDGE BASE OF NEURONAL PROPERTIES
Additional semantics added in NIFSTD by ontology engineer
27. Concept-based search: search by meaning
Search Google: GABAergic neuron
Search NIF: GABAergic neuron
NIF automatically searches for types of GABAergic neurons
[Figure: Types of GABAergic neurons]
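Concept-based search of this kind amounts to expanding a class into itself plus all of its subclasses before searching. A minimal sketch, assuming a small hypothetical is_a edge list (`IS_A`) standing in for the ontology; the function names are illustrative and not part of any NIF interface.

```python
from collections import defaultdict, deque

# Hypothetical is_a edges (child, parent); a real system would read them from the ontology.
IS_A = [
    ("Purkinje cell", "GABAergic neuron"),
    ("medium spiny neuron", "GABAergic neuron"),
    ("basket cell", "GABAergic neuron"),
    ("cerebellar basket cell", "basket cell"),
    ("pyramidal cell", "glutamatergic neuron"),
]


def build_children(edges):
    """Invert (child, parent) edges into a parent -> children index."""
    children = defaultdict(list)
    for child, parent in edges:
        children[parent].append(child)
    return children


def descendants(concept, children):
    """Breadth-first walk collecting every direct and indirect subtype."""
    seen, queue = set(), deque([concept])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def concept_search_terms(concept, edges):
    """Terms to search for: the concept itself plus all of its subtypes."""
    return {concept} | descendants(concept, build_children(edges))


print(sorted(concept_search_terms("GABAergic neuron", IS_A)))
# ['GABAergic neuron', 'Purkinje cell', 'basket cell', 'cerebellar basket cell', 'medium spiny neuron']
```

A keyword search for "GABAergic neuron" would miss records that only mention, say, a cerebellar basket cell; walking the hierarchy is what recovers them.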
28. Challenges of multiscale neurodegenerative disease phenotypes
• Neurodegenerative diseases target very specific cell populations
• Model systems only replicate a subset of features of the disease
• Related phenotypes occur across anatomical scales
• Different vocabularies are used by different communities
[Figure: Midbrain degenerated; Substantia nigra decreased in volume; Substantia nigra pars compacta atrophied; Loss of SNpc dopaminergic neurons; Degeneration of nigrostriatal terminals; Tyrosine-hydroxylase containing neurons degenerate]
29. Approach: Use ontologies to provide necessary knowledge for matching related phenotypes
Sarah Maynard, Chris Mungall, Suzie Lewis, Fahim Imam
[Figure: Entities: Midbrain; Substantia nigra; Substantia nigra pars compacta; Substantia nigra pars compacta dopamine cell; Dopamine; Neuron cell soma; Neuron (CL); Part of neuron (GO); Small molecule (ChEBI). Qualities: Atrophied; Decreased volume; Fewer in number; Degenerate; Decreased in magnitude relative to some normal. Relations: has part, is part of, is a. Sources: NIFSTD/PKB; OBO ontology]
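The entity-quality idea behind this approach can be sketched in Python: each phenotype is an (entity, quality) pair, entities are compared through part_of and qualities through is_a. A minimal sketch, assuming hypothetical, simplified ontology fragments (`PART_OF`, `QUALITY_IS_A`) rather than the real NIFSTD and PATO content; it only handles the case where one entity contains the other, not siblings such as two parts of the same structure.

```python
# Hypothetical, simplified fragments of an anatomy ontology (part_of) and a
# quality ontology (is_a); the real NIFSTD and PATO links are richer.
PART_OF = {
    "SNpc dopamine cell": "substantia nigra pars compacta",
    "substantia nigra pars compacta": "substantia nigra",
    "substantia nigra": "midbrain",
}
QUALITY_IS_A = {
    "atrophied": "decreased volume",
    "decreased volume": "decreased in magnitude",
    "fewer in number": "decreased in magnitude",
    "degenerate": "decreased in magnitude",
}


def ancestors(term: str, parent_of: dict[str, str]) -> list[str]:
    """Return term followed by every ancestor in a single-parent hierarchy."""
    chain = [term]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def match_phenotypes(p1, p2):
    """Match two (entity, quality) phenotypes.

    Entities match when one is (transitively) part of the other or they are
    equal; qualities match when they share an is_a ancestor. Returns the
    enclosing entity and the most specific shared quality, or None.
    """
    (e1, q1), (e2, q2) = p1, p2
    if e2 in ancestors(e1, PART_OF):
        entity = e2
    elif e1 in ancestors(e2, PART_OF):
        entity = e1
    else:
        return None
    q2_lineage = set(ancestors(q2, QUALITY_IS_A))
    shared = [q for q in ancestors(q1, QUALITY_IS_A) if q in q2_lineage]
    return (entity, shared[0]) if shared else None


# "Loss of SNpc dopaminergic neurons" vs "Midbrain degenerated"
print(match_phenotypes(("SNpc dopamine cell", "fewer in number"),
                       ("midbrain", "degenerate")))
# ('midbrain', 'decreased in magnitude')
```

The two descriptions share no words, yet the ontology links them at the level of "midbrain, decreased in magnitude".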
31. OBD: Ontology based database
Provides a user interface for matching organisms based on similarity of phenotypes
Based on the Entity-Quality (EQ) model
Uses knowledge in the ontology to compute similarity scores and other statistical measures like information content
http://www.berkeleybop.org/pkb/
Chris Mungall, Suzanna Lewis, Lawrence Berkeley Labs
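Information content is commonly defined as IC(c) = -log p(c), where p(c) is the fraction of annotations made to c or any of its subclasses; Resnik similarity then scores two classes by the IC of their most informative common ancestor. The sketch below implements that Resnik-style measure on hypothetical counts. It is not necessarily the scoring OBD uses; two labels come from the PhenoSim slide, while "fewer cortical neurons", the root class and all counts are invented for illustration.

```python
import math

# Hypothetical single-parent is_a hierarchy of phenotype classes.
PARENT = {
    "putamen atrophied": "part of basal ganglia decreased in magnitude",
    "globus pallidus neuropil degenerate": "part of basal ganglia decreased in magnitude",
    "part of basal ganglia decreased in magnitude": "nervous system phenotype",
    "fewer cortical neurons": "nervous system phenotype",
}
# Hypothetical number of annotations made directly to each class.
DIRECT_COUNTS = {
    "putamen atrophied": 2,
    "globus pallidus neuropil degenerate": 1,
    "part of basal ganglia decreased in magnitude": 1,
    "fewer cortical neurons": 4,
    "nervous system phenotype": 0,
}


def ancestors(term):
    """Return term followed by all of its is_a ancestors."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def information_content(counts):
    """IC(c) = log(total / n(c)), where n(c) counts annotations to c or any subclass."""
    propagated = {c: 0 for c in counts}
    for term, n in counts.items():
        for anc in ancestors(term):
            propagated[anc] += n
    total = sum(counts.values())
    return {c: math.log(total / n) for c, n in propagated.items() if n > 0}


def resnik(a, b, ic):
    """Similarity = IC of the most informative common ancestor of a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic[c] for c in common)


ic = information_content(DIRECT_COUNTS)
print(round(resnik("putamen atrophied", "globus pallidus neuropil degenerate", ic), 3))  # 0.693
print(resnik("putamen atrophied", "fewer cortical neurons", ic))  # 0.0
```

Phenotypes that only meet at the root score zero, because a class that every annotation falls under carries no information.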
33. PhenoSim: What organism is most similar to a human with Huntington’s disease?
[Figure labels: Putamen atrophied; Globus pallidus neuropil degenerate; Part of basal ganglia decreased in magnitude; Fewer neostriatum medium spiny neurons in putamen; Neurons in striatum degenerate; Neuron in striatum decreased in magnitude; Increased number of astrocytes in caudate nucleus; Neurons in striatum degenerate; Nervous system cell change in number in striatum]
*B6CBA-TgN(HDexon1)62 mice, which express exon 1 of the human mutant HD gene (Li et al., J Neurosci, 21(21):8473-8481)
34. Progressive enrichment
Understanding and comparing phenotypes will be enriched through community knowledge bases like Neurolex
Looking forward to continuing this as part of the Monarch project with Melissa Haendel, Chris Mungall and Suzie Lewis
35. Top Down vs Bottom up
Top-down ontology construction
• A select few authors have write privileges
• Maximizes consistency of terms with each other (automated consistency checking)
• Making changes requires approval and re-publishing
• Works best when the domain to be organized has: small corpus, formal categories, stable entities, restricted entities, clear edges
• Works best with participants who are: expert catalogers, coordinated users, expert users, people with authoritative source of judgment
Bottom-up ontology construction
• Multiple participants can edit the ontology instantly (many eyes to correct errors)
• Semantics are limited to what is convenient for the domain
• Not a replacement for top-down construction; sometimes necessary to increase flexibility
• Necessary when the domain has: large corpus, no formal categories, no clear edges
• Necessary when participants are: uncoordinated users, amateur users, naïve catalogers
• Neuroscience is a domain that is less formal, and neuroscientists are more uncoordinated
NIFSTD (top-down); NEUROLEX (bottom-up)
It is important for ontologists to define a community contribution model
36. It’s a messy ecosystem (and that’s OK)
NIF favors a hybrid, tiered, federated system
[Figure labels: Domain knowledge; Ontologies; Claims about results; Virtuoso RDF triples; Data; Data federation; Workflows; Narrative; Full text access]
Example concepts: Neuron; Brain part; Disease; Organism; Gene
Example claims: Caudate projects to SNpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS
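Claims like the examples on this slide map naturally onto subject-predicate-object triples, the shape of the “Claims about results” tier that the slide associates with Virtuoso RDF triples. The plain-Python stand-in below only illustrates pattern matching over triples; it is not how a Virtuoso store is queried, and the predicate names are hypothetical.

```python
# Example claims from the slide as (subject, predicate, object) triples;
# the predicate names are hypothetical.
TRIPLES = [
    ("Caudate", "projects_to", "SNpc"),
    ("Grm1", "is_upregulated_in", "chronic cocaine"),
    ("Betz cell", "degenerates_in", "ALS"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# "What degenerates in ALS?"
print(match(TRIPLES, p="degenerates_in", o="ALS"))  # [('Betz cell', 'degenerates_in', 'ALS')]
```

Because every claim has the same three-part shape, claims harvested from very different sources can be queried together once their subjects and objects use shared concept identifiers.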
37. Musings from the NIF
No one can be stopped from doing what they need to do
Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
If the market can support 11 MRI databases, fine
Some consolidation and coordination is warranted, though
Big, broad and messy beats small, narrow and neat
Without trying to integrate a lot of data, we will not know what needs to be done
A lot can be done with messy data; neatness helps, though
Progressive refinement; addition of complexity through layers
Be flexible and opportunistic
A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
Think globally; act locally:
No source, not even NIF, is THE source; we are all a source
38. Grabbing the long tail of small data
Analysis of NIF shows multiple databases with similar scope and content
Many contain partially overlapping data
Data “flows” from one resource to the next
Data is reinterpreted, reanalyzed or added to
Is duplication good or bad?
39. Same data: different analysis
Chronic vs acute morphine in striatum
Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article
Gemma: Reanalyzed microarray results from GEO using different algorithms
Both provide results of increased or decreased expression as a function of experimental paradigm
4 strains of mice
3 conditions: chronic morphine, acute morphine, saline
Mined NIF for all references to GEO IDs: found a small number where the same dataset was represented in two or more databases
http://www.chibi.ubc.ca/Gemma/home.html
40. How easy was it to compare?
Gemma: Gene ID + Gene Symbol
DRG: Gene name + Probe ID
Gemma: Increased expression/decreased expression
DRG: Increased expression/decreased expression
But Gemma presented results relative to a chronic morphine baseline and DRG relative to saline, so the direction of change is opposite in the two databases
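Because the two databases use opposite baselines, each Gemma call has to be flipped before it can be compared with the DRG call for the same gene. The records also have to be joined on a shared key first, since Gemma reports gene IDs and symbols while DRG reports gene names and probe IDs. Below is a minimal sketch that assumes a hypothetical name-to-symbol lookup already exists and that both sources reduce to "up"/"down" calls. The gene names and calls are placeholders, not data from either database.

```python
FLIP = {"up": "down", "down": "up"}


def compare(gemma_calls, drg_calls, name_to_symbol):
    """Classify each gene present in both sources as 'consistent' or 'opposite'.

    gemma_calls: {gene_symbol: 'up' | 'down'}, relative to chronic morphine
    drg_calls: {gene_name: 'up' | 'down'}, relative to saline
    name_to_symbol: hypothetical lookup from DRG gene names to gene symbols
    """
    results = {}
    for name, drg_call in drg_calls.items():
        symbol = name_to_symbol.get(name)
        if symbol is None or symbol not in gemma_calls:
            continue  # no shared key, so no comparison is possible
        # Flip Gemma's call so that both calls are expressed relative to saline.
        gemma_vs_saline = FLIP[gemma_calls[symbol]]
        results[symbol] = "consistent" if gemma_vs_saline == drg_call else "opposite"
    return results


gemma = {"GENE1": "down", "GENE2": "up"}
drg = {"gene one": "up", "gene two": "up", "gene three": "down"}
lookup = {"gene one": "GENE1", "gene two": "GENE2"}
print(compare(gemma, drg, lookup))  # {'GENE1': 'consistent', 'GENE2': 'opposite'}
```

If the baseline difference were missed, every "consistent" result in this sketch would be reported as "opposite", and every "opposite" result as "consistent".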
Analysis:
1370 statements from Gemma regarding gene expression as a function of chronic morphine
617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
Results for 1 gene were opposite in DRG and Gemma
45 did not have enough information provided in the paper to make a judgment
NIF annotation standard
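The "over half" figure follows directly from the counts on this slide:

```latex
\[
1370 - 617 = 753, \qquad \frac{753}{1370} \approx 0.550 > \tfrac{1}{2}
\]
```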
41. Beware of False Dichotomies
Top-down vs bottom-up
Lightweight vs heavyweight
“Chaotic Nihilists and Semantic Idealists”
Text mining vs annotation
Curators vs scientists
Human vs machine
DOIs vs URIs
http://www.datanami.com/datanami/2013-02-05/chaotic_nihilists_and_semantic_idealists.html
42. NIF team (past and present)
Jeff Grethe, UCSD, Co-Investigator, Interim PI
Amarnath Gupta, UCSD, Co-Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Caltech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer
Editor's Notes
Doesn’t do it well; doesn’t organize the results in a domain-specific way; doesn’t search across it. For use as content goal. Dynamic inventory for deep coverage of neuroscience data: Genes -> Systems
What animal models show
The NIFSTD and PATO ontologies served as building blocks for a phenotype model. The ontologies provide relationships between neuroscience-related terms, provide a structure for qualities, and allow related qualities to show their relationships.
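One common way to combine an anatomy or cell ontology with PATO is the entity-quality (EQ) pattern. In this pattern, a phenotype pairs an entity term with a quality term, and the quality hierarchy makes it possible to group related qualities. Below is a minimal sketch that assumes EQ-style modeling (these notes do not confirm that this is NIF's exact model). It uses illustrative labels in place of real NIFSTD and PATO identifiers, and the parent links are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fragment of a quality hierarchy: child -> parent.
# Labels are illustrative; a real model would use ontology IDs.
QUALITY_PARENT = {
    "decreased size": "size",
    "increased size": "size",
    "degenerate": "structure",
}


@dataclass(frozen=True)
class Phenotype:
    entity: str   # e.g. a NIFSTD anatomy or cell term
    quality: str  # e.g. a PATO quality term


def quality_ancestors(quality):
    """Return the quality followed by its ancestors in the hierarchy."""
    chain = [quality]
    while chain[-1] in QUALITY_PARENT:
        chain.append(QUALITY_PARENT[chain[-1]])
    return chain


def related(p1, p2):
    """Two phenotypes are related if they share an entity and a quality ancestor."""
    return p1.entity == p2.entity and bool(
        set(quality_ancestors(p1.quality)) & set(quality_ancestors(p2.quality))
    )


a = Phenotype("hippocampus", "decreased size")
b = Phenotype("hippocampus", "increased size")
c = Phenotype("Betz cell", "degenerate")
print(related(a, b), related(a, c))  # True False
```

Here "decreased size" and "increased size" are related only through their shared parent "size", which illustrates how a quality hierarchy lets related qualities show their relationships.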
Need an interface to explore and ask questions. Cannot view it as a graph. Need to be able to ask a question not in SPARQL and get an answer. Need a better interface to put things in. Discuss Neurolex and PKB. Doesn’t have to be a perfect interface, but has to allow a domain expert to ask and answer questions.
Indirect matches: terms that match due to hierarchies. NOTE: should make diagram in the style of previous slides (not screenshot)
In validating our results, we see three types of matches. The first are direct matches. NOTE: should make diagram in the style of previous slides (not screenshot)
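The notes distinguish direct matches, where the annotated term equals the query term, from indirect matches that arise through the hierarchy, where the annotated term is a descendant of the query term. Below is a minimal sketch that assumes a simple is-a hierarchy stored as child-to-parent links, with illustrative terms. The notes do not specify the third match type, so this sketch does not model it.

```python
# Hypothetical is-a hierarchy: child -> parent
IS_A = {
    "Betz cell": "pyramidal neuron",
    "pyramidal neuron": "neuron",
    "Purkinje cell": "neuron",
}


def ancestors(term):
    """All terms above `term` in the is-a hierarchy, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def match_type(query, annotation):
    """Classify how an annotated term answers a query term."""
    if annotation == query:
        return "direct"
    if query in ancestors(annotation):
        return "indirect"  # matched only through the hierarchy
    return None  # no match


for ann in ["neuron", "Betz cell", "Purkinje cell", "astrocyte"]:
    print(ann, match_type("neuron", ann))
```

Matches only run downward. A query for "Betz cell" does not match data annotated with the broader term "neuron".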