The document discusses the challenges of managing and analyzing the large amounts of neuroscience data being generated. It notes that about half of the researchers polled by Science store their data only locally in their labs rather than in shared databases or archives, which prevents other researchers from accessing and using the data. The Neuroscience Information Framework (NIF) is working to address these issues by creating a registry of neuroscience resources and developing technologies that allow researchers to discover, share, analyze, and integrate data from various sources. NIF's registry currently catalogs over 6000 resources, including over 2200 databases. The goal is for NIF to help the neuroscience community better exploit existing data and prepare for future increases in data.
Neurosciences Information Framework (NIF): An example of community Cyberinfrastructure for the Neurosciences
1. Maryann E. Martone, Ph.D., University of California, San Diego
2. “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. • In that same issue of Science:
  – Asked peer reviewers from last year about the availability and use of data
• About half of those polled store their data only in their laboratories, not an ideal long-term solution.
• Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving
• And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used.
“...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.”
Lead Science editorial, 2011
4. Neuroscience is unlikely to be served by a few large databases, as the genomics and proteomics communities are.
[Figure: data types across scales: whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures]
No single technology serves these all equally well.
Multiple data types; multiple scales; multiple databases
6. • Current web is designed to share documents
  – Documents are unstructured data
• Much of the content of digital resources is part of the “hidden web”
• Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
7. • NIF has developed a production technology platform for researchers to:
  – Discover
  – Share
  – Analyze
  – Integrate
  neuroscience-relevant information
• Since 2008, NIF has assembled the largest searchable catalog of neuroscience data and resources on the web
• Cost-effective and innovative strategy for managing data assets
“This unique data depository serves as a model for other Web sites to provide research data.” - Choice Reviews Online
NIF is poised to capitalize on the new tools and emphasis on big data and open science
8. http://neuinfo.org
June 10, 2013, dkCOIN Investigator's Retreat
• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”
UCSD, Yale, Caltech, George Mason, Washington Univ
[Figure: NIF components: Literature, Database Federation, Registry]
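The simultaneous, category-organized search described on this slide can be pictured as a fan-out query. The sketch below is only an illustration under stated assumptions, not NIF's implementation: the `Source` type, its `search` callables, and the category names are hypothetical stand-ins for real database adapters.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """A hypothetical searchable resource, e.g. one federated database."""
    name: str
    category: str                       # e.g. "Literature", "Data", "Registry"
    search: Callable[[str], List[str]]  # returns IDs of matching records


def _safe_search(source: Source, query: str) -> List[str]:
    try:
        return source.search(query)
    except Exception:
        return []  # one failing source should not break the whole query


def federated_search(query: str, sources: List[Source]) -> Dict[str, List[str]]:
    """Query all sources concurrently and group hits by category."""
    grouped: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # map() yields results in the same order as `sources`
        results = pool.map(lambda s: _safe_search(s, query), sources)
        for source, hits in zip(sources, results):
            grouped.setdefault(source.category, []).extend(
                f"{source.name}:{hit}" for hit in hits)
    return grouped
```

Running the sources concurrently means total latency is bounded by the slowest source rather than the sum of all of them; a production federation would also need per-source timeouts, which this sketch omits.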
9. • NIF Registry: A catalog of neuroscience-relevant resources
• > 6000 currently listed
• > 2200 databases
• And we are finding more every day
“Of relevance to neuroscience” is very broad
10. • NIF Registry
  – NIF curators
  – Nomination by the community
  – Semi-automated text mining pipelines
  – Requires no special skills
  – Site map available for local hosting
• NIF Data Federation
  – DISCO interop
  – Requires some programming skill
Low barrier to entry
11. Simple metadata model: name, description, type, URL, other names, keywords, unique identifier
• Extended over time
– Parent resource
– Supporting agency
– Grant numbers
– Accessibility
– Related to
– Organism
– Disease or condition
– Last updated
Timeline: first catalog, the SFN Neuroscience Database Gateway (~2003); NIF 0.5 (2006); NIF 1.0+ (2008)
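As a minimal sketch, assuming a plain Python representation rather than NIF's actual storage, a registry record following this model could look like the class below. The field names mirror the slide; the class name `RegistryRecord`, the `missing_extended_fields` helper and all example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryRecord:
    """Hypothetical registry entry: core fields plus the later extensions."""
    # Core fields of the simple metadata model.
    identifier: str                  # unique identifier
    name: str
    description: str
    resource_type: str               # e.g. "database", "software tool"
    url: str
    other_names: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Fields added as the registry was extended over time.
    parent_resource: Optional[str] = None
    supporting_agency: Optional[str] = None
    grant_numbers: List[str] = field(default_factory=list)
    accessibility: Optional[str] = None
    related_to: List[str] = field(default_factory=list)
    organism: List[str] = field(default_factory=list)
    disease_or_condition: List[str] = field(default_factory=list)
    last_updated: Optional[str] = None   # assumed ISO date string

    def missing_extended_fields(self) -> List[str]:
        """Names of extended fields that are still empty (candidates for curation)."""
        extended = ["parent_resource", "supporting_agency", "grant_numbers",
                    "accessibility", "related_to", "organism",
                    "disease_or_condition", "last_updated"]
        return [name for name in extended if not getattr(self, name)]
```

Keeping the extended fields optional lets a record enter the registry with only the core metadata and be refined later, which matches the progressive approach described elsewhere in the deck.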
12. • NIF Registry is hosted on the Semantic MediaWiki platform (Neurolex)
– Community can add, review and edit without special privileges
– Searchable by Google
– Integrated with NIF ontologies
– Graph structure
Semantic wiki: a wiki with semantics; pages are linked through relationships
14. • NIF employs an automated link checker
– Last analysis: 478/6100 invalid URLs (~8%)
– 199 could not be located at another university or location, i.e., out of service (~3%)
• Bigger issue: many resources are no longer updated or maintained
[Chart: resources added and resources last updated, by year, 1996 to 2014]
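The link check can be sketched as follows. This is a hypothetical illustration, not NIF's link checker: `http_status` and `check_links` are invented names, a URL counts as invalid if the host is unreachable or answers with a status of 400 or above, and the status lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with a 4xx/5xx status
    except OSError:
        return None       # DNS failure, refused connection, timeout, etc.


def check_links(urls: Iterable[str],
                status_of: Callable[[str], Optional[int]] = http_status) -> Dict[str, object]:
    """Flag URLs that are unreachable or return an error status (>= 400)."""
    checked: List[str] = list(urls)
    invalid: List[str] = []
    for url in checked:
        status = status_of(url)
        if status is None or status >= 400:
            invalid.append(url)
    fraction = len(invalid) / len(checked) if checked else 0.0
    return {"checked": len(checked), "invalid": invalid, "invalid_fraction": fraction}
```

With the slide's figures, 478 invalid URLs out of 6100 checked gives an `invalid_fraction` of about 0.078, i.e. the ~8% reported. Some servers reject `HEAD` requests, so a production checker would probably retry with `GET` before flagging a URL.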
15. Keeping content up to date
• New tags come into existence, e.g., Connectome, Tractography, Epigenetics
• New resource types come into existence, e.g., mobile apps
• Resources add new types of content
• Resources change their name
• Resources change their scope
• More than 7000 updates to the registry last year
It's a challenge to keep the registry up to date; tools: sitemaps, curation, ontologies, community review
16. • The NIF Registry has created a linked data graph of web-accessible resources
• Maintained on a community wiki platform
• Provides data on the fluidity of the resource landscape
– New resources continue to be created and found
– Relatively few disappear altogether
– Many more grow stale, although their value may still be significant
– Maintaining up-to-date curation requires frequent updating
NIF Registry provides insight into the state of digital resources on the web
17. • The NIF data federation performs deep search over the content of more than 200 databases
• New databases are added at a rate of 25-40 per year
• Latest update: Open Source Brain; ingest completed in 2 hours
• Databases chosen on a variety of criteria:
– Early: testing different types of resources
– Thematic areas
– Volunteers
NIF provides access to the largest aggregation of neuroscience-relevant information on the web
18. • NIF was one of the first projects to attempt large-scale data integration in the neurosciences
• NIF is supported by a contract that specified the number of resources to be added per year
– Designed to be populated rapidly; set up a process for progressive refinement
– No budget was allocated to retrofit existing resources; we had to work with them in their current state
– We designed a system that required little to no cooperation or work from providers
– Supports many formats: relational, XML, RDF
19. DISCO Dashboard Functions (current and planned)
• Ingest Script Manager
• Public Script Repository
• Data & Event Tracker
• Versioning System
• Curator Tool
• Data Transformer Manager
Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University
20. [Chart: number of federated databases and number of federated records (millions, log scale) over time]
NIF searches the largest collation of neuroscience-relevant data on the web
DISCO
22. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
• Query expansion: synonyms and related concepts
• Boolean queries
• Data sources categorized by “data type” and level of nervous system
• Common views across multiple sources
• Tutorials for using the full resource when getting there from NIF
• Link back to record in original source
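Query expansion of this kind can be sketched in a few lines. The synonym table below is a hypothetical stand-in for the NIFSTD ontologies, and `expand_term`/`build_query` are illustrative names, not NIF functions; the sketch only reproduces the Boolean form shown on the slide.

```python
from typing import Dict, Iterable, List

# Hypothetical synonym table; in NIF, synonyms come from the ontologies.
SYNONYMS: Dict[str, List[str]] = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Quote multi-word terms so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_term(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """OR together a term and its synonyms, dropping case-insensitive duplicates."""
    seen, parts = set(), []
    for variant in [term, *synonyms.get(term.lower(), [])]:
        if variant.lower() not in seen:
            seen.add(variant.lower())
            parts.append(quote(variant))
    return " OR ".join(parts)


def build_query(terms: Iterable[str],
                synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """AND together the expanded form of each concept."""
    return " AND ".join(f"({expand_term(t, synonyms)})" for t in terms)
```

For example, `expand_term("Hippocampus")` yields `Hippocampus OR "Cornu Ammonis" OR "Ammon's horn"`, and a term with no synonyms passes through unchanged apart from phrase quoting.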
23. Connects to, Synapsed with, Synapsed by, Input region, innervates, Axon innervates, Projects to, Cellular contact, Subcellular contact, Source site, Target site
Each resource implements a different, though related, model; in many cases, systems are complex and difficult to learn
24. • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
– Brain Architecture Management System (rodent)
– Temporal-lobe.com (rodent)
– Connectome Wiki (human)
– Brain Maps (various)
– CoCoMac (primate cortex)
– UCLA Multimodal database (human fMRI)
– Avian Brain Connectivity Database (bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in more than 1 database: 42
• Number of synonym matches: 99
• Number of 1st-order partonomy matches: 385
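One way to classify correspondences between region terms from different databases is sketched below, assuming toy synonym and part-of tables (`SYNONYM_GROUPS`, `PART_OF`) in place of the NIF ontologies; the helper names are hypothetical. "Partonomy" here means first-order only: one term is the direct parent of the other.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy vocabularies standing in for ontology synonym and part-of data.
SYNONYM_GROUPS: List[Set[str]] = [{"hippocampus", "cornu ammonis", "ammon's horn"}]
PART_OF: Dict[str, str] = {
    "ca1": "hippocampus",
    "hippocampus": "hippocampal formation",
    "dentate gyrus": "hippocampal formation",
}


def norm(term: str) -> str:
    return term.strip().lower()


def match_type(a: str, b: str) -> Optional[str]:
    """Classify how two brain-region terms from different databases correspond."""
    a, b = norm(a), norm(b)
    if a == b:
        return "exact"
    if any(a in group and b in group for group in SYNONYM_GROUPS):
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PART_OF.get(a) == b or PART_OF.get(b) == a:
        return "partonomy"
    return None


def shared_exact_terms(db_terms: Dict[str, Iterable[str]]) -> Set[str]:
    """Terms used verbatim (after normalization) by more than one database."""
    counts = Counter(t for terms in db_terms.values() for t in {norm(x) for x in terms})
    return {t for t, n in counts.items() if n > 1}
```

Under this scheme, "CA1" and "hippocampal formation" do not match, because the link is second-order; extending the check to ancestors would find more matches at the cost of precision.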
25. • You (and the machine) have to be able to find it
– Accessible through the web
– Annotations
• You have to be able to access and use it
– Data type specified and in a usable form
• You have to know what the data mean
– Some semantics: what does a value of “1” mean?
– Context: experimental metadata
– Provenance: where did the data come from?
Reporting neuroscience data within a consistent framework helps enormously
26. Knowledge in space and spatial relationships (the “where”)
Knowledge in words, terminologies and logical relationships (the “what”)
27. • NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
NIFSTD: Organism, NS Function, Molecule, Investigation, Subcellular structure, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure
NIF capitalizes on the growing set of community ontologies available in biomedical science
28. Purkinje Cell, Axon Terminal, Axon, Dendritic Tree, Dendritic Spine, Dendrite, Cell body, Cerebellar cortex
There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
29. Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron
• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
– Branch of philosophy: a theory of what is
– e.g., Gene ontologies
• Provide universals for navigating across different data sources
– Semantic “index”
• Provide the basis for concept-based queries to probe and mine data
– Perform reasoning
– Link data through relationships, not just one-to-one mappings
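To make "perform reasoning" and "link data through relationships" concrete, here is a minimal sketch over the figure's has a / is a chain. The data set names and their annotations are hypothetical; the point is that a query for a region also finds data annotated to its parts, which a one-to-one term mapping would miss.

```python
from typing import Dict, List

# Toy fragment of the figure's ontology: has_a (part) and is_a (subclass) links.
HAS_PART: Dict[str, List[str]] = {
    "Brain": ["Cerebellum"],
    "Cerebellum": ["Purkinje cell layer"],
    "Purkinje cell layer": ["Purkinje cell"],
}
IS_A: Dict[str, str] = {"Purkinje cell": "neuron"}

# Hypothetical data sets, each annotated with one ontology term.
DATASETS: Dict[str, str] = {
    "ds-dendrite-images": "Purkinje cell",
    "ds-ca1-recordings": "Hippocampus",
}


def all_parts(whole: str) -> List[str]:
    """Transitive closure of has_a starting from `whole` (depth-first)."""
    found: List[str] = []
    stack = list(HAS_PART.get(whole, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            stack.extend(HAS_PART.get(part, []))
    return found


def superclasses(term: str) -> List[str]:
    """Follow is_a links upward (assumes the toy hierarchy has no cycles)."""
    chain: List[str] = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def datasets_in(region: str) -> List[str]:
    """Data sets annotated to the region itself or to any of its parts."""
    scope = {region, *all_parts(region)}
    return sorted(ds for ds, term in DATASETS.items() if term in scope)


def neurons_in(region: str) -> List[str]:
    """Parts of the region that, by is_a reasoning, are kinds of neuron."""
    return [p for p in all_parts(region) if "neuron" in superclasses(p)]
```

Here `datasets_in("Cerebellum")` returns the Purkinje cell data set even though nothing is annotated to "Cerebellum" directly.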
30. “Search computing”
What genes are upregulated by drugs of abuse in the adult mouse?
Matching concepts: Morphine; Increased expression; Adult; Mouse
Some concepts, e.g., age category, are quantitative but still must be interpreted in a global query system
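A hedged sketch of how the example question might be answered once concepts are mapped. The subclass list for "drug of abuse", the synonym set for "upregulated", the `ExpressionRecord` type and, in particular, the 56-day cutoff for an adult mouse are assumptions made for illustration; the gene names are placeholders.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical concept mappings; a real system would draw these from ontologies.
DRUGS_OF_ABUSE = {"morphine", "cocaine", "ethanol"}    # assumed subclasses of "drug of abuse"
UPREGULATED = {"increased expression", "upregulated"}  # assumed synonyms for the relation
ADULT_MOUSE_MIN_AGE_DAYS = 56                          # assumption: >= 8 weeks counts as adult


@dataclass
class ExpressionRecord:
    gene: str
    compound: str
    change: str
    organism: str
    age_days: int


def genes_upregulated_by_drugs_of_abuse(records: List[ExpressionRecord]) -> List[str]:
    """Answer the slide's question over records pooled from different sources."""
    hits = {
        r.gene
        for r in records
        if r.compound.lower() in DRUGS_OF_ABUSE
        and r.change.lower() in UPREGULATED
        and r.organism.lower() == "mouse"
        # Quantitative age is interpreted as the "adult" category here.
        and r.age_days >= ADULT_MOUSE_MIN_AGE_DAYS
    }
    return sorted(hits)
```

The age threshold is exactly the kind of interpretation the slide warns about: different sources may define "adult" differently, so a shared, explicit definition is needed before results can be pooled.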
33. http://neurolex.org
Stephen Larson
• Provide a simple interface for defining the concepts required
• Lightweight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
• Community based:
– Anyone can contribute their terms, concepts, things
– Anyone can edit
– Anyone can link
• Accessible: searched by Google
• Growing into a significant knowledge base for neuroscience
Demo D03
200,000 edits; 150 contributors
34. • NIF can be used to survey the data landscape
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
– Data is reinterpreted, reanalyzed or added to
• Is duplication good or bad?
35. Databases come in many shapes and sizes
• Primary data:
– Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data:
– Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD); gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data:
– Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
• Registries:
– Metadata
– Pointers to data sets or materials stored elsewhere
• Data aggregators:
– Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source:
– Data acquired within a single context, e.g., Allen Brain Atlas
Researchers are producing a variety of information artifacts using a multitude of technologies
36. NIF Analytics: The Neuroscience Landscape
NIF is in a unique position to answer questions about the neuroscience landscape
Where are the data?
[Figure: data source vs. brain region, e.g., Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]
Vadim Astakhov, Kepler Workflow Engine
37. Adding more semantics
The combination of ontologies, diverse data and analytics lets us look at the current landscape in interesting ways
[Figure: Diseases of nervous system (Neurodegenerative; Seizure disorders; Neoplastic disease of nervous system) in NIH Reporter vs. NIF data federated sources]
38. • Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so the direction of change is opposite in the 2 databases
• Analysis:
– 1370 statements from Gemma regarding gene expression as a function of chronic morphine
– 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
– Results for 1 gene were opposite in DRG and Gemma
– 45 did not have enough information provided in the paper to make a judgment
Relatively simple standards would make life easier
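The reconciliation step can be sketched as below, assuming a simple two-group comparison in which swapping the reference group flips the direction of change. `probe_to_symbol` stands in for whatever mapping links DRG probe IDs to Gemma gene symbols; all names and data are hypothetical.

```python
from typing import Dict

FLIP = {"up": "down", "down": "up"}


def normalize(direction: str, reference: str, target_reference: str) -> str:
    """Re-express an up/down call relative to target_reference.

    Assumes a two-group comparison, so swapping the reference group
    swaps the direction of change.
    """
    return direction if reference == target_reference else FLIP[direction]


def compare_sources(gemma: Dict[str, str], drg: Dict[str, str],
                    probe_to_symbol: Dict[str, str]) -> Dict[str, int]:
    """Tally agreement between Gemma-style and DRG-style statements.

    gemma: gene symbol -> direction, reported against chronic morphine
    drg:   probe ID    -> direction, reported against saline
    """
    # Bridge the identifier mismatch before comparing anything.
    drg_by_symbol = {probe_to_symbol[p]: d for p, d in drg.items() if p in probe_to_symbol}
    tally = {"consistent": 0, "opposite": 0, "not_found": 0}
    for symbol, direction in gemma.items():
        ours = normalize(direction, "chronic morphine", "saline")
        theirs = drg_by_symbol.get(symbol)
        if theirs is None:
            tally["not_found"] += 1
        elif theirs == ours:
            tally["consistent"] += 1
        else:
            tally["opposite"] += 1
    return tally
```

A real comparison would also need a category for statements without enough information to judge, like the 45 noted on the slide.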
39. NIF favors a hybrid, tiered, federated system
• Domain knowledge
– Ontologies
• Claims, models and observations
– Virtuoso RDF triples
– Model repositories
• Data
– Data federation
– Spatial data
– Workflows
• Narrative
– Full text access
[Figure: Neuron, Brain part, Disease, Organism, Gene, Technique, People, linked by statements such as “Caudate projects to SNpc”, “Grm1 is upregulated in chronic cocaine”, “Betz cells degenerate in ALS”]
NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
40. • 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
– NIF Registry vs NIF data federation
– Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
– Effective search across semantically diverse sources (NIFSTD ontologies)
• 2009-2011: Strategy for data integration
– Unified views across common sources
– Mapping of content to NIF vocabularies
• 2011-present: Data analytics
– Uniform external data references
• 2012-present: SciCrunch: unified biomedical resource services
NIF provides a strategy and set of tools applicable to all domains grappling with multiple sources of diverse data (i.e., pretty much everything)
41. • Search semantics
• Ranking
• Resources supported by NIH Blueprint Institutes are more thoroughly covered
• Data types, e.g., brain activation foci
42. SciCrunch
NIF, MONARCH, Community Services, dkCOIN, Shared Resources, Undiagnosed Disease Program, Phenotype RCN, 3D Virtual Cell, National Institute on Aging, One Mind for Research, BIRN, International Neuroinformatics Coordinating Facility, Model Organism Databases, Community Outreach, DELSA
(not just a data catalog)
43. SciCrunch
• 3DVC: Focus on models and simulation
• Gene Ontology: Focus on bioinformatics tools
• National Institute on Aging: Aging-related data sets
• Monarch: Phenotype-Genotype; deep semantic data integration
• One Mind for Research: Biospecimen repositories
• NeuroGateway: Computational resources
• FORCE11: Tools for next-gen publishing and e-scholarship
SciCrunch is actively supporting multiple communities; multiple communities are enriching and improving SciCrunch
44. Community database: beginning
Community database: end
“How do I share my data/tool?”
“There is no database for my data”
[Figure, stages 1 to 4: Institutional repositories; Cloud; INCF: Global infrastructure; Government; Education; Industry; Tool repositories]
NIF is designed to leverage existing investments in resources and infrastructure
45. • No one can be stopped from doing what they need to do
• Every resource is resource-limited: few have enough time, money, staff or expertise to do everything they would like
– If the market can support 11 MRI databases, fine
– Some consolidation and coordination is warranted, though
• Big, broad and messy beats small, narrow and neat
– Without trying to integrate a lot of data, we will not know what needs to be done
– A lot can be done with messy data; neatness helps, though
– Progressive refinement; addition of complexity through layers
• Be flexible and opportunistic
– A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
• Think globally; act locally:
– No source, not even NIF, is THE source; we are all a source
46. • Several powerful trends should change the way we think about our data (One → Many):
– Many data (shared data)
• Generation of data is getting easier
• Data space is getting richer: more -omes every day
• But... compared to the biological space, still sparse
– Many eyes
• Wisdom of crowds
• More than one way to interpret data
– Many algorithms
• Not a single way to analyze data
– Many analytics
• “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting
Are you exposing or burying your work?
47. Jeff Grethe, UCSD, Co-Investigator, Interim PI
Amarnath Gupta, UCSD, Co-Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Svetlana Sulima
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer (retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11