Buried in data and starving for information
1. © 2017 Riffyn Inc hello@riffyn.com
Never Miss a Discovery
[Figure: experimental workflow (Prep media, Thaw cells, Inoculate, Add treatment, Incubate, Immunoassay, Measure cell count) shown with a table of example measurement values]
2.
The problem in science today: >$100B of lost R&D productivity each year
R&D spend: $420B
R&D losses: $100B
>25% error rate*
* Error rate = R&D results that fail to reproduce, leading to wasted resources, failed tech transfers and missed discoveries.
[Figure: bar chart of annual R&D spend ($B, axis 0 to 300) by sector: Life Sciences; Chemicals/Materials; Energy; Food/AG; Mining and Metals; Pulp, Paper, Textiles; Water/Wastewater; Other]
Sources:
2013 R&D Magazine Global Funding Forecast
NSF Science and Engineering Indicators 2012
CIA World Fact Book
Prinz, et al., Believe it or not: how much can we rely on published data on potential drug targets?, Nat. Rev. Drug Disc., 2011
Begley & Ellis, Raise Standards for Preclinical Cancer Research, Nature, 483, 2012
Ioannidis & Khoury, Improving Validation Practices in “Omics” Research, Science, 334, 2011
Halford, The Second Annual State of Translational Research, Sigma-Aldrich / AAAS, 2014
Freedman, et al., The Economics of Reproducibility in Preclinical Research, PLOS Biology, 2015
3.
Poor data utilization is accepted as a price for flexibility and innovation
A lot of today’s operations are achieved through trial and error and “tribal knowledge”
Scientist at a global ethanol producer
4.
“We can’t determine which variables matter.”
“Data context is only in people’s head.”
“Process transfer is unreliable.”
This is the consequence of data fragmentation in the daily life of a scientist
5.
Communication in science today is a bit like this
“The problem in science today isn’t recording data, it’s integrating methods and results so we can do science.”
6.
Today, 80% of time is wasted wrangling scientific data, not analyzing it
[Figure: pipeline from Experiment to Machine Learning. Data wrangling (Data, Structure, Quality Control) takes 80% of a data scientist’s effort; Analyze takes 20%.]
8.
For one paper, securing the necessary data took a year. And the authors of four other
papers have stopped communicating with the project altogether. Most authors were
happy to collaborate, but it has taken longer than expected to locate the relevant data.
R. Van Noorden, Sluggish data sharing hampers reproducibility effort, Nature News, 3 June 2015
http://www.nature.com/news/sluggish-data-sharing-hampers-reproducibility-effort-1.17694
This is a data wrangler
9.
This is where scientific data analytics needs to be
80% of your time analyzing data for scientific breakthroughs, not wrangling data
[Figure: target pipeline. Labels: Scientific Design; Structure and Quality by Design; Data; Analyze; Machine-aided Design & Learning. Effort split: 20% of a data scientist’s effort on one part of the pipeline and 80% on the other.]
10.
Why are we buried in data and starving for information?
11.
This is how R&D is performed and communicated today
Artisanal know-how is stuck in 400-year-old concepts for scientific documentation and data
EARLY R&D: Un-computable & ambiguous protocols
TECH TRANSFER / SCALE-UP: Ad hoc knowledge capture/transfer between R&D and Manufacturing
12.
We are starving for information because …
we cannot access computable data sets, and they’re not annotated with methodological context
From Riffyn 2014 survey of life science and chemical companies
R&D issue                                                          % with issue
Data cannot be integrated, is hard to find, or is not annotated    66%
Cannot assess the impact of process changes                        52%
Data is unstructured or incomplete                                 48%
13.
The poor access to data and context means that …
errors accumulate because we can’t identify their causes, & progress is slowed
Reducing error 6X in screening process doubles the rate of cell line yield improvement
[Figure: product yield of a strain by year, 2007 to 2010. Before variance reduction: 30% relative error, 1X R&D pace. After variance reduction (6X error reduction): 5% relative error, 2X R&D pace.]
Source: Gardner, TS (2013) Trends in Biotech. 31:3, 123-125.
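The 30% and 5% relative errors quoted above correspond to the 6X error reduction. One way to see why that matters for screening pace is the textbook relationship that averaging n independent replicates shrinks the standard error by a factor of √n. The sketch below only illustrates that relationship under the independence assumption; it is not the analysis from the cited source, and the function name `replicates_needed` is hypothetical.

```python
import math


def replicates_needed(relative_error_pct: float, target_pct: float) -> int:
    """Smallest n such that relative_error_pct / sqrt(n) <= target_pct.

    Assumes independent, identically distributed replicate measurements.
    """
    return math.ceil((relative_error_pct / target_pct) ** 2)


before_pct = 30  # relative error before variance reduction (from the slide)
after_pct = 5    # relative error after variance reduction (from the slide)

error_reduction = before_pct / after_pct
n_before = replicates_needed(before_pct, target_pct=after_pct)
n_after = replicates_needed(after_pct, target_pct=after_pct)

print(f"Error reduction: {error_reduction:.0f}X")
print(f"Replicates needed to reach a {after_pct}% standard error: "
      f"{n_before} before vs {n_after} after")
```

Under these assumptions, an assay with 30% relative error needs 36 replicates per candidate to match a single measurement from an assay with 5% relative error, so variance reduction frees screening capacity for testing more candidates.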
14.
Fragmented data and accumulation of errors means that …
discoveries get lost and buried in a sea of disjointed data
[Figure: two scatter plots of Parameter Y versus Parameter X, one with data from a single experiment and one with data from 100s of experiments]
Linking data across experiments identifies critical process parameters and unexpected correlations
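To illustrate the point about linking data across experiments, the sketch below computes a Pearson correlation within each experiment and again across the pooled data. The observations are made-up numbers chosen only for illustration (each experiment covers a narrow range of Parameter X), and the names `observations` and `pearson` are hypothetical; nothing here is taken from Riffyn's software.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations: (experiment id, parameter X, parameter Y).
observations = [
    ("exp_A", 1.0, 2.1), ("exp_A", 1.1, 1.9), ("exp_A", 1.2, 2.0),
    ("exp_B", 2.0, 4.0), ("exp_B", 2.1, 4.2), ("exp_B", 2.2, 3.9),
    ("exp_C", 3.0, 6.1), ("exp_C", 3.1, 5.9), ("exp_C", 3.2, 6.0),
]

# Group the (X, Y) pairs by experiment.
by_experiment = defaultdict(list)
for experiment, x, y in observations:
    by_experiment[experiment].append((x, y))

# Correlation inside each experiment, then across all experiments pooled.
within = {
    experiment: pearson(*zip(*points))
    for experiment, points in by_experiment.items()
}
pooled = pearson([x for _, x, _ in observations],
                 [y for _, _, y in observations])

for experiment, r in within.items():
    print(f"{experiment}: r = {r:+.2f}")
print(f"pooled: r = {pooled:+.2f}")
```

In this toy data set no single experiment shows a positive trend, because measurement noise dominates the narrow range of X it covers, while the pooled data show a strong positive correlation.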
15.
How do we find information in our data?
16.
It starts by recognizing today’s scientific paradigm is missing “blueprints”
Thus the most relevant methodological data is stuck in people’s heads
“We work on what we know, not what matters.”
17.
A metaphor: evolution of geographic information systems toward CAD
c. 1700s c. 1800s
18.
Geographic information systems
Digital, visual, living documents with integrated data -> a foundation for AI
c. 2000s (Maps) c. 2010s (AI)
19.
Never Miss a Discovery
The Riffyn Scientific Development Environment (SDE) is a cloud-based application for reproducible science and faster discovery
Design, execute, and share scientific experiments as visual, computable data sets for interactive analysis and machine learning
20.
Riffyn is transforming scientific “spreadsheet hell” …
21.
…to digital scientific process maps & linked data for statistical learning
22.
Riffyn Scientific Development Environment
Flexible process design, analysis and improvement across the scientific development lifecycle
Riffyn Map™: where you design your processes, workflows & experiments
Riffyn Track™: where you track your samples, equipment and measurement data across your workflow
Riffyn Discover™: where you pick hits, eliminate variation, and establish cause & effect
Riffyn Share™: where you collaborate, review, and transfer all your methods and results
Riffyn Bridge™: where you connect data sources including legacy applications, enterprise systems, and devices
25.
Riffyn Discover™
[Figure: two process maps with steps at 15 mL, 250 mL, 250 mL and 50 L, feeding an Analyze step]
27.
Riffyn Scientific Development Environment
Addresses 3 critical barriers to reproducible outcomes & faster product development
“The problem isn’t capturing data, it’s integrating results so we can do science”
VISUAL DESIGN solves INACCESSIBLE METHODS
Drag-and-drop experiment design automatically links methods to results
REAL-TIME DATA INTEGRATION solves DATA FRAGMENTATION
Online and offline data captured and instantly integrated for immediate interrogation
ANALYSIS IN SECONDS solves EXPERIMENTAL ERROR & MISSED DISCOVERIES
Data is presented in a statistical data frame ready for visualization, quality analysis, correlation & machine learning
28.
Riffyn Scientific Development Environment …
delivers every data point structured, contextualized & ready for machine learning / AI
[Figure: single-version data tables, experiment data tables, and a process data table laid out as variables by observations]
[Figure: plots of Variable 1 versus Variable 2, captioned “Data from 2 properties on all runs in one experiment” and “Data on 2 properties from selected observations in multiple versions & experiments”]
Filter to selected observations
Correlate 2 or more variables
Immediate access to
• Material genealogy
• Error / Variance analysis
• Statistical process control
• Hit picking
• Root cause analysis
• Design of Experiments
• AI / statistical learning pipeline
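As a rough illustration of one analysis named on this slide, statistical process control, the sketch below runs a simplified three-sigma check on a table whose rows are observations and whose columns are variables. It is not a full control chart and not Riffyn's implementation; the run IDs, the `cell_count` variable, and all values are hypothetical.

```python
import statistics

# Hypothetical process data table: one row per observation (run),
# one key per variable.
baseline_runs = [
    {"run": "R1", "cell_count": 10.0},
    {"run": "R2", "cell_count": 10.2},
    {"run": "R3", "cell_count": 9.8},
    {"run": "R4", "cell_count": 10.1},
    {"run": "R5", "cell_count": 9.9},
]
new_runs = [
    {"run": "R6", "cell_count": 10.3},
    {"run": "R7", "cell_count": 9.7},
    {"run": "R8", "cell_count": 11.2},
]


def control_limits(rows, variable, n_sigma=3.0):
    """Return (lower, upper) as mean +/- n_sigma sample standard deviations."""
    values = [row[variable] for row in rows]
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return center - n_sigma * spread, center + n_sigma * spread


lower, upper = control_limits(baseline_runs, "cell_count")

# Filter to the new observations that fall outside the baseline limits.
out_of_control = [
    row["run"] for row in new_runs
    if not lower <= row["cell_count"] <= upper
]

print(f"limits: {lower:.2f} to {upper:.2f}")
print("out of control:", out_of_control)
```

Because every observation carries its run ID alongside the measured variable, a flagged value can be traced directly back to the run that produced it.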
29.
Founded in 2014
Offices in Oakland & Boston
Cloud-based software
Biopharm, chemicals, food, tech transfer
Our mission is to deliver research we can trust and build on efficiently, like high-quality parts in a global supply chain