Standards, tools, incentives – what does it take to enable data sharing?
Fiona Nielsen at AIDR, May 14th 2019
I used to be a scientist
- like you
I became frustrated by my lack of data access
Picture by Melissa O’Donahue CC-BY-ND
Strongly motivated by the journey of my mother
“Someone has got to do something!”
Fiona Nielsen
Around 2012
I wanted to build a data broker for genomics data
To speed up research
DNAdigest
Founded Repositive with Adrian
With Repositive we built a search engine for genomics
Contributed our data search expertise to the NIH Data Commons Pilot
Open pages – all indexed by Google
Index of >1 million public genomic data sets
All users can contribute annotation and data sets
Visit http://discover.repositive.io
We launched a marketplace for translational cancer models
Biopharma Cancer R&D
Cancer model vendors (CROs)
Help researchers find the right cancer model to suit their needs in drug development for precision medicine
What do cancer models have to do with data?
The cancer models are described by complex data:
• Genetic profile
• Tumor type
• Cancer growth and phenotype
There are 100s of cancer model providers
With 1000s of cancer models
Finding the right cancer model is a data access problem
Photo by Marblesgalore.com
We organize the data to make it easily searchable
Photo by Marblesgalore.com
Our platform enables data discovery and data sharing
Biopharma Cancer R&D
Cancer model vendors (CROs)
Serving pre-clinical CROs seeking customers who are interested in their cancer models
Serving researchers who are looking to outsource their cancer model experiments
World’s largest inventory of cancer models
5,000+ models in our partner network
Read more on http://repositive.io/
Why is data sharing still a problem?
The scientific approach for addressing challenges
Solving the problem for Alice and Bob:
1. Define the problem
2. Design a solution
3. Publish a paper
You are done.
By odder - own work, based on png version originally uploaded to the Commons by Dake., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1812312
Some steps are missing to solve the problem
Your solution/algorithm/standard method (i) needs to be implemented in an easy-to-use and available tool (ii)
And the people who are experiencing the problem need a reason/incentive (iii) to use the tool
Unfortunately, the scientific approach only addresses (i)
We have lots of algorithms and proposed standards
But not all of them are solving real problems…
https://xkcd.com/927/
On the other hand, if you have incentives…
©Derek Law
What have I learnt across academia and industry?
Data sharing is a multi-step process
Before you can access data, you need to
→ Assert this data is the data that you need
→ Discover that the data exists
→ Someone made that data discoverable
Each step of the process requires incentives, tools, standards
My prediction
No advanced AI/ML tool will dramatically improve the data contributed, discovered or accessed outside the communities and problem domains where there is a strong incentive for its use.
But will incentives alone fix the problem?
You have to fix two sides:
- Increase the incentives
- And/or lower the effort needed to use the tools
Incentives?
- Fix scientific publishing to include data
- Fix hiring, promotion, tenure criteria
- Fix funding requirements
©ReinierVanOorsouw
Take an example from AirBnB
Incentives and tools existed before AirBnB
Holiday homes would be listed on Craigslist
What did AirBnB do?
They made it easier to search for rentals,
and super-easy to make data about your rental home discoverable!
AirBnB:
Incentives > tools > standards and methods
http://ui-patterns.com/explore/domain/airbnb+com
My message to you
Is not to stop developing methods :)
It is:
If you care about the impact you want to make and you want to see the problem solved on a larger scale, you have to care about making easy-to-use tools and fixing the incentives.
It is not an easy task, but I am with you on this one!
What can I do today to fix incentives?
Acknowledge and give credit for good data stewardship: data creation, data visibility, data accessibility, data curation, data publishing and data sharing.
Start in your day-to-day work: e.g. create a data steward award in your lab!
Always include it when hiring, promoting and funding: promote good data stewardship
Make your data stewardship tools really easy to use (documentation, support, etc.)
& keep developing cool methods :)
Thank you for your attention – go share that data!
repositive [ re-poz-i-tiv ], noun;
1. a positive experience of accessing
genomic data repositories
Thanks for listening!
Find me on Twitter @glyn_dk
and read more about us at Repositive.io


Editor's Notes

  • #8 Before Google's dataset search existed: 50+ data sources, 1M data sets, plus open pages. Now all the metadata we curated and indexed is also findable in Google's dataset search.
  • #23, #24 You can start with: fixing the skewed incentives for scientific publishing; fixing hiring, promotion and tenure criteria; fixing funding requirements.