Potential use cases for Big Data in Pharma R&D. The talk also tries to take some of the hype out of the topic and presents some tools that can be used to link and analyse data, even though the data are not really Big Data (just important data).
Taking the mystery out of Big Data - Berlin - Feb 2014
1. Taking (some of) the mystery out of Big Data
Claus Stie Kallesøe
7th Berlin Conference on IP in Life Sciences
Focus on Big Data
February 7, 2014
2. Introducing myself
Current roles:
Board of Directors, Pistoia Alliance
Head of Global Research Informatics
Background:
MSc. Pharm, Uni of Pharma Sciences, Copenhagen, 1997
Diploma Software Development, School of Engineering, Copenhagen, 2002
E-MBA, INSEAD, France, 2007
LinkedIn: http://www.linkedin.com/in/clausstiekallesoe
4. NOT FOR PROMOTIONAL USE
Big Data: either VERY large datasets AND/OR other complexities
Characteristics of big data
Source: IBM methodology
5. A couple of words about scale
100s of Megabytes
This should not be a problem. Can be handled with Matlab, R, Ruby
10s of Gigabytes
This can all be loaded into the RAM of a laptop
100/500 Gigabytes to 1 Terabyte
2 Terabyte hard drives can be bought in the local shop for €100
Connect one to your laptop and install PostgreSQL or a NoSQL database on it
> 5 Terabytes
Now you might have a size issue
Inspired by: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
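The rule of thumb on this slide can be written down as a small lookup. This is only a sketch: the function name `suggest_tooling` and the exact cut-off values are assumptions made for illustration (the slide gives rough ranges and says nothing precise about the region between 1 and 5 Terabytes, which is lumped in with the hard-drive tier here).

```python
MB = 10**6
GB = 10**9
TB = 10**12


def suggest_tooling(size_bytes: int) -> str:
    """Map a dataset size to the rough advice from the slide.

    The boundaries (1 GB, 100 GB, 5 TB) are assumed cut-offs,
    not values stated in the talk.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 1 * GB:
        # 100s of Megabytes: ordinary scripting tools are enough
        return "scripting tools (Matlab, R, Ruby)"
    if size_bytes < 100 * GB:
        # 10s of Gigabytes: the slide suggests loading it into laptop RAM
        return "load into laptop RAM"
    if size_bytes <= 5 * TB:
        # 100s of Gigabytes up to a few Terabytes: external drive plus a database
        return "external hard drive with PostgreSQL or a NoSQL database"
    # Above 5 Terabytes: this is where size may become a real issue
    return "possible size issue"


if __name__ == "__main__":
    for size in (300 * MB, 40 * GB, 500 * GB, 8 * TB):
        print(size, "->", suggest_tooling(size))
```

The point of the slide survives in the code: only the last branch is a candidate for dedicated Big Data technology.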
6. Big Data - Definition
"Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
9. What is Big Data in Pharma R&D?
Many ideas/possibilities across Pharma R&D and market access
But many of them are likely NOT real Big Data problems!
Are they relevant and can they bring insights?
Yes, very much so
Should we then find a way to handle them?
Absolutely
10. Linking R&D data
Semantic, text indexes and search tools
Purpose: Build text indexes that enable fast searches across large sets of linked data, both internal and external
[Diagram: research databases, clinical databases and external databases (ClinicalTrials.gov, Clinicaltrialsregister.eu) linked in steps numbered 1) to 4); the current state is marked "Today"]
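What a text index does can be shown with a minimal inverted index in plain Python. This is a sketch of the general technique, not the tooling used in the talk; the record identifiers and texts are invented for illustration, and a real setup would use a search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Build an inverted index: token -> set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for token in tokenize(text):
            index[token].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical records from internal and external sources
records = {
    "research:assay-001": "Kinase inhibition assay for compound X",
    "clinical:study-042": "Phase II study of compound X in depression",
    "external:registry-7": "Registry entry: phase II depression trial",
}

index = build_index(records)

if __name__ == "__main__":
    print(sorted(search(index, "compound x")))
    print(sorted(search(index, "phase II depression")))
```

Because every source is reduced to the same token-to-record mapping, one query runs across research, clinical and external data at once, which is the purpose stated on the slide.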
11. What about patents?
Text mining, linking and indexing
Text mining of patent databases and other sources…
Including chemical name => structure
…followed by:
1. Convert to RDF => link with Semantic technologies
2. Enrich and load into a text index like Solr or similar
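The first step can be sketched as follows. Everything specific here is an assumption for illustration: the patent id `EX-0001`, the `http://example.org/` URIs and the predicate names are hypothetical, and the name-to-structure step is replaced by a two-entry lookup table, whereas real pipelines use dedicated name-to-structure software. The output is a list of N-Triples lines.

```python
import re

# Stand-in for a real name-to-structure tool: a tiny lookup table
NAME_TO_SMILES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}

# Hypothetical namespace used only for this example
BASE = "http://example.org/"


def find_chemicals(text):
    """Dictionary-based text mining: return known chemical names in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name in NAME_TO_SMILES if name in words)


def to_ntriples(patent_id, text):
    """Emit N-Triples linking a patent to the compounds it mentions."""
    patent = f"<{BASE}patent/{patent_id}>"
    triples = []
    for name in find_chemicals(text):
        compound = f"<{BASE}compound/{name}>"
        triples.append(f"{patent} <{BASE}mentions> {compound} .")
        # These SMILES strings contain no quotes or backslashes, so no
        # literal escaping is needed; general input would require it.
        triples.append(f'{compound} <{BASE}smiles> "{NAME_TO_SMILES[name]}" .')
    return triples


triples = to_ntriples("EX-0001", "A tablet comprising aspirin and caffeine.")

if __name__ == "__main__":
    for line in triples:
        print(line)
```

Once patents and compounds share URIs like these, the same compound found in a patent and in an internal database can be linked, and the enriched records can then be loaded into a text index as in the second step.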
12. Pharmaceutical R&D: Future Big Data Opportunities
Online social networks and health records offer a huge repository of real-world patient data that can be used to:
identify undiagnosed patients and serious adverse events
improve understanding of health outcomes and comparative effectiveness
14. For many people/companies “Big data technology” is a black box
“A lot of stuff”
And then the vendors go:
If { box = magic or money } then { box = expensive }
15. Working within a community
A lot of tools available
From: http://people10.com/blog/ruby-on-rails-the-popular-platform-for-web-development/
19. Elasticsearch text indexes
All research assay metadata
=> Google-like search to find the relevant assay
All research project SharePoint workspaces
=> Enable easy, fast cross-project queries to find trends
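A "Google-like" search over assay metadata could be expressed with Elasticsearch's `multi_match` query. The sketch below only builds the JSON request body and does not contact a server; the field names (`assay_name`, `description`, `target`) and the search text are assumptions, not taken from the talk.

```python
import json


def build_assay_search(text, fields=None, size=10):
    """Build an Elasticsearch search body using a multi_match query.

    The default field names are hypothetical examples of assay metadata.
    """
    if fields is None:
        fields = ["assay_name", "description", "target"]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
            }
        },
    }


body = build_assay_search("kinase inhibition")

if __name__ == "__main__":
    # The printed JSON could be sent to the _search endpoint of an index
    print(json.dumps(body, indent=2))
```

A single free-text box backed by a query like this is what makes the search feel Google-like: the user does not need to know which metadata field holds the term.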
20. Conclusion – Big data in Pharma R&D
Many opportunities across R&D and market access
More data linking and data analytics than Big Data
You can use freely available tools on “normal” hardware
No magic “under the hood”: it’s just data
BUT you still need to define the questions you want to answer before diving into technology!
21. Please go home and read….
http://blog.mongohq.com/you-dont-have-big-data/