Taking (some of) the mystery out of Big Data
Claus Stie Kallesøe
7th Berlin Conference on IP in Life Sciences
Taking the mystery out of Big Data - Berlin - Feb 2014

Potential use cases for Big Data in Pharma R&D. The talk also tries to take some of the hype out of the topic and presents some tools that can be used to link and analyse data, even though the data are not really Big Data (just important data).

Published in: Health & Medicine


  1. Taking (some of) the mystery out of Big Data. Claus Stie Kallesøe. 7th Berlin Conference on IP in Life Sciences, Focus on Big Data, February 7, 2014.
  2. Introducing myself
     Current roles: Board of Directors, Pistoia Alliance; Head of Global Research Informatics
     Background: MSc. Pharm, Uni of Pharma Sciences, Copenhagen, 1997; Diploma Software Development, School of Engineering, Copenhagen, 2002; E-MBA, INSEAD, France, 2007
     LinkedIn: http://www.linkedin.com/in/clausstiekallesoe
  3. Introduction
  4. NOT FOR PROMOTIONAL USE. Big Data: either VERY large datasets and/or other complexities. Characteristics of big data (source: IBM methodology).
  5. A couple of words about scale
     100s of megabytes: this should not be a problem. Can be handled with Matlab, R, Ruby.
     10s of gigabytes: this can all be loaded into the RAM of a laptop.
     100/500 gigabytes to 1 terabyte: 2-terabyte hard drives can be bought in the local shop for €100. Connect one to your laptop and install PostgreSQL or a NoSQL database on it.
     > 5 terabytes: now you might have a size issue.
     Inspired by: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
  6. Big Data: definition. "Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
  7. Cool, but remember where we are! Gartner Hype Cycle 2013
  8. Big Data in Pharma R&D
  9. What is Big Data in Pharma R&D?
     Many ideas/possibilities across Pharma R&D and market access.
     But many of them are likely NOT real Big Data problems!
     Are they relevant and can they bring insights? Yes, very much so.
     Should we then find a way to handle them? Absolutely.
  10. Linking R&D data: semantic, text indexes and search tools. Purpose: build text indexes which enable fast searches across large data sets of linked data, both internal and external. [Diagram labels: Research Databases, Clinical Databases, External databases, ClinicalTrials.gov, Clinicaltrialsregister.eu, steps 1) to 4), "Today".]
  11. What about patents? Text mining, linking and indexing
     Text mining of patent databases and other sources, including chemical name => structure, followed by:
     1. Convert to RDF => link with semantic technologies
     2. Enrich and load into a text index like Solr or similar
  12. Pharmaceutical R&D: future Big Data opportunities. Online social networks and health records offer a huge repository of real-world patient data that can be used to identify undiagnosed patients and serious adverse events, and to improve understanding of health outcomes and comparative effectiveness.
  13. Technologies: can we do anything on our own?
  14. For many people/companies "Big data technology" is a black box: "A lot of stuff". And then the vendors go: If { box = magic or money } then { box = expensive }
  15. Working within a community: a lot of tools available. From: http://people10.com/blog/ruby-on-rails-the-popular-platform-for-web-development/
  16. New visualisations, easy and free: http://philogb.github.io/jit/demos.html
  17. Automated calculations. [Diagram: LSP Front End; job submitted to async calculation server; numbered steps 1 to 5 with sub-steps 5a, 5b, 5c, etc.]
  18. Also a lot of great tools to handle data. https://circleci.com/
  19. Elasticsearch text indexes
     All research assay metadata => Google-like search to find the relevant assay
     All research project SharePoint workspaces => enable easy, fast cross-project queries to find trends
  20. Conclusion: Big Data in Pharma R&D
     Many opportunities across R&D and market access.
     More data linking and data analytics than Big Data.
     You can use freely available tools on "normal" hardware.
     No magic "under the hood": it's just data.
     BUT you still need to define the questions you want to answer before diving into technology!
  21. Please go home and read… http://blog.mongohq.com/you-dont-have-big-data/
  22. http://ask.debian.net/
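
The rule of thumb about scale on slide 5 can be written down as a small lookup. This is only a sketch: the function name `storage_advice` and the exact cut-offs between bands are assumptions (the slide gives rough ranges and says nothing about sizes between 1 and 5 terabytes, which are grouped here with the external-drive band).

```python
GB = 1024 ** 3
TB = 1024 ** 4


def storage_advice(size_bytes: int) -> str:
    """Map a dataset size to the rough advice given on slide 5.

    The band boundaries (1 GB, 100 GB, 5 TB) are assumed cut-offs,
    not figures stated in the talk.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 1 * GB:
        # "100s of megabytes": handle it with Matlab, R or Ruby
        return "script it with Matlab, R or Ruby"
    if size_bytes < 100 * GB:
        # "10s of gigabytes": fits in the RAM of a laptop
        return "load it into RAM"
    if size_bytes <= 5 * TB:
        # "100/500 gigabytes to 1 terabyte" (extended to 5 TB by assumption)
        return "external drive plus PostgreSQL or a NoSQL database"
    # "> 5 terabytes": now you might have a size issue
    return "you might have a size issue"


if __name__ == "__main__":
    for size in (300 * 1024 ** 2, 40 * GB, 500 * GB, 8 * TB):
        print(f"{size:>16} bytes -> {storage_advice(size)}")
```

The point of the slide survives in the code: only the last branch is a candidate for dedicated Big Data tooling.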

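
The "Google-like search" over assay metadata on slide 19 rests on a text index. The sketch below is a toy in-memory inverted index in plain Python, not the Elasticsearch or Solr API; the assay identifiers and descriptions are made-up sample data, and `tokenize`, `build_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: token -> set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(hits)


# Hypothetical assay metadata, for illustration only.
ASSAYS = {
    "assay-001": "Kinase inhibition assay, human, fluorescence readout",
    "assay-002": "Receptor binding assay, rat, radioligand",
    "assay-003": "Kinase binding assay, human, radioligand",
}

if __name__ == "__main__":
    idx = build_index(ASSAYS)
    print(search(idx, "kinase human"))
    print(search(idx, "radioligand binding"))
```

A real search engine adds relevance ranking, stemming and persistence on top of this structure, but the lookup itself is the same idea: no magic under the hood, just data.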