Kennisalliantie New Year's Reception, 31 January 2013:
Prof. dr. Jacob de Vlieg: “Taming the Big Data Beast Together”
CEO and Scientific Director of the Netherlands eScience Center (NLeSC)
1. Netherlands eScience Center
ICT Synergy Hub, Amsterdam
Taming the Big Data Beast - Together
Kennisalliantie New Year's Meeting
Delft, 31 January 2013
Prof. dr. Jacob de Vlieg ¹ ²
¹ CEO & Scientific Director of the Netherlands eScience Center, NWO-SURF
² Head Computational Design & Discovery, CMBI, Radboud University Medical Center, Nijmegen, Netherlands
2. Agenda
• Big Data in Science: Challenges & Opportunities
– Top Sector ICT Roadmap theme: “Data, Data, Data”
• Netherlands eScience Center (NLeSC)
– Expert centre for Big Data Research
• Joint NWO-NLeSC “Big Data” project call
– Public-private partnerships
6. Data are the lifeblood of modern science and
the digital economy
Managing, analyzing, linking & re-using data to create business
value and/or scientific breakthroughs e.g.
– Social media data to influence consumer choices
– Sensor networks data: e.g. sensor-enabled smart dikes
– Imaging & biobanking data in health care e.g. diagnostics, medicine
– And many more opportunities.
Big Data: a complex concept
– 4Vs: Volume, Variety, Velocity, Verification
Big Data inextricably connected to eScience/HPC
ICT top sector roadmap: e-Science is about intelligent infrastructure to
model and/or to access big data
8. Key eScience challenges in Big Data research
– Cross-type data integration
– Data-driven & multi-model simulations
– Visualization & analytics
– High performance computing: connected computers & fast networks
– Stimulate culture of knowledge sharing: no silos; data stewardship
– Rationalization of ICT landscapes; interoperability & industry data standards
– Training & education
9. “Science itself is changing … We need to change with it …”
Neelie Kroes in “Giving Europe’s Scientists the Tools to Deliver”
Two key words: multidisciplinary research & data-driven discovery
12. eScience and the mystery of the
empty labs
• Much more data per experiment (miniaturization and/or automation)
• External data sources & outsourcing
• Experimental design, data management & analytics (eScience)
13. Quantified Self Movement -> Big Data
Use apps and wearable sensors to
monitor daily life e.g. hours of sleep, food
consumed, exercise taken, etc.
Quantified Self = Big Data + Mobile +
Sensors + Visualization + Gamification
14. eScience Hero
Fights for medical innovation: Parkinson’s disease
• Big Data
• Pattern recognition
• Machine learning
• Social Media
Andy Grove (ex-CEO Intel)
16. Voice algorithms spot Parkinson's disease:
data-driven diagnostics
• Machine learning algorithms that analyse voice
recordings to detect Parkinson's symptoms early
on (Little et al. @ Media Lab, MIT)
• Social Media:
Looking for volunteers to contribute to
the database to improve pattern
recognition
Social networking health sites: patient-driven data collection
•23andMe
•PatientsLikeMe.com
•And so on
Big Data V = Verification: privacy, compliance, etc.
17. 'Data Scientist' is now the hottest job title in Silicon Valley…
Tim O'Reilly, founder of O'Reilly Media
Supporter of the free software and open source movements
McKinsey projected that the US needs
140,000 to 190,000 more workers with “deep
analytical expertise”
18. Netherlands eScience Center
Netherlands Organisation for Scientific Research (NWO)
Principal Dutch body for ICT innovation for research (SURF)
NLeSC
Science Park, Amsterdam; SARA, EGI
Networked innovation model
Bridge:
•Science & advanced ICT
•Industry & Academic Research
•Training & Education
New ways to do research made possible because of Big Data/eScience
19. NLeSC portfolio divided into themes
•Sustainability & Environment
- Climate
- Water management
- Energy
- Ecology
•Life Sciences
- Green Genetics
- Translational Research IT
- Foods
- Cognition/Neuroscience
•Chemistry & Materials
- Chemistry
•eScience Methodology & ‘Big Data’
- eScience Methodology
- Astronomy
•Humanities & Social Sciences
- Humanities
- Social Sciences
20. Can scientists from digital humanities help food
researchers?
Food Research: Food Specific Ontologies for Food Focused Text Mining
Project Leader: Wynand Alkema
Addresses the absence of domain-specific structured
vocabularies, which limits the use of data mining &
knowledge management methods in food research.
Digital Humanities: BiographyNED
Project Leader: Guus Schreiber
Will improve the current version of the Biography Portal by
incorporating analytical tools to show interconnections,
trends, geographical maps and timelines.
21. eScience & Big Data: providing leads for
new food applications
22. NLeSC eScience engineers:
Scientists bridging research and advanced ICT
Deliver sustainable solutions for data-driven research
Work both at center and on site
23. Collaborative Innovation Network
Taming the Data Beast Together
SMEs, etc.
NLeSC eScience Engineers:
Work both at center and on site:
•Exchange of eScience expertise
•Re-use of proven eScience (technology hopping)
•Career development & training
24. Grand scientific challenges lead to
innovative eScience & Big Data Research
eSalsa NLeSC project: data-driven simulations & advanced
visualization to understand Climate Change
Prof. Henk Dijkstra, Univ. of Utrecht, NLeSC Integrator Climate
Dr. Jason Maassen, eScience Engineer, NLeSC
•eScience to allow unprecedented level of detail (large scale distributed computing)
•State-of-the-art visualization techniques to analyze hundreds of Terabytes of
output
•Re-use of proven eScience concepts in new areas (e.g. sector water)
25. The number of data-driven start-ups is growing,
particularly when it comes to social media.
Taming the Big Data Beast
26. Development of a high performance
Twitter analysis platform
Hadoop – MapReduce architecture @ a large SARA computer cluster
Smart search & analysis software
Goal is to ask “Big Data” research questions e.g.
• Ability to analyze microblogging data produced over years
• Time-dependent
• Real time sentiment analysis
• And so on…
Prof. Antal van den Bosch, NLeSC Integrator Humanities, Radboud University Nijmegen
Dr. Erik Tjong Kim Sang, eScience Engineer, NLeSC
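To make the MapReduce idea on this slide concrete, here is a minimal single-machine sketch in plain Python. It is not the platform's actual code: the tweet record format, the sample data and the function names (`map_tweet`, `shuffle`, `reduce_counts`, `run_job`) are all assumptions made for illustration. A real Hadoop job would run the same three phases distributed over the cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical tweet records as (ISO date, text) pairs.
# The real platform's data format is not described in the slides.
TWEETS = [
    ("2012-11-03", "Dikes hold after storm #water #nl"),
    ("2012-11-17", "New sensor data online #water"),
    ("2012-12-01", "Happy with the #bigdata call"),
]


def map_tweet(tweet):
    """Map phase: emit ((month, hashtag), 1) for every hashtag in one tweet."""
    date, text = tweet
    month = date[:7]  # "YYYY-MM"
    for token in text.split():
        if token.startswith("#") and len(token) > 1:
            yield (month, token.lower()), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_job(tweets):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_tweet(t) for t in tweets)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for (month, tag), count in sorted(run_job(TWEETS).items()):
        print(month, tag, count)
```

Keying on (month, hashtag) gives a simple time-dependent count; other questions, such as sentiment per day, would only change what the map step emits.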
28. Cyber-common: a facility for 21st century data-driven
research and multidisciplinary team work
The key to scientific questions yet unasked!
To link minds and eScience
SURF-SARA-NLeSC
29. Joint NWO-NLeSC “data sciences” call
• Focus on stimulating public-private partnerships
• Three instruments:
– Industrial Partnership Programme (IPP)
– Technology Areas (TA)
– Knowledge Innovation Mapping SMEs (KIEM MKB)
Rosemarie van der Veen-Oei (NLeSC)
r.vanderveen@nwo.nl
T 070 3440 851
Mark Kas (NWO)
m.kas@nwo.nl
T 070 3440 811, M 06 205 93 207
Netherlands eScience Center
www.nlesc.nl