Netherlands eScience Center
ICT Synergy Hub, Amsterdam


Taming the Big Data Beast - Together

New Year's Meeting, Kennisalliantie
Delft, 31 January 2013

Prof. dr. Jacob de Vlieg ¹ ²
1. CEO & Scientific Director of Netherlands eScience Center, NWO-SURF
2. Head of Computational Design & Discovery, CMBI, Radboud University Medical Center, Nijmegen, Netherlands
Agenda

• Big Data in Science: Challenges & Opportunities
  – Top Sector ICT Roadmap theme: “Data, Data, Data”


• Netherlands eScience Center (NLeSC)
  – Expert centre for Big Data Research


• Joint NWO-NLeSC “Big Data” project call
  – Public-private partnerships
Data are the lifeblood of modern science and
               the digital economy

Managing, analyzing, linking & re-using data to create business
  value and/or scientific breakthroughs e.g.
    –   Social media data to influence consumer choices
    –   Sensor networks data: e.g. sensor-enabled smart dikes
    –   Imaging & biobanking data in health care e.g. diagnostics, medicine
    –   And many more opportunities.


Big Data: a complex concept
    – 4Vs: Volume, Variety, Velocity, Verification


Big Data inextricably connected to eScience/HPC

ICT top sector roadmap: e-Science is about intelligent infrastructure to
   model and/or to access big data
Key eScience challenges in Big Data research

–   Cross-type data integration
–   Data-driven & multi-model simulations
–   Visualization & analytics
–   High performance computing: connected computers & fast networks

–   Stimulate culture of knowledge sharing: no silos; data stewardship
–   Rationalization of ICT landscapes; interoperability & industry data standards
–   Training & education
Science itself is changing … We need to change with it …

                     Neelie Kroes in “Giving Europe’s Scientists the
                     Tools to Deliver”




Two key words: multidisciplinary research & data-driven discovery
eScience and the mystery of the
empty labs




       • Much more data per experiment (miniaturization and/or automation)
       • External data sources & outsourcing
       • Experimental design, data management & analytics (eScience)
Quantified Self Movement -> Big Data

                 Use apps and wearable sensors to
                 monitor daily life e.g. hours of sleep, food
                 consumed, exercise taken, etc.


                 Quantified Self = Big Data + Mobile +
                 Sensors + Visualization + Gamification




eScience Hero
Fights for medical innovation; Parkinson’s disease


•   Big Data

•   Pattern recognition

•   Machine learning

•   Social Media

                            Andy Grove (ex-CEO Intel)
Voice algorithms spot Parkinson's disease:
data-driven diagnostics
• Machine learning algorithms that analyse voice
  recordings to detect Parkinson's symptoms early
  on (Little et al. @ Media Lab, MIT)

• Social Media: looking for volunteers to contribute to
  the database to improve pattern recognition

Social networking health sites: patient-driven data collection
    • 23andMe
    • PatientsLikeMe.com
    • And so on

Big Data V = Verification: privacy, compliance, etc.
'Data Scientist' is now the hottest job title in Silicon Valley…

Tim O'Reilly
Founder of O'Reilly Media
Supporter of the free software and open source movements



McKinsey projected that the US needs
140,000 to 190,000 more workers with “deep
analytical expertise”
Netherlands eScience Center

Netherlands Organisation for Scientific Research (NWO)
SURF: principal Dutch body for ICT innovation for research

NL-eSC
SURF Science park, Amsterdam; SARA, EGI

Networked innovation model

Bridge:
     • Science & advanced ICT
     • Industry & Academic Research

• Training & Education

New ways to do research made possible because of Big Data/eScience
NLeSC portfolio divided into themes
• Sustainability & Environment
  - Climate
  - Water management
  - Energy
  - Ecology
• Life Sciences
  - Green Genetics
  - Translational Research IT
  - Foods
  - Cognition/Neuroscience
• Chemistry & Materials
  - Chemistry
• eScience Methodology & ‘Big Data’
  - eScience Methodology
  - Astronomy
• Humanities & Social Sciences
  - Humanities
  - Social Sciences
Can scientists from digital humanities help food
researchers?
Food Research: Food Specific Ontologies for Food Focused Text Mining

Project Leader: Wynand Alkema

Addresses the absence of domain-specific structured
vocabularies, which limits the use of data mining &
knowledge management methods in food research.


Digital Humanities: BiographyNED

Project Leader: Guus Schreiber

Will improve the current version of the Biography Portal by
incorporating analytical tools to show interconnections,
trends, geographical maps and timelines.
eScience & Big Data: providing leads for
        new food applications
NLeSC eScience engineers:
Scientists bridging research and advanced ICT




                   Deliver sustainable solutions for data-driven research

                   Work both at center and on site
Collaborative Innovation Network
     Taming the Data Beast Together

 SMEs, etc.




      NLeSC eScience Engineers:

      Work both at center and on site:
      •Exchange of eScience expertise
      •Re-use of proven eScience (technology hopping)
      •Career development & training
Grand scientific challenges lead to
     innovative eScience & Big Data Research
eSalsa NLeSC project: data-driven simulations & advanced
visualization to understand Climate Change

Prof. Henk Dijkstra, Univ. of Utrecht, NLeSC Integrator Climate
Dr. Jason Maassen, eScience Engineer, NLeSC

 • eScience to allow unprecedented level of detail (large scale distributed computing)

 • State-of-the-art visualization techniques to analyze hundreds of Terabytes of
   output

 • Re-use of proven eScience concepts in new areas (e.g. the water sector)
The number of data-driven start-ups is growing,
particularly when it comes to social media.




              Taming the Big Data Beast
Development of a high performance
              Twitter analysis platform
Hadoop – MapReduce architecture @ a large SARA computer cluster

Smart search & analysis software

Prof. Antal van den Bosch, NLeSC Integrator Humanities, Radboud University Nijmegen
Dr. Erik Tjong Kim Sang, eScience Engineer, NLeSC

Goal is to ask “Big Data” research questions, e.g.

 •   Ability to analyze microblogging data produced over years
 •   Time dependent
 •   Real time sentiment analysis
 •   And so on…
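
The MapReduce pattern named on this slide can be illustrated with a minimal, in-memory sketch. It is not the NLeSC platform and does not use Hadoop itself; it only mimics the map, shuffle and reduce steps in plain Python to count hashtags per day. The sample tweets, the record layout (date, text) and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample records: (ISO date, tweet text). Not real data.
TWEETS = [
    ("2013-01-30", "New year in #Delft #bigdata"),
    ("2013-01-31", "Talk on #bigdata and #escience"),
    ("2013-01-31", "More #BigData today"),
]


def map_tweet(record):
    """Map step: emit ((date, hashtag), 1) for every hashtag in one tweet."""
    date, text = record
    for token in text.split():
        if token.startswith("#") and len(token) > 1:
            yield (date, token.lower()), 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one (date, hashtag) key."""
    return key, sum(values)


def hashtag_counts_per_day(tweets):
    """Run map, shuffle and reduce in sequence over an in-memory list."""
    mapped = chain.from_iterable(map_tweet(t) for t in tweets)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for (date, tag), n in sorted(hashtag_counts_per_day(TWEETS).items()):
        print(date, tag, n)
```

On a real cluster the map and reduce functions would run in parallel on many machines and the framework would handle the shuffle; keying on the date is one simple way to make the counts time dependent.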
Cyber-common: a facility for 21st century data-driven
research and multidisciplinary team work
     The key to scientific questions yet unasked!
     To link minds and eScience

                      SURF-SARA-NLeSC
Joint NWO-NLeSC “data sciences” call
• Focus on stimulating public-private partnerships

• Three instruments:
   – Industrial Partnership Programme (IPP)
   – Technology Areas (TA)
   – Knowledge Innovation Mapping SMEs (KIEM MKB)


           Rosemarie van der Veen-Oei (NLeSC)
           r.vanderveen@nwo.nl
           T 070 3440 851

           Mark Kas (NWO)
           m.kas@nwo.nl
           T 070 3440 811, M 06 205 93 207

           www.nlesc.nl
           Netherlands eScience Center
Thank you

     www.esciencecenter.nl
