Cloud Computing and Big Data,
the next frontier of innovation


Jordi Torres, UPC-BSC
Madrid, 21 March 2013
HOW DID SCIENCE START?
Source: Prof. Mateo Valero, BSC-CNS 2010
HOW IS SCIENCE ADVANCING TODAY?
Source: Prof. Mateo Valero, BSC-CNS 2010
MATHEMATICAL CALCULATIONS?

         WHERE?
MN3

Category      Metric                   Value
Compute       Cores/chip                   8
              Chips/node                   2
              Cores/node                  16
              Nodes                     3028
              Total cores              48448
Performance   Freq. (GHz)                2.6
              Gflops/core               20.8
              Gflops/node              332.8
              Total Tflops            1000.0
Memory        GB/core                      2
              GB/node                     32
              Total (TB)               96.89
Network       Latency (μs)               0.7
              Bandwidth (Gb/s)            40
Storage       (TB)                      2000
Consumption   (kW)                      1080
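
The derived rows of the MN3 table can be cross-checked from its base figures. The sketch below is only a consistency check, not an official specification: the value of 8 floating-point operations per core per cycle is an assumption (it is not on the slide) chosen because it reproduces the 20.8 Gflops/core figure. Under that assumption the peak comes out slightly above the rounded 1000.0 Tflops shown in the table.

```python
# Base figures copied from the MN3 table.
cores_per_chip = 8
chips_per_node = 2
nodes = 3028
freq_ghz = 2.6
gb_per_node = 32

# Assumption (not stated on the slide): 8 floating-point operations
# per core per clock cycle, which matches the 20.8 Gflops/core row.
flops_per_cycle = 8

cores_per_node = cores_per_chip * chips_per_node
total_cores = cores_per_node * nodes
gflops_per_core = freq_ghz * flops_per_cycle
gflops_per_node = gflops_per_core * cores_per_node
total_tflops = gflops_per_node * nodes / 1000
total_memory_tb = gb_per_node * nodes / 1000

print(f"cores/node   = {cores_per_node}")
print(f"total cores  = {total_cores}")
print(f"Gflops/node  = {gflops_per_node:.1f}")
print(f"total Tflops = {total_tflops:.1f}")
print(f"memory (TB)  = {total_memory_tb:.3f}")
```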
FOR SOME SPANISH RESEARCH GROUPS!
AND…

FOR THE REST OF THE WORLD?
GOOD NEWS!




Source: http://news.cnet.com/8301-13846_3-57349321-62/amazon-takes-supercomputing-to-the-cloud
CLOUD COMPUTING?
Source: http://www.wired.com/wiredenterprise/2011/12/nonexistent-supercomputer/all/1
Source: http://www.facebook.com/media/set/?set=a.190842620965185.47008.140375289345252




   40 MW
28,000 m²
Photo: Google
HUGE DATA CENTERS
Photo: Google




                        > 4 football pitches
Source: http://www.google.com/about/datacenters/gallery/images
Different IT
             production
Photo: J.T.
CLOUD COMPUTING:
            IT as a service

On-demand self-service                                           Pay per use




  Rapid elasticity                                     Ubiquitous access
                                             ....
           Source: http://www.telegraph.co.uk/technology/reviews/9241719/Power-Ethernet-Sockets-review.html
Example of benefits (IaaS):

1 computer in a rack for 120 hours
120 computers in three racks for 1 hour

Idea: Tutorial SC2011 - Robert Grossman
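
Under pay-per-use pricing, the two configurations above would plausibly cost the same, while the second finishes 120 times sooner. A minimal sketch of that arithmetic follows; the flat price of 10 cents per machine-hour is a hypothetical value chosen for illustration, and real providers may bill differently.

```python
def rental_cost_cents(machines: int, hours: int, cents_per_machine_hour: int) -> int:
    """Cost of renting `machines` computers for `hours` hours each."""
    return machines * hours * cents_per_machine_hour

# Hypothetical flat price, not taken from the slides.
PRICE_CENTS = 10

one_machine = rental_cost_cents(machines=1, hours=120, cents_per_machine_hour=PRICE_CENTS)
many_machines = rental_cost_cents(machines=120, hours=1, cents_per_machine_hour=PRICE_CENTS)

print(f"1 machine x 120 h : {one_machine} cents")
print(f"120 machines x 1 h: {many_machines} cents")
print(f"same cost: {one_machine == many_machines}")
```

The comparison only holds if the job parallelises well across the 120 machines.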
AND DATA?
Source: http://www.docuciencia.es/2009/05/lhc-el-acelerador-de-particulas/



“… the LHC produces 1PetaByte of data every second, big data and
lack of computing resources were becoming the European Organization
for Nuclear Research’s biggest IT challenges…”
       Source: computerweekly.com/news/2240173897/CERN-adopts-OpenStack-private-cloud-to-solve-big-data-challenges
1 Gigabyte (GB) = 1,000,000,000 bytes
1 Terabyte (TB) = 1,000 Gigabytes (GB)
1 Petabyte (PB) = 1,000,000 Gigabytes (GB)
1 Exabyte (EB) = 1,000,000,000 Gigabytes (GB)
1 Zettabyte (ZB) = 1,000,000,000,000 Gigabytes (GB)
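
The decimal units above can be captured in a small conversion helper. This is an illustrative sketch; the names `UNITS` and `convert` are made up for the example.

```python
# Decimal (SI) byte units, as listed on the slide.
UNITS = {
    "B": 1,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity of data between two of the units in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]

for unit in ("TB", "PB", "EB", "ZB"):
    print(f"1 {unit} = {convert(1, unit, 'GB'):,.0f} GB")
```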
Deluge of data created daily




                               Source: Economist , Feb 25th, 2010 http://www.economist.com/node/15579717
Big Data?

definition?
BIG DATA?
Big Data is data that exceeds the
storage, processing and management
capacity of conventional systems.
BIG DATA?




The reason is that the data is too
big, moves too fast, or doesn’t fit
the structures of our current systems’
architectures.
BIG DATA?




Moreover, to gain value from this
data, we must change the way we
analyze it.
NEW CHALLENGES
that must be addressed urgently, in order to respond
     to the needs of the advancement of science


                 1.   Storing
                 2.   Managing
                 3.   Processing
                 4.   Analyzing
Affordable Storage
But scanning disks…



assume 100 MB/s
scanning 2 TB takes more than 5 hours
approach: massive parallelism

    assume 20,000 disks:
scanning 2 TB takes 1 second
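
The scan times on these slides follow from simple arithmetic. The sketch below assumes the data is spread evenly over all disks and that every disk sustains 100 MB/s in parallel with no coordination overhead, which is an idealisation.

```python
def scan_seconds(data_bytes: int, disks: int, bytes_per_second_per_disk: int = 100 * 10**6) -> float:
    """Time to read `data_bytes` striped evenly across `disks` disks."""
    return data_bytes / (disks * bytes_per_second_per_disk)

TWO_TB = 2 * 10**12  # 2 TB in bytes (decimal units)

single_disk = scan_seconds(TWO_TB, disks=1)
many_disks = scan_seconds(TWO_TB, disks=20_000)

print(f"1 disk      : {single_disk:.0f} s ({single_disk / 3600:.2f} h)")
print(f"20,000 disks: {many_disks:.0f} s")
```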




Source: http://www.google.com/about/datacenters/gallery/images/_2000/IDI_018.jpg
1 Data processing challenges




Rethinking data processing is required:
      MapReduce, Storm, S4,…
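
To illustrate the MapReduce model named above, here is a single-process word-count sketch. It only mimics the map, shuffle and reduce phases in memory; it is not how a real distributed framework is implemented, and the function names are chosen for this example.

```python
from collections import defaultdict

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data", "big cloud"])
print(counts)
```

In a distributed setting the map calls and the per-key reduce calls are independent, so they can run on different machines.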



2 Data storage challenges

New storage technologies are required

RAM vs HDD:          HDD is 100 times cheaper than RAM,
                     but 1000 times slower

Present solutions:   Solid-state drive (SSD), non-volatile

Research:            Storage Class Memory (SCM)
3 Data management challenges


   Relational DB can’t support everything


Example: eventual consistency

Solution: “NoSQL systems”

Research: New management systems
                                   Source: gigaom.com/cloud/big-data-and-nosql-march-to-the-enterprise/
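
Eventual consistency, the example named above, can be illustrated with a toy model of two replicas. The sketch assumes a last-write-wins rule based on a logical timestamp, which is only one possible conflict-resolution policy and is not described in the slides; the `Replica` class is hypothetical.

```python
class Replica:
    """Toy key-value replica that resolves conflicts by last write wins."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp: int):
        current = self.store.get(key)
        # Keep the incoming value only if it is newer than what we hold.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Replica"):
        """Background propagation: merge the other replica's entries."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)

a, b = Replica("a"), Replica("b")
a.write("user:1", "Madrid", timestamp=1)
b.write("user:1", "Barcelona", timestamp=2)

stale = a.read("user:1")  # replica a has not yet seen the newer write
a.sync_from(b)
b.sync_from(a)
print(stale, a.read("user:1"), b.read("user:1"))
```

Between the second write and the synchronisation, the replicas disagree; after it, both return the same value.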




4 Obtaining value from data

        Information is not actionable knowledge

Data -> Information -> Knowledge
(volume decreases and value increases at each step)

Prediction using data mining & machine learning techniques

Research: Most algorithms work well on thousands of records, but at
present they are impractical for billions of records.
Cloud Computing
   and Big Data:
the next frontier of
    science and
     innovation
Thank you for your attention

www.JordiTorres.org - @JordiTorresBCN




     www.smartcityexpo.com                 www.bsc.es/eBusiness
  Autonomic Systems and e-Business Platforms research line at BSC/UPC