Big data on the Cloud
Dr. Putchong Uthayopas
Department of Computer Engineering, Faculty of
Engineering, Kasetsart University.
Email: pu@ku.ac.th
We are living in the world of Data


[Figure: sources of data: Video Surveillance, Social Media, Mobile Sensors, Smart Grids, Geophysical Exploration, Medical Imaging, Gene Sequencing]
Why now?
• The Internet makes it possible to gather data
  together at a scale never seen before.
   – Data from humans
   – Data from sensors
• Crowdsourcing is now widely practiced
   – User-generated data is flooding the world
• New devices and tools make it easy to generate
  data
Big Data
“Big data is data that exceeds the processing
capacity of conventional database systems. The
data is too big, moves too fast, or doesn’t fit the
strictures of your database architectures. To gain
value from this data, you must choose an
alternative way to process it.”



     Reference: “What is big data? An introduction to the big data
     landscape.”, Edd Dumbill,
     http://radar.oreilly.com/2012/01/what-is-big-data.html
Amazon View of Big Data

'Big data' refers to a collection of tools,
techniques and technologies which make it easy
to work with data at any scale. These distributed,
scalable tools provide flexible programming
models to navigate and explore data of any
shape and size, from a variety of sources.
3 Characteristics of Big Data
Big Data Challenge
Information as an Asset
• The cloud will enable ever larger data sets to be
  collected and used easily
• People will deposit information into the cloud
  – Like a bank or a personal warehouse
• New technology will emerge
  – Larger and more scalable storage technology
  – Innovative and complex data analysis/visualization
    for multimedia data
  – Security technology to ensure privacy
• The cloud will become mankind's intelligence and memory!
Google Cloud Platform
• App Engine
  – Mobile and web apps
• Cloud SQL
  – MySQL on the cloud
• Cloud Storage
  – Data storage
• BigQuery
  – Data analysis
• Google Compute Engine
  – Processing of large data sets
Amazon
• Amazon EC2
  – Computation service using VMs
• Amazon DynamoDB
  – Large, scalable NoSQL database
  – Fully distributed, shared-nothing architecture
• Amazon Elastic MapReduce (Amazon EMR)
  – Hadoop-based analysis engine
  – Can be used to analyze data from DynamoDB
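Amazon EMR runs Hadoop, whose programming model is MapReduce: a map step emits key/value pairs, the framework shuffles them so that all values for one key end up together, and a reduce step combines each group. The single-process Python sketch below illustrates only that model with the classic word count. It does not call Hadoop or any AWS API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a cluster each map task would handle one split of the input on its own node;
    # in this local sketch the documents are simply mapped one after another.
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on the cloud", "the cloud stores big data"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'on': 1, 'the': 2, 'cloud': 2, 'stores': 1}
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves intermediate data over the network; the per-record logic stays this small, which is what lets the model scale out.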
Trends
• A move toward large, scalable virtual
  infrastructure
  – Providing computing services
  – Providing basic storage services
  – Providing large, scalable databases
     • NoSQL
  – Providing analysis services
• All these services have to come together
  – Big data cannot be moved!
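Large, scalable NoSQL databases usually follow a shared-nothing design: data is split by key across independent nodes that share no disk or memory, so a read or write only touches the node that owns the key. A minimal sketch of that idea, assuming simple hash-modulo partitioning over a fixed number of hypothetical in-memory "nodes" (the class and method names are illustrative only). Production systems add replication, rebalancing and schemes such as consistent hashing, which this sketch omits.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Toy shared-nothing key-value store: each node owns a disjoint slice of the keys."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes <= 0:
            raise ValueError("need at least one node")
        # Each "node" is an independent dict; nodes never read each other's data.
        self.nodes: List[Dict[str, str]] = [{} for _ in range(num_nodes)]

    def node_for(self, key: str) -> int:
        # Stable hash of the partition key (Python's built-in hash() is salted per process).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: str) -> None:
        self.nodes[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Only the owning node is consulted; no coordination with other nodes is needed.
        return self.nodes[self.node_for(key)].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_nodes=4)
    for user_id in ("alice", "bob", "carol", "dave"):
        store.put(user_id, f"profile of {user_id}")
    print(store.get("carol"))
    print([len(node) for node in store.nodes])  # how the keys spread across nodes
```

Hash-modulo keeps the sketch short, but adding a node changes the modulus and moves most keys to new owners; that is one reason real systems prefer consistent hashing.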
Issues
• Security
   – Will you let important data accumulate outside your
     organization?
       • If the data is not important, why analyze it?
   – Who owns the data? If you discontinue the service, is the data
     destroyed properly?
   – Protection in a multi-tenant environment
• Big data cannot be moved easily
   – Processing has to happen near the data; you cannot just ship data around
       • So you end up having to use the same cloud for your processing. Is it
         available, easy to use, and fast?
• Learning and development costs
   – Is new programming or porting needed?
   – Are the tools mature enough?
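Why big data cannot simply be shipped around comes down to arithmetic: transfer time = data size × 8 / link bandwidth. A small helper for the estimate, assuming decimal units (1 TB = 10^12 bytes), a hypothetical sustained link speed, and an optional efficiency factor for protocol overhead; real throughput over the Internet is often well below the nominal link rate.

```python
def transfer_time_hours(data_bytes: float, link_bits_per_second: float,
                        efficiency: float = 1.0) -> float:
    """Hours needed to move data_bytes over a link running at a fraction `efficiency` of its rate."""
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    TB, PB = 10**12, 10**15  # decimal units
    GBIT = 10**9             # 1 Gbit/s
    print(f"5 TB over 1 Gbit/s: {transfer_time_hours(5 * TB, GBIT):.1f} hours")   # 11.1 hours
    print(f"1 PB over 1 Gbit/s: {transfer_time_hours(PB, GBIT) / 24:.1f} days")   # 92.6 days
    print(f"1 PB over 10 Gbit/s: {transfer_time_hours(PB, 10 * GBIT) / 24:.1f} days")  # 9.3 days
```

Even at an ideal, fully used 1 Gbit/s, a single 5 TB seismic scan (the file size cited in the editor's notes) takes about 11 hours to move, and a petabyte takes roughly three months, which is why processing tends to be placed in the same cloud as the data.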
When to use Big data on the Cloud
• When data is already on the cloud
  – Virtual organizations
  – Cloud-based SaaS services
• For startups
  – Shift from CAPEX to OPEX
  – No need to maintain large infrastructure
  – Focus on scalability and pay as you go
  – Data is on the cloud anyway
• For experimental projects
  – Pilots for new services
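The shift from CAPEX to OPEX and "pay as you go" can be made concrete with a break-even calculation: owning hardware costs a large sum up front plus monthly upkeep, while the cloud charges only for hours actually used. A minimal sketch; the function names and every price and usage figure in it are hypothetical placeholders, not quotes from any provider.

```python
from typing import Optional


def owned_cost(months: int, upfront: float, monthly_upkeep: float) -> float:
    """CAPEX model: buy hardware up front, then pay to run and maintain it every month."""
    return upfront + monthly_upkeep * months


def cloud_cost(months: int, hourly_rate: float, hours_used_per_month: float) -> float:
    """OPEX model: pay only for the hours actually consumed."""
    return hourly_rate * hours_used_per_month * months


def break_even_month(upfront: float, monthly_upkeep: float, hourly_rate: float,
                     hours_used_per_month: float, horizon: int = 120) -> Optional[int]:
    """First month in which owning is cheaper than renting, or None within the horizon."""
    for month in range(1, horizon + 1):
        if owned_cost(month, upfront, monthly_upkeep) < cloud_cost(month, hourly_rate, hours_used_per_month):
            return month
    return None


if __name__ == "__main__":
    # Hypothetical figures: 50,000 up front, 1,000/month upkeep, cluster rented at 10/hour.
    print(break_even_month(50_000, 1_000, 10, hours_used_per_month=200))  # 51
    print(break_even_month(50_000, 1_000, 10, hours_used_per_month=100))  # None
```

With heavy, steady use, owning eventually pays off (here after 51 months); with light or bursty use, as in a startup or a pilot project, renting never loses within the horizon.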
Summary
• Big data is coming.
   – Big data is being accumulated anyway
   – Knowledge is power.
      • Better understand your customers so you can offer
        better service
• Tools and technologies are available
   – Still developing fast
• The cloud is coming, so why not do big data
  on the cloud?
   – Probably not today, but soon
Thank you

Editor's Notes

  • Slide 3 ("Why now?"): The sources of information are expanding, and many of the new sources are machine generated. They include both very large files (seismic scans can be 5 TB per file) and massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights that can be gleaned from them, as new sources of competitive advantage:
    – More detailed structured data
    – New unstructured data
    – Device-generated data
    But big data isn't only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
    – Scale-out storage
    – MPP database architectures
    – Hadoop and the Hadoop ecosystem
    – In-database analytics
    – In-memory computing
    – Data virtualization
    – Data visualization