Big Data on The Cloud

My talk at Hitachi Event

Published in Education

  • The sources of information are expanding, and many of the new sources are machine generated. The data arrives both as very large files (seismic scans can be 5 TB per file) and as massive numbers of small files (email, social media). For decades, leading companies have sought to leverage new sources of data, and the insights gleaned from them, as sources of competitive advantage. The new sources include:
      – More detailed structured data
      – New unstructured data
      – Device-generated data
    But big data isn’t only about data. A comprehensive big data strategy also needs to consider the role and prominence of new enabling technologies such as:
      – Scale-out storage
      – MPP database architectures
      – Hadoop and the Hadoop ecosystem
      – In-database analytics
      – In-memory computing
      – Data virtualization
      – Data visualization

Transcript

  • 1. Big Data on the Cloud
    Dr. Putchong Uthayopas
    Department of Computer Engineering, Faculty of Engineering, Kasetsart University
    Email: pu@ku.ac.th
  • 2. We are living in the world of data
    • Video surveillance
    • Social media
    • Mobile
    • Sensors
    • Smart grids
    • Geophysical exploration
    • Medical imaging
    • Gene sequencing
  • 3. Why now?
    • The Internet creates the ability to gather all data together at a scale never seen before.
      – Data from humans
      – Data from sensors
    • Crowdsourcing is now being practiced
      – User-generated data is flooding the world
    • New devices and tools make it easy to generate data
  • 4. Big Data
    “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
    Reference: “What is big data? An introduction to the big data landscape.”, Edd Dumbill, http://radar.oreilly.com/2012/01/what-is-big-data.html
  • 5. Amazon View of Big Data
    Big data refers to a collection of tools, techniques and technologies which make it easy to work with data at any scale. These distributed, scalable tools provide flexible programming models to navigate and explore data of any shape and size, from a variety of sources.
  • 6. 3 Characteristics of Big Data
  • 7. Big Data Challenge
  • 8. Information as an Asset
    • The cloud will enable larger and larger data to be easily collected and used
    • People will deposit information into the cloud
      – Bank, personal warehouse
    • New technology will emerge
      – Larger and more scalable storage technology
      – Innovative and complex data analysis/visualization for multimedia data
      – Security technology to ensure privacy
    • The cloud will be mankind’s intelligence and memory!
  • 9. Google Cloud Platform
    • App Engine – mobile and web apps
    • Cloud SQL – MySQL on the cloud
    • Cloud Storage – data storage
    • BigQuery – data analysis
    • Google Compute Engine – processing of large data
  • 10. Amazon
    • Amazon EC2
      – Computation service using VMs
    • Amazon DynamoDB
      – Large, scalable NoSQL database
      – Fully distributed shared-nothing architecture
    • Amazon Elastic MapReduce (Amazon EMR)
      – Hadoop-based analysis engine
      – Can be used to analyse data from DynamoDB
  • 11. Trends
    • A move toward large and scalable virtual infrastructure
      – Providing computing service
      – Providing basic storage service
      – Providing scalable large databases
        • NoSQL
      – Providing analysis service
    • All these services have to come together
      – Big data cannot be moved!
  • 12. Issues
    • Security
      – Will you let important data accumulate outside your organization?
        • If the data is not important, why analyze it?
      – Who owns the data? If you discontinue the service, is the data destroyed properly?
      – Protection in a multi-tenant environment
    • Big data cannot be moved easily
      – Processing has to be near the data; you just cannot ship data around
        • So you finally have to select the same cloud for your processing. Is it available, easy, fast?
    • New learning and development costs
      – Need new programming, porting?
      – Are the tools mature enough?
  • 13. When to use big data on the cloud
    • When data is already on the cloud
      – Virtual organization
      – Cloud-based SaaS service
    • For startups
      – CAPEX to OPEX
      – No need to maintain large infrastructure
      – Focus on scalability and pay as you go
      – Data is on the cloud anyway
    • For experimental projects
      – Pilot for new services
  • 14. Summary
    • Big data is coming.
      – Big data is being accumulated anyway
      – Knowledge is power.
        • Better understand your customers so you can offer better service
    • Tools and technology are available
      – Still being developed fast
    • Cloud is coming, so why not do big data on the cloud?
      – Probably not today, but soon
  • 15. Thank you
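
A rough worked estimate for the point on slide 12 that big data cannot be moved easily. This is only a sketch under assumptions that are not in the talk: the link speeds are illustrative, decimal units are used (1 TB = 10^12 bytes), and protocol overhead is ignored unless an efficiency factor is passed. The 5 TB file size is the seismic-scan figure from the slide notes. The function name transfer_time_hours is hypothetical.

```python
TB = 10**12  # bytes in one terabyte (decimal units, an assumption)


def transfer_time_hours(size_bytes, link_bits_per_second, efficiency=1.0):
    """Return the hours needed to move size_bytes over a network link.

    efficiency is the assumed fraction of the nominal link rate that is
    actually usable (1.0 means no overhead at all).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    size = 5 * TB  # one 5 TB seismic scan, as in the slide notes
    # Illustrative link speeds, not figures from the talk.
    for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        print(f"{label}: {transfer_time_hours(size, rate):.1f} h")
    # 100 Mbit/s: 111.1 h
    # 1 Gbit/s: 11.1 h
    # 10 Gbit/s: 1.1 h
```

Even under these idealized assumptions, a single 5 TB file ties up a 100 Mbit/s link for more than four days, which is why the slides argue that processing has to run in the same cloud that holds the data.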