Cloud and Big Data trends

A look at cloud and big data trends and history. Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), but Cloud was first in the hype cycle; Google Trends shows this clearly. Amazon AWS, however, has already deployed analytics services on its cloud, while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data meet at three points: 1) use an EC2 clone and an S3 clone (riakCS, GlusterFS, etc.) to build a cloud; 2) use a big data solution as a backend to your cloud to provide EBS or a large-scale image catalogue; 3) deploy big data solutions on your cloud with tools like Apache Whirr, Pallet, and newer DevOps tool chains with Vagrant and co.

  • Walmart: 1 million customer transactions every hour, database of 2.5 PB in 2010. http://www.economist.com/node/15557443?story_id=15557443
  • Square Kilometer Array: 10-500 TB per second … 1 exabyte per day. Facebook, June 2012: 100 PB Hadoop cluster, ½ PB per day = 180 PB per year -> ~350 PB now? CERN: ~20 PB in EOS.
  • Cablegate: 250k cables. War and Peace is 450k words; 260M words in Cablegate = roughly 500x War and Peace.
  • 200 million pages, 4 TB
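
The figures in the notes above are back-of-envelope estimates, and a short script makes them easy to re-check. The sketch below is not from the deck. It assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and a 365-day year, and the function names are made up for illustration.

```python
# Back-of-envelope checks for the figures quoted in the slide notes.
# Assumptions (not from the deck): decimal units and a 365-day year.

SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
TB_PER_EB = 1_000_000  # 1 EB = 1,000,000 TB in decimal units


def pb_per_year(pb_per_day, days=365):
    """Yearly data growth in PB for a given daily ingest in PB."""
    return pb_per_day * days


def eb_per_day(tb_per_second):
    """Daily data volume in EB for a sustained rate in TB per second."""
    return tb_per_second * SECONDS_PER_DAY / TB_PER_EB


def per_second(events_per_hour):
    """Average events per second for a given hourly count."""
    return events_per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Facebook: 0.5 PB per day is about 180 PB per year.
    print("Facebook PB/year:", pb_per_year(0.5))            # 182.5
    # SKA: 10 TB/s is close to the quoted 1 exabyte per day.
    print("SKA EB/day at 10 TB/s:", eb_per_day(10))
    print("SKA EB/day at 500 TB/s:", eb_per_day(500))
    # Walmart: 1 million transactions per hour, averaged per second.
    print("Walmart tx/second:", round(per_second(1_000_000)))
    # Cablegate: 260M words against War and Peace at 450k words.
    print("Cablegate / War and Peace:", round(260_000_000 / 450_000))
```

At the low end of the quoted SKA range (10 TB per second) the daily volume comes out just under 1 EB, which matches the note; the high end would be well above it.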

Transcript

  • 1. Cloud and Big Data Sebastien Goasguen, January 29th @sebgoa
  • 2. A view on Big Data
  • 3. http://www.economist.com/node/15557443?story_id=15557443
  • 4. SKA
  • 5. How did we get there?
  • 6. A natural evolution
  • 7. New Distributed systems for: Large scale datasets • From scientific instruments • From Web apps logs Complex datasets • Not necessarily large. Object stores • S3 clones
  • 8. BigData and map-reduce • While BigData is often associated with HDFS, Map-Reduce is the algorithm used to parallelize data processing. • BigData ≠ Map-Reduce ≠ HDFS • Map-Reduce is a way to express embarrassingly parallel work easily. • You can do Map-Reduce without HDFS. • e.g. Basho map-reduce on riakCS
  • 9. A really quick view on Clouds
  • 10. Open Source IaaS
  • 11. Today
  • 12. BigData at peak
  • 13. History • 2003 – Google File System • 2005 – Hadoop • 2006 – Hadoop enters ASF incubator (Feb) • 2006 – S3 launched • 2007 – Paper on Amazon Dynamo • 2009 – EMR launched • 2013 – CloudStack as an ASF TLP (March) • 2013 – Spark/Mesos enters ASF incubator
  • 14. The Apache Software Foundation
  • 15. Apache Software Foundation
  • 16. 35 projects in incubation: • 12 Hadoop related • ~30% Big Data related • Spark. 117 top level projects: • ~16 cloud or bigdata, +10% • Deltacloud, Libcloud, Whirr, jclouds • Hadoop, couchdb, cassandra, mesos • Bigtop, accumulo, lucene, UIMA • CloudStack
  • 17. Hadoop Ecosystem + Up-coming next generation BD systems
  • 18. Big Data and Cloud (Stack)s
  • 19. Clouds and BigData • Object store + compute IaaS to build an EC2+S3 clone • BigData solutions as storage backends for image catalogue and large scale instance storage. • BigData solutions as workloads on CloudStack-based clouds.
  • 20. EC2, S3 clone • An open source IaaS with an EC2 wrapper, e.g. OpenNebula • Deploy an S3 compatible object store separately, e.g. riakCS • Two independent distributed systems deployed • Cloud = EC2 + S3
  • 21. Big Data as IaaS backend • “Big Data” solutions can be used as secondary storage.
  • 22. Example • Open source IaaS + EC2 wrapper, e.g. CloudStack • Deploy an S3 compatible object store, e.g. riakCS, Ceph or GlusterFS • Use S3 as image store • Your EC2 service is a customer of your S3 service • Logstash + Elasticsearch for logs/monitoring
  • 23. Even use Bare Metal
  • 24. Big Data as a Workload to the Cloud
  • 25. Mesos, Spark are EC2 native • ec2_deploy.py • ec2_deploy.sh • …
  • 26. Tools
  • 27. “PaaS”
  • 28. Dev Pipeline
  • 29. Conclusions • Big Data is “catching up” • Tackle the big three head on: • BigData, Cloud and DevOps • Add a big data backend to your cloud from the start • Provide Big Data services on your cloud
  • 30. Still behind!
  • 31. Final Thoughts • Who manages my data transfers?
  • 32. Event • ApacheCon + CloudStack Collaboration Conference, Denver, April 7-11th. Cloud and Big Data
  • 33. Get Involved with Apache CloudStack • Web: http://cloudstack.apache.org/ • Mailing Lists: cloudstack.apache.org/mailing-lists.html • IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev • Twitter: @cloudstack • LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859 • If it didn’t happen on the mailing list, it didn’t happen.
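
Slide 8 makes the point that map-reduce is a way to express embarrassingly parallel work and does not require HDFS. As a minimal illustration of that idea only, the sketch below runs a word count as map, shuffle and reduce steps over an in-memory list of lines, using nothing but the Python standard library. It is not the Hadoop or Basho API; the function names and the thread pool standing in for a cluster of workers are assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines, workers=4):
    """Run map over the lines in parallel, then shuffle and reduce.

    The thread pool is a hypothetical stand-in for a set of worker nodes;
    each line is mapped independently, which is what makes the job
    embarrassingly parallel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_words, lines)
        pairs = [pair for chunk in mapped for pair in chunk]
    groups = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["big data on the cloud", "the cloud runs big data"]
    print(word_count(sample))
```

The input here is a plain list, but the same three steps apply whether the lines come from HDFS, an S3-compatible object store, or local files; only the code that reads the input changes.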