Cloud and Big Data trends
A look at cloud and big data trends and history. While Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), Cloud was first in the hype cycle, as Google Trends shows clearly. Amazon AWS, however, has already deployed analytics services on its cloud, while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data have three points in common: 1) use an EC2 clone and an S3 clone (riakCS, GlusterFS, etc.) to build a cloud; 2) use a big data solution as a backend to your cloud to provide EBS or a large-scale image catalogue; 3) deploy big data solutions on your cloud with tools like Apache Whirr, Pallet, and newer DevOps tool chains with Vagrant and co.

Published in: Technology
  • Walmart: 1 million customer transactions every hour; database of 2.5 PB in 2010. http://www.economist.com/node/15557443?story_id=15557443
  • Square Kilometer Array (SKA): 10-500 TB per second, roughly 1 exabyte per day.
  • Facebook: 100 PB Hadoop cluster in June 2012; ½ PB per day = 180 PB per year, so ~350 PB now?
  • CERN: ~20 PB in EOS.
  • Cablegate: 250k cables, 260M words. War and Peace is 450k words, so Cablegate = 500x War and Peace.
  • 200 million pages, 4 TB
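
The orders of magnitude quoted in these notes can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, assuming decimal units (1 EB = 10^6 TB); the function and variable names are illustrative and not part of the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB_PER_EB = 1_000_000  # assumption: decimal units, 1 EB = 10^6 TB


def tb_per_second_to_eb_per_day(tb_per_second: float) -> float:
    """Convert a sustained data rate in TB/s to a daily volume in EB."""
    return tb_per_second * SECONDS_PER_DAY / TB_PER_EB


# SKA: the low end of the quoted 10-500 TB/s range is close to 1 EB per day.
ska_low_eb_per_day = tb_per_second_to_eb_per_day(10)
ska_high_eb_per_day = tb_per_second_to_eb_per_day(500)

# Facebook: half a petabyte per day, accumulated over one year.
facebook_pb_per_year = 0.5 * 365

# Cablegate word count relative to War and Peace.
cablegate_ratio = 260_000_000 / 450_000

print(f"SKA: {ska_low_eb_per_day:.3f} to {ska_high_eb_per_day:.1f} EB/day")
print(f"Facebook: {facebook_pb_per_year:.1f} PB/year")
print(f"Cablegate: about {cablegate_ratio:.0f}x War and Peace")
```

With these assumptions the low end of the SKA range comes out just under one exabyte per day, the Facebook growth rate is slightly above 180 PB per year, and the Cablegate ratio lands in the same ballpark as the rounded 500x in the notes.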
  • Transcript of "Cloud and Big Data trends"

    1. Cloud and Big Data. Sebastien Goasguen, January 29th, @sebgoa
    2. A view on Big Data
    3. http://www.economist.com/node/15557443?story_id=15557443
    4. SKA
    5. How did we get there?
    6. A natural evolution
    7. New distributed systems for:
       Large scale datasets
         • From scientific instruments
         • From web app logs
       Complex datasets
         • Not necessarily large
       Object stores
         • S3 clones
    8. BigData and map-reduce
       • While BigData is often associated with HDFS, Map-Reduce is the algorithm used to parallelize data processing.
       • BigData ≠ Map-Reduce ≠ HDFS
       • Map-reduce is a way to express embarrassingly parallel work easily.
       • You can do Map-Reduce without HDFS.
       • e.g. Basho map-reduce on riakCS
    9. A really quick view on Clouds
    10. Open Source IaaS
    11. Today
    12. BigData at peak
    13. History
       • 2003 – Google File System
       • 2005 – Hadoop
       • 2006 – Hadoop enters ASF incubator (Feb)
       • 2006 – S3 launched
       • 2007 – Paper on Amazon Dynamo
       • 2009 – EMR launched
       • 2013 – CloudStack as an ASF TLP (March)
       • 2013 – Spark/Mesos enters ASF incubator
    14. The Apache Software Foundation
    15. Apache Software Foundation
    16. 35 projects in incubation:
       • 12 Hadoop related
       • ~30% Big Data related
       • Spark
       117 top level projects:
       • ~16 cloud or bigdata, +10%
       • Deltacloud, Libcloud, Whirr, jclouds
       • Hadoop, CouchDB, Cassandra, Mesos
       • Bigtop, Accumulo, Lucene, UIMA
       • CloudStack
    17. Hadoop Ecosystem + upcoming next generation BD systems
    18. Big Data and Cloud (Stack)s
    19. Clouds and BigData
       • Object store + compute IaaS to build an EC2+S3 clone
       • BigData solutions as storage backends for image catalogue and large scale instance storage.
       • BigData solutions as workloads to CloudStack based clouds.
    20. EC2, S3 clone
       • An open source IaaS with an EC2 wrapper, e.g. OpenNebula
       • Deploy an S3 compatible object store separately, e.g. riakCS
       • Two independent distributed systems deployed
       Cloud = EC2 + S3
    21. Big Data as IaaS backend
       • “Big Data” solutions can be used as secondary storage.
    22. Example
       • Open source IaaS + EC2 wrapper, e.g. CloudStack
       • Deploy an S3 compatible object store, e.g. riakCS or Ceph or GlusterFS
       • Use S3 as image store
       • Your EC2 service is a customer of your S3 service
       • Logstash + Elasticsearch for logs/monitoring
    23. Even use Bare Metal
    24. Big Data as a Workload to the Cloud
    25. Mesos, Spark are EC2 native
       o ec2_deploy.py
       o ec2_deploy.sh
       o …
    26. Tools
    27. “PaaS”
    28. Dev Pipeline
    29. Conclusions
       • Big Data is “catching up”
       • Tackle the big three head on:
         • BigData, Cloud and DevOps
       • Add a big data backend to your cloud from the start
       • Provide Big Data services on your cloud
    30. Still behind!
    31. Final Thoughts: Who manages my data transfers?
    32. Event: ApacheCON + CloudStack Collaboration Conference, Denver, April 7-11th. Cloud and Big Data
    33. Get Involved with Apache CloudStack
       • Web: http://cloudstack.apache.org/
       • Mailing Lists: cloudstack.apache.org/mailing-lists.html
       • IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev
       • Twitter: @cloudstack
       • LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
       If it didn’t happen on the mailing list, it didn’t happen.
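
Slide 8 makes the point that Map-Reduce is a way of expressing embarrassingly parallel work and does not depend on HDFS. As a rough illustration only (not the speaker's code, and not the Hadoop or Riak API), the sketch below runs a word count as a map phase over independent chunks followed by a reduce phase that merges the partial results. The in-memory `chunks` list is a hypothetical stand-in for data read from any store, and the function names are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable


def map_phase(chunk: str) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(chunk.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    # Each chunk is mapped without shared state, so the work can be spread
    # over threads here, or over machines in a real deployment.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_phase, chunks))
    # Fold the partial counts together, starting from an empty Counter.
    return reduce(reduce_phase, partials, Counter())


# Hypothetical input: plain strings instead of files in HDFS or objects in S3.
chunks = ["big data is big", "cloud and big data"]
counts = word_count(chunks)
print(counts.most_common(2))
```

Because the map step never looks outside its own chunk, swapping the thread pool for a cluster scheduler or the list for an object store changes where the data lives, not how the computation is expressed.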