A look at cloud and big data trends and history. While Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), Cloud was first in the hype cycle, as Google Trends shows clearly. Amazon AWS, however, has already deployed analytics services on its cloud, while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data meet at three points: 1) use an EC2 clone and an S3 clone (Riak CS, GlusterFS, etc.) to build a cloud; 2) use Big Data solutions as a backend to your cloud to provide EBS or a large scale image catalogue; 3) deploy Big Data solutions on your cloud with tools like Apache Whirr, Pallet, and newer DevOps tool chains built on Vagrant and friends.
10. New Distributed systems for:
Large scale datasets
• From scientific instruments
• From web app logs
Complex datasets
• Not necessarily large.
Object stores
• S3 clones
11. BigData and map-reduce
• While BigData is often associated with HDFS,
Map-Reduce is the programming model used to
parallelize data processing.
• BigData ≠ Map-Reduce ≠ HDFS
• Map-reduce is a way to express
embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS.
• e.g. Basho MapReduce on Riak CS
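The point that Map-Reduce needs no HDFS can be illustrated with the classic Unix pipeline word count, a minimal sketch in which each stage plays the map, shuffle, or reduce role:

```shell
# Word count as a Unix pipeline -- the classic map-reduce-without-HDFS example:
#   tr   = map     (emit one word per line)
#   sort = shuffle (group identical keys together)
#   uniq = reduce  (count each group)
printf 'cloud data cloud\n' | tr ' ' '\n' | sort | uniq -c
```

Basho's MapReduce on Riak follows the same model: map functions run where the data lives and a reduce phase aggregates the results, with Riak rather than HDFS providing the storage layer.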
24. Clouds and BigData
• Object store + compute IaaS to build EC2+S3
clone
• BigData solutions as storage backends for
image catalogue and large scale instance
storage.
• BigData solutions as workloads to CloudStack
based clouds.
25. EC2, S3 clone
• An open source IaaS with an EC2
wrapper, e.g. OpenNebula
• Deploy an S3-compatible object store
separately, e.g. Riak CS
• Two independent distributed systems
deployed
Cloud = EC2 + S3
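From a client's point of view this might look as follows; the endpoints and ports below are hypothetical, not taken from any particular deployment, and standard EC2/S3 tooling is simply pointed at each service:

```shell
# Hypothetical endpoints: the EC2 wrapper (e.g. OpenNebula's econe server)
# and the Riak CS object store are two independent services.
aws ec2 describe-instances --endpoint-url http://ec2.cloud.example.com:4567
aws s3 ls --endpoint-url http://s3.cloud.example.com:8080
```

The two commands talk to two separately deployed distributed systems; only the credentials and the client tooling are shared.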
26. Big Data
as IaaS backend
“Big Data” solutions can be used as secondary
storage.
27. Example
• Open source IaaS + EC2 wrapper, e.g.
CloudStack
• Deploy an S3-compatible object store, e.g.
Riak CS, Ceph, or GlusterFS
• Use S3 as the image store
• Your EC2 service is a customer of your
S3 service
• Logstash + Elasticsearch for
logs/monitoring
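The Logstash + Elasticsearch bullet can be sketched as a minimal pipeline for a recent Logstash, assuming a default CloudStack management-server log path and a local Elasticsearch node (both paths and hosts are assumptions, adjust to your deployment):

```shell
# Hypothetical paths/hosts: tail the CloudStack management log and
# index it into a local Elasticsearch node.
cat > /etc/logstash/conf.d/cloudstack.conf <<'EOF'
input  { file { path => "/var/log/cloudstack/management/management-server.log" } }
output { elasticsearch { hosts => ["localhost:9200"] } }
EOF
```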
34. Conclusions
• Big Data is “catching up”
• Tackle the big three head on:
• BigData, Cloud and DevOps
• Add a big data backend to your cloud
from the start
• Provide Big Data services on your cloud
38. Get Involved with Apache
CloudStack
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.
Editor's Notes
Walmart: 1M customer transactions every hour, DB of 2.5 PB in 2010. http://www.economist.com/node/15557443?story_id=15557443
Square Kilometre Array: 10–500 TB per second, ~1 exabyte per day.
Facebook, June 2012: 100 PB Hadoop cluster, ½ PB per day = 180 PB per year -> ~350 PB now?
CERN: ~20 PB in EOS.
250k cables in Cablegate; War and Peace is ~450k words; 260M words in Cablegate = ~500x War and Peace.