BigData processing in the cloud – Guest Lecture - University of Applied Sciences Rapperswil - 29.4.14

© 2013 IBM Corporation
Romeo Kienzler
IBM Innovation Center
Source: http://res.sys-con.com/story/oct12/2398990/Cloud_BigData_468.jpg

What is BIG data?
Big Data
Hadoop

What is BIG data?
Business Intelligence
Data Warehouse

Map-Reduce → Hadoop → BigInsights

BigData UseCases
● Google Index
  ● 40 X 10^9 = 40,000,000,000 => 40 billion pages indexed
  ● Will break 100 PB barrier soon
  ● Derived from MapReduce
  ● Now “Caffeine”, based on “Percolator”
  ● Incremental vs. batch
  ● In-Memory vs. disk

BigData UseCases
● CERN LHC
  ● 25 petabytes per year
● Facebook
  ● Hive Data Warehouse
  ● 300 PB, growing 600 TB / d
  ● > 100 k servers
● Genomics
● Enterprises
  ● Data center analytics (Logfiles, OS/NW monitors, ...)
  ● Predictive Maintenance, Cybersecurity
  ● Social Media Analytics
  ● DWH offload
  ● Call Detail Record (CDR) data preservation
http://www.balthasar-glaettli.ch/vorratsdaten/

BigData Analytics

BigData Analytics – Predictive Analytics
"sometimes it's not who has the best algorithm that wins; it's who has the most data."
(C) Google Inc.
The Unreasonable Effectiveness of Data¹
¹http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf
No Sampling => Work with full dataset => No p-Value/z-Scores anymore

Data Parallelism

Aggregated Bandwidth between CPU, Main Memory and Hard Drive
Scanning 1 TB (at 10 GByte/s per node)
- 1 Node - 100 sec
- 10 Nodes - 10 sec
- 100 Nodes - 1 sec
- 1000 Nodes - 100 msec
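
The scaling figures above can be reproduced with a small calculation. This is a minimal sketch, assuming the data is split evenly across nodes, each node scans its share at the slide's 10 GByte/s, and there is no coordination overhead; the function name `scan_time_seconds` is made up for illustration.

```python
TB = 10**12  # bytes, decimal units as on the slide
GB = 10**9


def scan_time_seconds(data_bytes, nodes, bandwidth_per_node):
    """Time to scan the data once if every node reads an equal share in parallel.

    Assumption: perfect partitioning and zero coordination overhead.
    """
    return data_bytes / (nodes * bandwidth_per_node)


if __name__ == "__main__":
    for nodes in (1, 10, 100, 1000):
        seconds = scan_time_seconds(1 * TB, nodes, 10 * GB)
        print(f"{nodes:>5} nodes: {seconds:g} s")
```

In practice the curve flattens earlier than this idealized model suggests, because shuffling and stragglers add overhead that the sketch ignores.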

Fault Tolerance / Commodity Hardware
AMD Turion II Neo N40L (2x 1.5 GHz / 2 MB / 15 W), 8 GB RAM,
3 TB SEAGATE Barracuda 7200.14
< CHF 500
=> CHF 100 K => 200 X (2, 4, 3) => 400 Cores, 1.6 TB RAM, 200 TB HD
=> MTBF ~ 365 d > 1.5 d
Source: http://www.cloudcomputingpatterns.org/Watchdog
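
The reason fault tolerance matters on commodity hardware can be sketched with a back-of-the-envelope calculation. It assumes that node failures are independent and occur at a constant rate, so the expected time between failures anywhere in the cluster is the per-node MTBF divided by the node count. The budget, node price, cores, RAM and MTBF are the slide's figures; the function names are hypothetical.

```python
def cluster_size(budget_chf, price_per_node_chf):
    """Number of whole nodes that fit into the budget."""
    return budget_chf // price_per_node_chf


def mean_days_between_failures(node_mtbf_days, nodes):
    """Expected days between failures somewhere in the cluster.

    Assumption: independent failures at a constant rate per node.
    """
    return node_mtbf_days / nodes


if __name__ == "__main__":
    nodes = cluster_size(100_000, 500)   # CHF 100 K at CHF 500 per node
    cores = nodes * 2                    # dual-core CPU per node
    ram_tb = nodes * 8 / 1000            # 8 GB RAM per node
    interval = mean_days_between_failures(365, nodes)
    print(f"{nodes} nodes, {cores} cores, {ram_tb} TB RAM")
    print(f"one node failure roughly every {interval:.2f} days")
```

Under these assumptions a 200-node cluster loses a node about every two days, the same order of magnitude as the slide's estimate, so the software layer has to treat failure as the normal case.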

HDFS – Hadoop Distributed File System

Map-Reduce
Source: http://www.cloudcomputingpatterns.org/Map_Reduce
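
As an illustration of the Map-Reduce pattern, here is a minimal single-process word count sketch. It is not Hadoop code: the three phases (map, shuffle, reduce) run in one Python process, and all function names are chosen for this example only. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cloud"]))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be distributed without shared state.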

What role is the cloud playing here?

“Elastic” Scale-Out
Source: http://www.cloudcomputingpatterns.org/Continuously_Changing_Workload

Scale-out of CPU Cores, Storage, Memory

linear
Source: http://www.cloudcomputingpatterns.org/Elastic_Platform

BigData Scale-Out
How do Databases Scale-Out?

Shared Disk Architectures

Shared Nothing Architectures
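
In a shared nothing architecture every node owns its own slice of the data and no disk or memory is shared. One common way to decide which node owns a row is hash partitioning on the key; the sketch below assumes that scheme, and the function names `owner_node` and `partition` are hypothetical. Real systems differ in the hash function and in how they rebalance when nodes join or leave.

```python
import zlib


def owner_node(key, num_nodes):
    """Pick the node that owns a key, using a stable CRC32 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Distribute (key, value) rows so each row lives on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[owner_node(key, num_nodes)].append((key, value))
    return nodes


if __name__ == "__main__":
    rows = [("alice", 1), ("bob", 2), ("carol", 3), ("alice", 4)]
    for index, local_rows in enumerate(partition(rows, 3)):
        print(f"node {index}: {local_rows}")
```

Rows with the same key always land on the same node, so a lookup by key touches one node only, while a full scan can run on all nodes in parallel.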

Born on the cloud Databases
Source: http://www.constructioncloudcomputing.com/wp-content/uploads/2010/10/dreamstime_7360880-480x300.jpg
Source: http://www.cloudcomputingpatterns.org/Execution_Environment

Google AppEngine
Google App Engine is a Platform as a Service (PaaS) offering that lets
you build and run applications on Google’s infrastructure. App Engine
applications are easy to build, easy to maintain, and easy to scale as
your traffic and data storage needs change. With App Engine, there are
no servers for you to maintain. You simply upload your application and
it’s ready to go.
Source: http://www.cloudcomputingpatterns.org/Platform_as_a_Service_%28PaaS%29

Google AppEngine Database Services

IBM BlueMix
BlueMix is a Platform as a Service cloud based on Cloud Foundry,
offering enterprise-grade services enriched with IBM software and
hosted at SoftLayer.

IBM BlueMix, a Cloud Foundry runtime
[Diagram labels: Push; Code + Runtime + Framework; Droplet; Container; Linux VM; Services: SQL, SSO, ...]

Summary
● BigData is born on the cloud
● Cloud facilitates resource provisioning, configuration and deployment
● Highly innovative area
  ● Technology
  ● UseCases
● Links
  ● http://en.wikipedia.org/wiki/MapReduce
  ● http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/
● Sign up for the free BlueMix beta
  ● http://bluemix.net
● Come to the BlueMix Days
  ● http://bit.ly/1lsIY8J
● Use our software
  ● BigInsights: http://www.ibm.com/software/data/infosphere/biginsights/quick-start/
