This document summarizes a workshop on evolving a new analytical platform. It discusses defining the platform to include tools for the whole research cycle, beyond business intelligence (BI) alone, using SQL Server 2008 R2 as a reference point, and covers what is working with existing platforms and what is still missing, including the need for more scalable data storage and processing.
New Data Stack Workshop: Building a Scalable Cloud Datacenter
1. New Data Stack Workshop: Building a Scalable Cloud Datacenter
Ping Li, Accel Partners
ping@accel.com
July 14, 2010
Stanford University
2. Delivering Cloud Computing
"Cloud Frame" capabilities:
• Elasticity
• Multi-app/user Management
• User-provisioned
• Portability
• Private/Public
Mainframe analogs:
• Monitoring—Security (RACF)
• Monitoring—Performance (Mainview)
• Provisioning & Configuration
• Virtualization (z/VM)
• Resource Scheduler (z/VM & OS 370)
• Performance Acceleration & dedicated processors (OS 370)
• Backup and DR (Tivoli Storage Manager, Parallel Sysplex)
• Clustering, failover, and mirroring (OS 370 & purpose-built hardware & microcode)
• Cloud data centers will share infrastructure layers common to
mainframes but redelivered for cloud capabilities
• “New Data Stack” will form foundation for cloud computing
Accel Partners Confidential 2
3. Data Explosion
[Diagram: the legacy stack of business transaction data giving way to a cloud "new data stack" of application data.]
• 2,500 exabytes of new information in 2012 with Internet/web as primary driver
• “Digital universe” grew by 62% last year to 800K petabytes and will grow to
1.2 zettabytes this year
Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
4. "New Data" Trends
Data is growing faster than processing power (data: 61% CAGR vs. transistors: 42% CAGR) – leading to coping strategies like throwing away data or frequent archiving to tape.
Circa 1975 – Transaction Data:
• 2,000 users = Huge
• Smaller data sets (bytes)
• Highly structured, relatively small data records
• Absolute consistency is the primary requirement – ACID transactions
Circa 2010 – Cloud Data:
• 2,000 users = Tiny
• Extremely large data sets (petabytes)
• Unstructured, complex data blobs (images, voice, logs, video) – doesn't fit nicely into rows/columns
• Application responsiveness/scale trumps immediate consistency
Source: Gartner.
5. New Data Stack Technologies
Legacy → Cloud:
• Centralized/monolithic computing layer → Distributed computing layer (virtual machines, MapReduce, networked commodity servers)
• Computer networking limited → High-speed networking is pervasive
• Relational databases → Non-relational/"NoSQL" data stores
• FC SAN/NAS → Distributed file systems
• Disks/tape (memory scarce/expensive) → Flash/SSD (high performance and abundant)
• Proprietary/closed vendors → Open platforms
• Enterprise-scale → Internet/cloud scale
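The distributed computing layer above names MapReduce. As a minimal, single-process illustration of the model – a hypothetical word-count example, not code from any slide – the map, shuffle, and reduce phases look like this:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (word, 1) for every word in one document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group intermediate pairs by key, as the framework would
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts emitted for one word.
    return key, sum(values)

docs = ["the cloud", "the new data stack", "the data"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])   # 3
print(counts["data"])  # 2
```

A real framework such as Hadoop runs the same three phases across many commodity machines; the shuffle is where the network traffic happens.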
6. Agenda
1:15 pm Northscale
Sharon Barr, Vice President Engineering
James Phillips, Founder, Chief Product Officer
Dustin Sallings, Chief Architect
Bob Wiederhold, President, CEO
2:15 pm Cloudera
Amr Awadallah, CTO/Co-Founder
Jeff Hammerbacher, Chief Scientist/Co-Founder
3:15 pm Facebook
Bobby Johnson, Director, Software Engineering
Mark Rabkin, Software Engineer
4:15 pm Fusion-io
Robert Wipfel, Fellow
5:30 pm Cocktails!
8. The opportunity.
“ Relational database technology has served us well for 40 years, and will likely continue to
do so for the foreseeable future to support transactions requiring ACID guarantees. But a
large, and increasingly dominant, class of software systems and data do not need those
guarantees. Much of the data manipulated by Web applications have less strict
transactional requirements but, for lack of a practical alternative, many IT teams continue
to use relational technology, needlessly tolerating its cost and scalability limitations. For
these applications and data, distributed key-value cache and database technologies such
as NorthScale provide a promising alternative. ”
Carl Olofson
Research Vice President
Database Management Software Research
IDC
9. Modern interactive software architecture
To support more users, simply add more commodity web servers (or virtual machines) behind a load balancer – but you must get a bigger, more complex database server.
10. Application scales linearly, data hits a wall
Application Scales Out
Just add more commodity web servers
Database Scales Up
Get a bigger, more complex server
11. What’s driving the curves?
1. Transaction overhead: on the same hardware, over an order of magnitude difference in supportable user base (RDBMS: 750 OPS vs. NorthScale: 15,000 OPS).
2. Expensive hardware: more costly to start with, and the cost differential widens with growth (at 750 OPS: $7,500 vs. $2,500, 3x; at 15,000 OPS: $125,000 vs. $12,500, 10x).
3. Complex administration: RDBMS technology is extremely complex and expensive to administer (schema committee, shard if needed, add new tables, re-normalize, create indices, update views, tune performance). NorthScale replaces insert and select with set and get.
12. Billions in data management savings available
[Chart: relational database technology remains ideal for some data; alternative database technology is needed for the rest.]
RDBMS is ideal for its intended purpose and will continue to be appropriate for debit-credit data – but it is costly overkill for most new data.
Relational database technology was an $18.8 billion market in 2007 (IDC).
13. Big leap from relational database to alternatives
Where do I start? What data should I move first? Which alternative
database technology will “win”? This looks really complicated.
14. NorthScale solution.
“ I can’t tell you how many email requests I’ve received
from our developers asking for something that is as
simple and fast as memcached, but that promises data
durability. Cassandra is just far too complex and
heavyweight and we won’t be doing any more deployments.
NorthScale is definitely on to something here. ”
Director of Engineering
Leading Social Network
15. Before: Where you are today
Relational database technology powers 99.999% of web applications.
16. Step 1: Cache relational data in memcached
NorthScale Memcached Servers
Relational Database
Memcached is simple, fast and infinitely scalable. It is easy to adopt, and delivers immediate cost, performance and scalability benefits.
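A minimal sketch of the cache-aside pattern this step describes, with a plain dict standing in for the memcached client and a stub standing in for the relational query (all names here are illustrative, not part of any NorthScale API):

```python
cache = {}      # stand-in for a memcached client: get/set by key
DB_CALLS = []   # records which lookups actually reached the "database"

def query_database(user_id):
    # Pretend this is a slow SQL query against the relational database.
    DB_CALLS.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    row = cache.get(key)                  # 1. try the cache first
    if row is None:
        row = query_database(user_id)     # 2. miss: fall through to the RDBMS
        cache[key] = row                  # 3. populate the cache for next time
    return row

get_user(42)           # miss: hits the database
get_user(42)           # hit: served from cache
print(len(DB_CALLS))   # 1
```

The database is only touched on a miss, which is why fronting the RDBMS with memcached delivers the immediate cost and performance benefits the slide claims.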
17. Step 2: Gradually migrate data to membase
NorthScale Memcached Servers NorthScale Membase Servers
Relational Database
18. After: Elastic compute and data layers
Data layer now scales with linear cost and constant performance.
Application Scales Out
Just add more commodity web servers
Database Scales Out
Just add more commodity data servers
Scaling out flattens the cost and performance curves.
21. Membase is an elastic key-value database
Application user
Web application server
Membase data servers
In the data center On the administrator console
22. Membase is Simple, Fast, Elastic
Five minutes or less to a working cluster
• Downloads for Linux and Windows
• Start with a single node
• One button press joins nodes to a cluster
Easy to develop against
• Just SET and GET – no schema required
• Drop it in. 10,000+ existing applications already "speak membase" (via memcached)
• Practically every language and application framework is supported, out of the box
Easy to manage
• One-click failover and cluster rebalancing
• Graphical and programmatic interfaces
• Configurable alerting
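The "speak membase (via memcached)" point rests on the memcached ASCII wire protocol. This sketch only frames the bytes a client would send – no socket is opened and no server is contacted, so it is illustrative rather than a working client:

```python
def frame_set(key, value, flags=0, exptime=0):
    # memcached ASCII protocol storage command:
    #   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    data = value.encode()
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode()
    return header + data + b"\r\n"

def frame_get(key):
    # memcached ASCII protocol retrieval command: get <key>\r\n
    return f"get {key}\r\n".encode()

print(frame_set("greeting", "hello"))
# b'set greeting 0 0 5\r\nhello\r\n'
print(frame_get("greeting"))
# b'get greeting\r\n'
```

Because this protocol is so small, any existing memcached client library in any language can talk to a membase cluster without modification.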
23. Membase is Simple, Fast, Elastic
Predictable
• "Never keep an application waiting"
• Quasi-deterministic latency and throughput
Low latency
• Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
• Selectable write behavior – asynchronous, synchronous (on replication, persistence)
• Back-channel rebalancing [FUTURE]
High throughput
• Multi-threaded
• Low lock contention
• Asynchronous wherever possible
• Automatic write de-duplication
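One plausible reading of "automatic write de-duplication" – an assumption about the mechanism, not membase's actual code – is that repeated sets to the same key collapse into a single disk write at flush time:

```python
from collections import OrderedDict

class DedupWriteQueue:
    """Toy model: pending writes keyed by item, so only the
    latest value per key ever reaches disk."""

    def __init__(self):
        self.pending = OrderedDict()   # key -> latest value awaiting flush
        self.disk_writes = 0

    def set(self, key, value):
        # A later set to the same key simply replaces the pending entry.
        self.pending[key] = value

    def flush(self):
        # One disk write per distinct dirty key, not per logical set.
        for _ in self.pending:
            self.disk_writes += 1
        self.pending.clear()

q = DedupWriteQueue()
for n in range(100):
    q.set("counter", n)    # 100 logical sets to one hot key
q.set("other", "x")
q.flush()
print(q.disk_writes)       # 2
```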
24. Membase is Simple, Fast, Elastic
Scale out
• Spread I/O and data across commodity servers (or VMs)
• Consistent performance with linear cost
• Dynamic rebalancing of a live cluster
All nodes are created equal
• No special case nodes
• Clone to grow
Extensible
• Filtered TAP interface provides hook points for external systems (e.g. full-text search, backup, warehouse)
• Data bucket – engine API for specialized container types
• Membase NodeCode [FUTURE]
26. Deployment options
[Diagram: three deployment options, each carrying the same cluster operations and data operations to the Membase servers (ports 11211/11210). Deployment Option 1: the application's memcached client uses a server list against off-the-shelf (OTC) memcached servers, while an embedded proxy in each Membase server holds the vBucket map. Deployment Option 2: a standalone proxy on localhost holds the vBucket map between an unmodified memcached client and the Membase servers. Deployment Option 3: a "vBucket-aware" client holds the vBucket map itself and talks to the Membase servers directly.]
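The vBucket map used by the proxies and the "vBucket-aware" client above can be sketched as follows. The hash function (CRC32) and the vBucket count are illustrative assumptions, not membase's actual choices:

```python
import zlib

NUM_VBUCKETS = 64
servers = ["10.0.0.1:11210", "10.0.0.2:11210", "10.0.0.3:11210"]

# vBucket -> server assignment. Rebalancing moves whole vBuckets
# between servers and updates this table; keys never rehash.
vbucket_map = [servers[v % len(servers)] for v in range(NUM_VBUCKETS)]

def server_for(key):
    # A key hashes to a fixed vBucket; the map names its current master.
    vbucket = zlib.crc32(key.encode()) % NUM_VBUCKETS
    return vbucket_map[vbucket]

# Every client computes the same mapping, so requests for a given key
# always reach the same master server.
assert server_for("user:42") == server_for("user:42")
print(server_for("user:42") in servers)  # True
```

Indirecting through vBuckets rather than hashing keys straight to servers is what makes live rebalancing cheap: only the small map changes.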
27. Membase “write” data flow – application view
1. User action results in the need to change the VALUE of KEY.
2. Application updates key's VALUE, performs SET operation.
3. Membase (memcached) client hashes KEY, identifies KEY's master server.
4. SET request sent over network to master server.
5. Membase replicates KEY-VALUE pair, caches it in memory and stores it to disk.
28. Membase data flow – under the hood
[Diagram: the SET request arrives at KEY's master server (1); the listener-sender forwards it to the replicas (2); the membase storage engine writes through RAM (3) to SSD and disk (4) on the master and on Replica Servers 1 and 2 for KEY; the SET acknowledgement is returned to the application (5).]
29. Membase Architecture
[Diagram: each node pairs a Data Manager with a Cluster Manager. Data Manager: memcached-based protocol listener/sender behind ports 11211 (memcapable 1.0, via moxi) and 11210 (memcapable 2.0), an engine interface, and the membase storage engine. Cluster Manager (Erlang/OTP): REST management API/Web UI, vBucket state and replication manager, global singleton supervisor, rebalance orchestrator, configuration manager, node health monitor, process monitor, and heartbeat – some components run on each node, others (such as the orchestrator) one per cluster. Network ports: HTTP on 8080, Erlang port mapper on 4369, distributed Erlang on 21100–21199.]
31. Data buckets are secure membase “slices”
[Diagram: application users and web application servers read and write through Bucket 1 and Bucket 2 – slices of the aggregate cluster memory and disk capacity across the Membase data servers – shown in the data center and on the administrator console.]
32. NorthScale in production
• Heroku: leading cloud service (PaaS) provider with over 65,000 hosted applications. NorthScale Memcached Server has been serving over 1,200 Heroku customers (as of June 10, 2010).
• Social game leader – FarmVille, Mafia Wars, Café World – with over 230 million monthly users. NorthScale Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World.
35. Evolving a New Analytical Platform
What Works and What’s Missing
Jeff Hammerbacher
Chief Scientist, Cloudera
July 14, 2010
Wednesday, July 14, 2010
36. My Background
Thanks for Asking
▪ hammer@cloudera.com
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
37. Presentation Outline
▪ 1. Defining the Platform
▪ BI: Science for Profit
▪ Need tools for whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ 2. State of the Platform Ecosystem
▪ 3. Foundations for a New Implementation
▪ Hadoop
▪ Boiling the Frog
▪ 4. Future Developments
▪ Questions and Discussion
51. Collaboration: SharePoint
MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot
52. What do we call this unified suite?
76. 2007: Make Hadoop scale
▪ Yahoo! makes Pig open source
▪ Powerset makes HBase open source
▪ Jim Gray's "Fourth Paradigm" lecture
▪ Randy Bryant's "DISC" lecture
82. 2008: Make Hadoop fast
▪ Yahoo! wins Daytona terabyte sort benchmark
▪ Yahoo! builds production webmap with Hadoop
▪ First Hadoop Summit
▪ Facebook makes Hive open source
▪ "MapReduce: A Major Step Backwards"
88. 2009: Insert Hadoop into the enterprise
▪ Cloudera releases CDH
▪ Cloudera adds training, support, services
▪ Yahoo! sorts a petabyte with Hadoop
▪ First Hadoop World NYC
▪ "The Unreasonable Effectiveness of Data"
94. 2010: Integrate Hadoop into the enterprise
▪ IBM announces InfoSphere BigInsights
▪ Datameer and Karmasphere funded
▪ Yahoo! completes enterprise-class security
▪ Quest, Talend, Netezza, and more integrate
▪ Hive adds JDBC and ODBC
95. Hadoop will be an Analytical Data Platform
106. (c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0
107. ioMemory for Scale-out
Robert Wipfel, Fellow
rwipfel@fusionio.com
14th July, 2010, Accel Partners Panel Discussion
108. Factors impacting Scale-out
Balance
• CPU
• Disk
• Network
Energy
• Servers
• RAM
• Disks
Contention
• Sharing
• Locking
Management and Monitoring
Graceful Recovery
• No SPOFs
• Fast Replay
Throughput
• IOPS
• Bandwidth
Latency
• Distributed
• Dependencies
109. What’s *really* Needed…
DRAM – Want: really fast. Don't want: volatile, expensive, limited capacity.
Disk – Want: non-volatile, cheap, large capacity. Don't want: really slow.
Need – Want: non-volatile, really fast, large capacity, reasonable price, low energy.
110. Solution: ioMemory
A disruption called ioMemory
• High speed like DRAM
• Persistence and capacity of disks
PCIe based NAND Flash Storage
• Very high IOPS
• Micro-second latency
• Very high data throughput
111. Why is it called ioMemory?
[Chart: access delay in time, from nanoseconds (10E-9) at the CPU caches (L1, L2, L3) and DRAM up to milliseconds (10E-3) at SSDs and SAN/NAS/RAIDed DAS. ioMemory sits at ~50µs (10E-6) – roughly 3 orders of magnitude above DRAM and several orders of magnitude below disk-based storage.]
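The orders-of-magnitude gaps in the chart can be checked with rough arithmetic. The DRAM, SSD, and disk latencies below are assumed round numbers for illustration; only the ~50µs ioMemory figure comes from the slide:

```python
import math

# Representative access latencies in seconds (order-of-magnitude only).
latency_s = {
    "DRAM":     100e-9,   # ~100 ns (assumed)
    "ioMemory": 50e-6,    # ~50 µs (from the slide)
    "SSD":      200e-6,   # ~hundreds of µs (assumed)
    "disk":     5e-3,     # ~5 ms seek (assumed)
}

def orders_of_magnitude(slower, faster):
    # log10 of the latency ratio between two tiers.
    return math.log10(latency_s[slower] / latency_s[faster])

print(round(orders_of_magnitude("ioMemory", "DRAM"), 1))  # ~2.7
print(round(orders_of_magnitude("disk", "DRAM"), 1))      # ~4.7
```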
112. ioMemory Performance
[Charts: raw storage performance (H2benchw 3.6, interface bandwidth MB/s) and application performance (IOMeter database benchmark I/O, average throughput MB/s), comparing Fusion-io ioDrives (Maximum Write 24 GB, Improved Write 40 GB, Maximum Capacity 80 GB; all Flash, PCIe x4) against SATA SSDs from vendors A, B, and C (Flash SATA/300, some as 3.0 Gbps 2.5" RAID 0). The ioDrive is about 2x faster on storage I/O and up to 50x faster on application I/O.]
113. ioMemory Reliability
• Strong ECC
• Wear leveling
• Bad block re-mapping
• Data labeling
• Parity-protected pipelines
• Power cut protection
• Flashback chip protection
• Checksums + poison bit
• MTBF = 2 million hours
114. ioMemory is not a Solid State Disk
[Diagram: with SSDs, an I/O traverses many steps – application → CPU → RAID controller → multiple SSDs and back (steps 1 through 9). With ioMemory, the path is simply application → CPU → ioMemory (steps 1 and 2).]
115. ioMemory is Green
Annual energy use (kWh/yr):
• Fusion-io ioDrive: 97
• ZeusIOPS SSD: 3,013
• 15,000 RPM FC HDD: 133,493
116. Case Study
One of the world’s fastest growing Webmonsters
• Over 900% more database queries per second
• Dramatically improved server replication for most current data
• Over 800% improvement to disaster recovery back-up time
• Cut server footprint, power costs, and IT overhead by 75%
• Full and immediate ROI on repurposed servers
• Continued ROI on operational cost savings
118. Case Study
Internet security company that protects over 1 billion inboxes
• 5x improvement to:
• Database replication performance
• Data-intensive query response
• Analysis routines
• Eliminated 210 failure points from the system
• Implemented full system redundancy
• Dramatically lowered power and cooling expenses
121. Other Customer Examples
• Does a 30-to-1 box reduction for their reliable messaging system
• HMO achieves a 200 HDD to 1 ioDrive reduction for their data warehouse
• Department of Defense takes NASTRAN from 3 days to 6 hours
• Stock exchange doubles the performance of their trading systems
• Shows a 35x performance increase of unstructured search at OracleWorld
• Demos that Dynamics NAV can get a 4x performance improvement