9. Pets, Cattle & Chicken

Pets ("pussinboots"): build to specs & maintain. Traditional enterprise IT.
Cattle ("node72"): deploy, run, add/delete & update. Large-scale data processing.
Chicken ("application[…]"): containerized apps, lightweight & stateless. Elastically scalable applications.

10. Pets

The traditional server:
Built to fulfil a particular task
Failing systems get healed ASAP
Single point(s) of failure
Periodic downtime is inevitable
Typically managed manually (sometimes assisted by scripts)
The domain of the sys-admin

11. Cattle

Just another node in a network
No single point(s) of failure
Rolling upgrades
Downtime is a thing of the past
Failing systems get deleted
Managed by automation
The domain of the system (automation) engineer

23. Recap: Service Levels

The service level is obtained by the ability to deploy on any cloud, at any time (= extreme availability).

Challenges:
Broadband networking: minor, just technology
Vendor contestability: verify
Cloud maturity: probably some work

25. A New Approach

Traditional (specification-driven):
Business users seek a functional solution for a particular "job" and formulate the question.
IT defines the requirements, assesses technical feasibility, translates them into a technical design, builds, and integrates.
The result is an IT solution.

Cloud (functionality-driven):
The IT platform is built for specific "jobs": value-driven and rich in functionality.
The business decides whether or not to use particular functionality, explores its uses, and subscribes to what it needs.

Ideal for (Big) Data Analytics!

26. Recap: Paradigm Shifts

Functionality abstracted from resource capacity
Dropping capacity costs are an enabler for endless new possibilities
Continuous development is the new standard
Controllable service levels
The cloud delivery model: instant

30. Hadoop Fundamentals: Google's Solutions

(2003) Google File System (Ghemawat, Gobioff, Leung): a distributed, fault-tolerant file system
(2004) MapReduce (Dean, Ghemawat): a parallel programming model (sketched below)
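
To make the MapReduce model concrete, below is the canonical word-count job, a minimal sketch against the org.apache.hadoop.mapreduce API (the class name is illustrative; input and output paths are taken from the command line):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Map phase: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        public void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          StringTokenizer it = new StringTokenizer(value.toString());
          while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            ctx.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum all counts collected for the same word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          result.set(sum);
          ctx.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar, such a job would typically be submitted with the hadoop jar command; the framework handles splitting the input, scheduling map and reduce tasks across the cluster, and re-running tasks that fail.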

31. Hadoop Origins

2005: Yahoo funds the development of a software framework for general parallel computation tasks.
2006: Hadoop is founded as an open-source project under the Apache Software Foundation banner.

Features:
Massive scalability on commodity hardware
Redundant, fault-tolerant storage of data
Job coordination for generic tasks

33. HDFS Architecture

A distributed filesystem:
Redundant storage by replicating data n times
Optimized for streaming large files (write once, read many times; see the sketch below)
Grows/shrinks on the fly
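
A minimal sketch of that write-once/read-many pattern through the HDFS Java client API (the path and class name are hypothetical; fs.defaultFS is assumed to be set in the cluster's core-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteRead {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the Hadoop config on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/data/events/sample.txt"); // hypothetical path

        // "Write once": create the file and stream data into it.
        try (FSDataOutputStream out = fs.create(path, true)) {
          out.writeUTF("hello hdfs");
        }

        // "Read many": any number of clients can stream it back.
        try (FSDataInputStream in = fs.open(path)) {
          System.out.println(in.readUTF());
        }
        fs.close();
      }
    }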

34. HDFS Blocks

Files are stored as collections of blocks.
Block size is configurable but static once a file is written (default 128 MB); see the sketch below.
(source: Hadoop for Dummies)
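
The block size of an existing file can be read back through the same client API, and a non-default size can be chosen per file at creation time. A minimal sketch (class name and paths are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeInfo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Block size is fixed once a file has been written...
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        System.out.printf("block size: %d bytes%n", status.getBlockSize());

        // ...but can be chosen per file at creation time:
        // create(path, overwrite, bufferSize, replication, blockSize)
        fs.create(new Path("/tmp/bigblocks.dat"), // hypothetical path
            true, 4096, (short) 3, 256L * 1024 * 1024).close();
        fs.close();
      }
    }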

35. HDFS Blocks

Blocks are replicated n times throughout the cluster (see the sketch below).
The replication strategy is affected by the cluster/rack layout.
(source: Hadoop for Dummies)
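
Where the replicas of each block actually landed can be inspected from the client side, which makes the rack-aware placement visible. A minimal sketch (class name is illustrative; the file to inspect is passed as an argument):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicaMap {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path(args[0]));

        // One BlockLocation per block; getHosts() lists the DataNodes
        // holding a replica of that block.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
          System.out.printf("offset %d -> %s%n",
              b.getOffset(), String.join(", ", b.getHosts()));
        }
        fs.close();
      }
    }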

44. Silos, Lakes & Rice-paddies

Data silos: structured and well organized, but incomplete.
Data lakes: "Put it all in Hadoop."
The (to-be) reality: structured & unstructured data in lots of different places.

50. Big Data: Platform Provisioning

Volume-intensive applications: Hadoop HDFS / MapReduce, NoSQL databases (HBase)
Compute- & memory-intensive applications
Controllers & services: orchestration, monitoring, authentication, security, load balancers, network
Storage: HDFS, NoSQL, object storage, other

51. Scalable Platform

Highly cost-efficient if you recognize any of these:
Most data is static
20% of the data is needed 80% of the time
Test on small (sub)sets, scale up when it works
Performance demand fluctuates

Cons:
Cost of moving data (only an issue if data grows truly large, e.g. over 100 terabytes per set)

55. Recap: Solving the Challenges

Think strategically about data!
Use multi-tiered storage (archive = 1/10 of the cost) if possible; see the sketch after this list
Don't go for commodity-only hardware up to 100 TB
Create compute & big data zones in your infrastructure
Use infrastructure + applications fit for the task at hand
Build with scalability in mind
Ensure your platform can easily be kept up to date
Design with redeployment in mind ("cattle-/chicken-like")
Be Agile!
Create a cloud strategy
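
One way to act on the multi-tiered storage advice within HDFS itself is a storage policy: the sketch below assigns the built-in COLD policy to an archive directory, so its blocks land on cheap, dense disks. It assumes an HDFS release with storage policies and FileSystem.setStoragePolicy available (2.6+), DataNode volumes tagged with the ARCHIVE storage type, and a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TieredStorage {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // COLD stores all replicas on ARCHIVE-tagged volumes.
        fs.setStoragePolicy(new Path("/data/archive"), "COLD"); // hypothetical path
        fs.close();
      }
    }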

57. Data Analytics & Processing

Hadoop Platform-as-a-Service
Instant deployment
Easy to use by higher-level applications
Elastically scalable capacity

58. Massive Logging @ Vancis

Processing millions of events per second
Easily scalable to much more
Drastically shortening the time between system failures and diagnostics
Roadmap: self-healing platforms

59. Xomnia Webprofiler

Processing up to a million events per second
Easily scalable to much more
Secure: anti-DDoS + filtering to the Analytics Platform
Roadmap (option): real-time response to the web application