SlideShare a Scribd company logo
1 of 31
Download to read offline
®
© 2014 MapR Technologies 1
®
© 2014 MapR Technologies
Zeta Architecture
Ted Dunning– Chief Application Architect
Hive London – May 20, 2015
®
© 2014 MapR Technologies 2
Agenda
•  Current State
–  History
–  Moving Forward
•  The Next Enterprise Architecture
•  Business Implications
•  Concrete Implementations
®
© 2014 MapR Technologies 3© 2014 MapR Technologies
®
Current State
®
© 2014 MapR Technologies 4
Study History to Prepare for the Future
•  A data center was built
•  The servers were statically
partitioned
•  If we want to break the cycle
we have to break the
partitions and become
dynamic
®
© 2014 MapR Technologies 5
Understanding the Why’s
•  Isolation of resources
–  Assists in troubleshooting
–  Prevents the analytics team from impacting production
•  Maximum throughput of an application
–  Guaranteed volume (maximum): compute, memory and storage
•  Business Continuity
–  We know exactly what is backed up, when, and where
–  Difficult to perfect and to test
®
© 2014 MapR Technologies 6
Issues with Isolated Workloads
•  Segregated servers lead to under utilized hardware
–  Wasted capacity and energy
•  Complicated processes to move data to the required processing
servers
–  Operational impact, including extra monitoring
–  Time delays moving data (not real-time)
–  Troubleshooting time when there are issues
•  Difficult to thoroughly test DEV vs. QA vs. Production
–  Environments have different shapes and sizes
–  They will not have identical configurations
®
© 2014 MapR Technologies 7
Goals Moving Forward
•  Leverage all existing hardware
•  Create isolation in a different way
•  Improve production operational processes
•  Fix process of moving from DEV to QA to Production
•  Support real-time business continuity
®
© 2014 MapR Technologies 8© 2014 MapR Technologies
®
The Next “Last” Enterprise Architecture
®
© 2014 MapR Technologies 9
The Next Generation Enterprise Architecture
•  Dynamic compute resources
•  Common storage platform
•  Real-time application support
•  Flexible programming models
•  Deployment management
•  Solution based approach
•  Applications to operate a
business
* This is a pluggable architecture
Distributed
File System
Enterprise
Applications
Global Resource
Management
®
© 2014 MapR Technologies 10
Technologies That Work
Global Resource
Management
Distributed
File System
Enterprise
Applications
Mesos + Myriad
YARN
MapR-FS HDFSS3
Web
Servers
Business
Applications
®
© 2014 MapR Technologies 11
We Will Call This Architecture…
®
© 2014 MapR Technologies 12
What’s in a Name
•  The letter Z is the last letter in the English
alphabet, but Zeta is not the last letter of the
Greek alphabet
–  But this is the last generalized architecture you
will need.
•  Sixth letter of the Greek alphabet
–  Hexagon represents the 6 surrounding pieces
•  Zeta represents the number 7
–  7 total components in this architecture
–  Components work with a global resource
manager
®
© 2014 MapR Technologies 13
Origin Story of the Zeta Architecture
•  Cultivated by Jim Scott
–  Created the pretty diagrams
–  Put a nice name on it
–  Documented the concepts
•  Not really a new concept
–  Google pretty much pioneered these
technology concepts
–  They have never really discussed it
cohesively in this way
®
© 2014 MapR Technologies 14
Zeta Architecture at Google
Global Resource
Management
Distributed
File System
Enterprise
Applications
Borg & Omega
GoogleFS
HTTP
Servers
GMail
®
© 2014 MapR Technologies 15© 2014 MapR Technologies
®
Concrete Implementations
®
© 2014 MapR Technologies 16
Web Server Logs
•  Web server generates logs
•  Land on local disk
–  Logs periodically rotated
•  Shipped to other servers
•  Run jobs on logs
®
© 2014 MapR Technologies 17
Web Server Logs
•  Web server generates logs
•  Land on DFS
–  Logs still rotate
–  Logs now tolerant of a server
failure prior to rotation
–  Logs are instantaneously
available for computation
•  Run jobs on logs
–  Data locality
®
© 2014 MapR Technologies 18
Advertising Platform
®
© 2014 MapR Technologies 19
Advertising Platform - Simplified
®
© 2014 MapR Technologies 20
Advertising Platform on Zeta
®
© 2014 MapR Technologies 21© 2014 MapR Technologies
®
Business Implications
®
© 2014 MapR Technologies 22
Integration of Existing Systems
•  Use standards like NFS to connect existing
systems
•  Pluggable security models fit into your
companies current standards
•  Many conventional tools work well
•  Not everything works well in this model
–  Oracle, DB2, SQL Server, PostgreSQL, MySQL
•  They tend to not support being resource managed,
containers or other DFS
•  Applications in this architecture can still use them
•  If they start supporting these technologies then things
change
®
®
© 2014 MapR Technologies 23
Rethink the Data Center
•  All Servers
–  Run Mesos
–  Participate in the Distributed File System
•  Dynamic Allocation of Resources
–  Spin up more web servers
–  Custom Business Applications
–  Big Data Analytics
•  Data Locality
–  No more shipping data
–  Store and process the data where it was created
®
© 2014 MapR Technologies 24
Simplified Architecture
•  Less moving parts
–  Less things to go wrong
•  Better resource utilization
–  Scale any application up or down on demand
•  Common deployment model (new isolation model)
–  Repeatability between environments (dev, qa, production)
•  Shared file system
–  Get at the data anywhere in the cluster
–  Simplifies business continuity
®
© 2014 MapR Technologies 25
Business Continuity
•  Resilience
–  Redundancy
–  High Availability
–  Spare Capacity
•  Recovery
–  Snapshots
–  Disaster Recovery
•  Contingency
–  Protect against the unforeseen
–  Multisite Capability
Production
WAN
Production Research
Datacenter	
  1	
   Datacenter	
  2	
  
WAN EC2
®
© 2014 MapR Technologies 26
Platform-wide Security and Compliance
•  Authentication, Authorization, Auditing
–  Users and jobs
–  All tiers
•  Data protection
–  Wire-level encryption between servers
–  Masking
•  Regulatory Compliance
–  Automatic expiration of “old” data
–  Data locality supported by distributed file system
®
© 2014 MapR Technologies 27
Net Benefit
•  Reduced operating expenses (OPEX)
–  Better utilization of available capacity and data center space
•  Reduced capital expenses (CAPEX)
–  Less total hardware needed
•  Improves time to market
–  Streamlined deployments
–  Environments become consistent and predictable
•  Delivers a competitive advantage
–  Via platform scaling
–  Performance improvements
®
© 2014 MapR Technologies 28
Recap
•  Saves valuable time and money
•  Enables stronger business continuity capabilities
•  Google has been doing this for years
–  Real-time is the crux of everything Google does
•  Time for the rest of us to operate at Google scale
–  The technologies are there and they play together nicely
–  Process changes must occur internally to achieve this architecture
•  This approach will become the “traditional” way of thinking
–  Don’t get beat to it by your competitors
®
© 2014 MapR Technologies 29
Go Forth and Implement the Zeta Architecture
®
© 2014 MapR Technologies 30
$50M$50Min Free Training
www.mapr.com/odt
®
© 2014 MapR Technologies 31
Q&A
@ted_dunning maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014aziksa
 
Wrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopWrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopDataWorks Summit
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperBlueData, Inc.
 
2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of EngagementAerospike, Inc.
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Software Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture TechnologiesSoftware Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture TechnologiesOdinot Stanislas
 
Panasonic information systems
Panasonic information systemsPanasonic information systems
Panasonic information systemsYash Mittal
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureUtkarsh Pandey
 
Slides: Start Small, Grow Big with a Unified Scale-Out Infrastructure
Slides: Start Small, Grow Big with a Unified Scale-Out InfrastructureSlides: Start Small, Grow Big with a Unified Scale-Out Infrastructure
Slides: Start Small, Grow Big with a Unified Scale-Out InfrastructureNetApp
 
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...Dell EMC World
 
Denver Broncos Case Study
Denver Broncos Case StudyDenver Broncos Case Study
Denver Broncos Case StudyNetApp
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationDataWorks Summit
 
FlexPod as a Competitive Edge
FlexPod as a Competitive EdgeFlexPod as a Competitive Edge
FlexPod as a Competitive Edge NetApp
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Hitachi Vantara
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
Hortonworks HDP, Is it goog enough ?
Hortonworks HDP, Is it goog enough ?Hortonworks HDP, Is it goog enough ?
Hortonworks HDP, Is it goog enough ?Huxi LI
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudDataWorks Summit
 

What's hot (20)

Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
 
Wrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with HadoopWrangling Customer Usage Data with Hadoop
Wrangling Customer Usage Data with Hadoop
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White Paper
 
2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement2017 DB Trends for Powering Real-Time Systems of Engagement
2017 DB Trends for Powering Real-Time Systems of Engagement
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Software Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture TechnologiesSoftware Defined Storage - Open Framework and Intel® Architecture Technologies
Software Defined Storage - Open Framework and Intel® Architecture Technologies
 
Panasonic information systems
Panasonic information systemsPanasonic information systems
Panasonic information systems
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 
NetApp All Flash storage
NetApp All Flash storageNetApp All Flash storage
NetApp All Flash storage
 
Slides: Start Small, Grow Big with a Unified Scale-Out Infrastructure
Slides: Start Small, Grow Big with a Unified Scale-Out InfrastructureSlides: Start Small, Grow Big with a Unified Scale-Out Infrastructure
Slides: Start Small, Grow Big with a Unified Scale-Out Infrastructure
 
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...
MT44 Dell EMC Data Protection: What You Need to Know About Data Protection Ev...
 
Denver Broncos Case Study
Denver Broncos Case StudyDenver Broncos Case Study
Denver Broncos Case Study
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop Implementation
 
FlexPod as a Competitive Edge
FlexPod as a Competitive EdgeFlexPod as a Competitive Edge
FlexPod as a Competitive Edge
 
Nutanix basic
Nutanix basicNutanix basic
Nutanix basic
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
Hortonworks HDP, Is it goog enough ?
Hortonworks HDP, Is it goog enough ?Hortonworks HDP, Is it goog enough ?
Hortonworks HDP, Is it goog enough ?
 
Lessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloudLessons learned processing 70 billion data points a day using the hybrid cloud
Lessons learned processing 70 billion data points a day using the hybrid cloud
 

Viewers also liked

Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Stavros Kontopoulos
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduCloudera, Inc.
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Spark Summit EU Supporting Spark (Brussels 2016)
Spark Summit EU Supporting Spark (Brussels 2016)Spark Summit EU Supporting Spark (Brussels 2016)
Spark Summit EU Supporting Spark (Brussels 2016)Stavros Kontopoulos
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Stavros Kontopoulos
 

Viewers also liked (9)

Cassandra at Pollfish
Cassandra at PollfishCassandra at Pollfish
Cassandra at Pollfish
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016Trivento summercamp fast data 9/9/2016
Trivento summercamp fast data 9/9/2016
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Spark Summit EU Supporting Spark (Brussels 2016)
Spark Summit EU Supporting Spark (Brussels 2016)Spark Summit EU Supporting Spark (Brussels 2016)
Spark Summit EU Supporting Spark (Brussels 2016)
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 

Similar to Zeta architecture - Hive London May15

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR Technologies
 
Integrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentIntegrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentMapR Technologies
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Chris Fregly
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the CloudDevOpsGroup
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Knowledge Processing with Big Data and Semantic Web Technologies
Knowledge Processing with Big Data and  Semantic Web TechnologiesKnowledge Processing with Big Data and  Semantic Web Technologies
Knowledge Processing with Big Data and Semantic Web TechnologiesSyed Muhammad Ali Hasnain
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld
 
Postgres for the Future
Postgres for the FuturePostgres for the Future
Postgres for the FutureEDB
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)BigDataEverywhere
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingBikas Saha
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 

Similar to Zeta architecture - Hive London May15 (20)

Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -MapR on Azure: Getting Value from Big Data in the Cloud -
MapR on Azure: Getting Value from Big Data in the Cloud -
 
Integrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environmentIntegrating Hadoop into your enterprise IT environment
Integrating Hadoop into your enterprise IT environment
 
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the Cloud
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Knowledge Processing with Big Data and Semantic Web Technologies
Knowledge Processing with Big Data and  Semantic Web TechnologiesKnowledge Processing with Big Data and  Semantic Web Technologies
Knowledge Processing with Big Data and Semantic Web Technologies
 
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
VMworld 2013: Separating Cloud Hype from Reality in Healthcare – a Real-Life ...
 
Postgres for the Future
Postgres for the FuturePostgres for the Future
Postgres for the Future
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR)
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Zeta architecture - Hive London May15

  • 1. ® © 2014 MapR Technologies 1 ® © 2014 MapR Technologies Zeta Architecture Ted Dunning– Chief Application Architect Hive London – May 20, 2015
  • 2. ® © 2014 MapR Technologies 2 Agenda •  Current State –  History –  Moving Forward •  The Next Enterprise Architecture •  Business Implications •  Concrete Implementations
  • 3. ® © 2014 MapR Technologies 3© 2014 MapR Technologies ® Current State
  • 4. ® © 2014 MapR Technologies 4 Study History to Prepare for the Future •  A data center was built •  The servers were statically partitioned •  If we want to break the cycle we have to break the partitions and become dynamic
  • 5. ® © 2014 MapR Technologies 5 Understanding the Why’s •  Isolation of resources –  Assists in troubleshooting –  Prevents the analytics team from impacting production •  Maximum throughput of an application –  Guaranteed volume (maximum): compute, memory and storage •  Business Continuity –  We know exactly what is backed up, when, and where –  Difficult to perfect and to test
  • 6. ® © 2014 MapR Technologies 6 Issues with Isolated Workloads •  Segregated servers lead to under utilized hardware –  Wasted capacity and energy •  Complicated processes to move data to the required processing servers –  Operational impact, including extra monitoring –  Time delays moving data (not real-time) –  Troubleshooting time when there are issues •  Difficult to thoroughly test DEV vs. QA vs. Production –  Environments have different shapes and sizes –  They will not have identical configurations
  • 7. ® © 2014 MapR Technologies 7 Goals Moving Forward •  Leverage all existing hardware •  Create isolation in a different way •  Improve production operational processes •  Fix process of moving from DEV to QA to Production •  Support real-time business continuity
  • 8. ® © 2014 MapR Technologies 8© 2014 MapR Technologies ® The Next “Last” Enterprise Architecture
  • 9. ® © 2014 MapR Technologies 9 The Next Generation Enterprise Architecture •  Dynamic compute resources •  Common storage platform •  Real-time application support •  Flexible programming models •  Deployment management •  Solution based approach •  Applications to operate a business * This is a pluggable architecture Distributed File System Enterprise Applications Global Resource Management
  • 10. ® © 2014 MapR Technologies 10 Technologies That Work Global Resource Management Distributed File System Enterprise Applications Mesos + Myriad YARN MapR-FS HDFSS3 Web Servers Business Applications
  • 11. ® © 2014 MapR Technologies 11 We Will Call This Architecture…
  • 12. ® © 2014 MapR Technologies 12 What’s in a Name •  The letter Z is the last letter in the English alphabet, but Zeta is not the last letter of the Greek alphabet –  But this is the last generalized architecture you will need. •  Sixth letter of the Greek alphabet –  Hexagon represents the 6 surrounding pieces •  Zeta represents the number 7 –  7 total components in this architecture –  Components work with a global resource manager
  • 13. ® © 2014 MapR Technologies 13 Origin Story of the Zeta Architecture •  Cultivated by Jim Scott –  Created the pretty diagrams –  Put a nice name on it –  Documented the concepts •  Not really a new concept –  Google pretty much pioneered these technology concepts –  They have never really discussed it cohesively in this way
  • 14. ® © 2014 MapR Technologies 14 Zeta Architecture at Google Global Resource Management Distributed File System Enterprise Applications Borg & Omega GoogleFS HTTP Servers GMail
  • 15. ® © 2014 MapR Technologies 15© 2014 MapR Technologies ® Concrete Implementations
  • 16. ® © 2014 MapR Technologies 16 Web Server Logs •  Web server generates logs •  Land on local disk –  Logs periodically rotated •  Shipped to other servers •  Run jobs on logs
  • 17. ® © 2014 MapR Technologies 17 Web Server Logs •  Web server generates logs •  Land on DFS –  Logs still rotate –  Logs now tolerant of a server failure prior to rotation –  Logs are instantaneously available for computation •  Run jobs on logs –  Data locality
  • 18. ® © 2014 MapR Technologies 18 Advertising Platform
  • 19. ® © 2014 MapR Technologies 19 Advertising Platform - Simplified
  • 20. ® © 2014 MapR Technologies 20 Advertising Platform on Zeta
  • 21. ® © 2014 MapR Technologies 21© 2014 MapR Technologies ® Business Implications
  • 22. ® © 2014 MapR Technologies 22 Integration of Existing Systems •  Use standards like NFS to connect existing systems •  Pluggable security models fit into your companies current standards •  Many conventional tools work well •  Not everything works well in this model –  Oracle, DB2, SQL Server, PostgreSQL, MySQL •  They tend to not support being resource managed, containers or other DFS •  Applications in this architecture can still use them •  If they start supporting these technologies then things change ®
  • 23. ® © 2014 MapR Technologies 23 Rethink the Data Center •  All Servers –  Run Mesos –  Participate in the Distributed File System •  Dynamic Allocation of Resources –  Spin up more web servers –  Custom Business Applications –  Big Data Analytics •  Data Locality –  No more shipping data –  Store and process the data where it was created
  • 24. ® © 2014 MapR Technologies 24 Simplified Architecture •  Less moving parts –  Less things to go wrong •  Better resource utilization –  Scale any application up or down on demand •  Common deployment model (new isolation model) –  Repeatability between environments (dev, qa, production) •  Shared file system –  Get at the data anywhere in the cluster –  Simplifies business continuity
  • 25. ® © 2014 MapR Technologies 25 Business Continuity •  Resilience –  Redundancy –  High Availability –  Spare Capacity •  Recovery –  Snapshots –  Disaster Recovery •  Contingency –  Protect against the unforeseen –  Multisite Capability Production WAN Production Research Datacenter  1   Datacenter  2   WAN EC2
  • 26. ® © 2014 MapR Technologies 26 Platform-wide Security and Compliance •  Authentication, Authorization, Auditing –  Users and jobs –  All tiers •  Data protection –  Wire-level encryption between servers –  Masking •  Regulatory Compliance –  Automatic expiration of “old” data –  Data locality supported by distributed file system
  • 27. ® © 2014 MapR Technologies 27 Net Benefit •  Reduced operating expenses (OPEX) –  Better utilization of available capacity and data center space •  Reduced capital expenses (CAPEX) –  Less total hardware needed •  Improves time to market –  Streamlined deployments –  Environments become consistent and predictable •  Delivers a competitive advantage –  Via platform scaling –  Performance improvements
  • 28. ® © 2014 MapR Technologies 28 Recap •  Saves valuable time and money •  Enables stronger business continuity capabilities •  Google has been doing this for years –  Real-time is the crux of everything Google does •  Time for the rest of us to operate at Google scale –  The technologies are there and they play together nicely –  Process changes must occur internally to achieve this architecture •  This approach will become the “traditional” way of thinking –  Don’t get beat to it by your competitors
  • 29. ® © 2014 MapR Technologies 29 Go Forth and Implement the Zeta Architecture
  • 30. ® © 2014 MapR Technologies 30 $50M$50Min Free Training www.mapr.com/odt
  • 31. ® © 2014 MapR Technologies 31 Q&A @ted_dunning maprtech tdunning@mapr.com Engage with us! MapR maprtech mapr-technologies