SlideShare a Scribd company logo
1 of 18
Big Data Spain 2012
 The International Big Data Conference in Spain
              Madrid, 16th Nov 2012
          ETSI Telecomunicacion UPM
              www.bigdataspain.org
Architecture for Scale
       A Case Study
Who am I?
•   CTO and Co-Founder of AppFirst
•   Application Virtualization
    o   UNIX server applications

    o   Solaris 2.6 applications on Solaris 10

•   Real-time Operating Systems
    o   Hubble Space Telescope

    o   Under Wing Armaments

    o   Medical Instruments

•   Launch Processing System
    o   NASA Kennedy Space Center

    o   Hardware and Software Design of Ground-based Launch Control
        Systems
AppFirst Collects, Aggregates and Correlates
  Information from Production Applications

•   NYC based software start-up
•   Application
    o   Aggregate & summarize data from 10ks of remote
        servers
    o   Provide information for web apps and APIs
•   A Few Metrics
    o   45k to 50k summaries per minute
    o   GBs per remote server per day
    o   TBs of new data daily
    o   Query & retrieve information in < 100 MS
    o   Data store for up to 1 year
Simplified Architecture
Design for Scale




•Micro scale
oApplication Components
•Macro scale
oThe Entire Service
Micro Scale:
    Data Processing

Requirements:
•Process a constant stream of data
o3 snapshots per minute, per remote server
•Create summaries in real-time
oup to 1 minute behind wall clock time
•Provide query results in < 100 MS
Micro Scale:
 Efficiency

We found that:
•Summaries of the data were needed in order to keep
queries < 100 MS
oServer
oProcess
oProcess sets
oTopology
•Time series needed for each summary type
oMinute
oHour
oDay


                 We tried:
                 •Flat files
                 •Network file systems
                 •Distributed file systems
                 •Relational databases
                 •NoSQL key-value store
                 •Memory based SQL databases
                 •Distributed shared memory
Micro Scale:
We learned the hard way



   Tape is Dead
    Disk is Tape
   Flash is Disk
 RAM Locality is King

      Jim Gray
      Microsoft
   December 2006
Micro Scale:
                      Solution



                               Aggregation:
                               •HPC pipeline processing model
                               •RAM based data model
                               •Queues as message bus
                               •Stateless processing
                               •Adaptive control
                               •Queries are fully abstracted




Horizontal scale may require that you revisit your design
Micro Scale




                 We all know we need to scale horizontally

Cluster                                               Stateless
                                                    •Any data processing with any time constraint
•Use components that cluster                        •Processes can be run on any server
•Don’t do backups, use replication                  •Processes can be migrated
•Redis, memcached, RabbitMQ, Hbase can be clustered •Multiple processes can be added as load varies
•Postgresql & MySQl don’t really cluster            •All data stored in distributed shared memory
                                                    •Message passing between components
                                                    •Send keys and not data
Macro Scale:
                        Application Capacity




  Load:
  •Most significant load impact from remote servers
  •User interaction, APIs, and queries do not load the system as much as remote servers
  •Support 100, 1,000, 10,000, 100,000 remote servers

Will a design that supports 10,000 remote servers scale to support
                     100,000 remote servers?
Infinite Scale




•Paralyzes the design team                      •But... you don’t want to say no to the business
•Fosters bad behavior                           •The whole purpose is to add users
•Unrealistic expectations                       •When the business brings a customer with
•Developers forced to take unrealistic action   10,000 servers you want to say; bring it on
Macro Scale:
                Capacity




We started with a snapshot:
•Supported 1000 remote servers
•Micro scale results made it possible to scale out
•fairly flexible application component design
•Scale out to 10,000 remote servers
•This is a financial calculation
•Scaled out in linear fashion
•Data processing
•Storage
•Started in linear fashion then determined actual requirements
Macro Scale Solution:
                 The Pod




Pod 0                                                                 Pod 1
        Pod Architecture:
        •Segmented infrastructure along the lines of load sources
        •Create infrastructure to support specific load
        •Instantiate additional infrastructure with additional load
        •When a pod gets to 85-90% capacity spin out a new pod
        •Capacity of a pod is a financial calculation
        •Scale within a pod in 1000 server increments
        •Need to automate the deployment of a pod
Adaptive Control                         The Pod Rocks                           Metrics are king
•You can’t react fast enough             •Isolated
•Scale out                               •Distributed                            •Business metrics
•Scale back                              •Located where needed                   •Application metrics
•Migrate                                 •Behind the firewall




        Time Series Data                                         Don’t trust the data
        •Issues relate to a specific time                        •Clocks are skewed
        •Complete state information for any given minute         •Encodings fail
        •Don’t know what info is needed before a problem         •Save all bad data & replay
        occurs; all data every minute                            •Think defensive
Conclusions
•Stateless Data
oKey to horizontal scale

•Disk is tape
oRAM based design is critical, not optional

•Cluster
oUse components that cluster, not just master/salve

•Design for infinite scale does not work
•Pod approach is an answer for infinite scale
Thank You!

  Donn Rochette
donn@appfirst.com
 www.appfirst.com

More Related Content

What's hot

20 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 201820 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 2018Susheel Patidar
 
Devoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceDevoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceBert Jan Schrijver
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemZhenzhong Xu
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesLightbend
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lightbend
 
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum PainKafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum Painconfluent
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDBFoundationDB
 
Load balancing theory and practice
Load balancing theory and practiceLoad balancing theory and practice
Load balancing theory and practiceFoundationDB
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLDataStax
 
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScyllaDB
 
6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup IMathieu Chauvin
 
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsThe Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsLightbend
 
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScyllaDB
 
Low-Latency Data Processing in the Era of Serverless @DevoxxUA
Low-Latency Data Processing in the Era of Serverless @DevoxxUALow-Latency Data Processing in the Era of Serverless @DevoxxUA
Low-Latency Data Processing in the Era of Serverless @DevoxxUANazarii Cherkas
 
Cassandra-as-a-Service
Cassandra-as-a-ServiceCassandra-as-a-Service
Cassandra-as-a-ServiceInstaclustr
 
Asynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondAsynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondStuart (Pid) Williams
 
Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Cassandra Summit 2014: Launching PlayStation 4 with Apache CassandraCassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Cassandra Summit 2014: Launching PlayStation 4 with Apache CassandraDataStax Academy
 
How to Set Up ApsaraDB for RDS on Alibaba Cloud
How to Set Up ApsaraDB for RDS on Alibaba CloudHow to Set Up ApsaraDB for RDS on Alibaba Cloud
How to Set Up ApsaraDB for RDS on Alibaba CloudAlibaba Cloud
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAlibaba Cloud
 

What's hot (20)

20 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 201820 Altair PBS Professional Features in 20 minutes, 2018
20 Altair PBS Professional Features in 20 minutes, 2018
 
Devoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National PoliceDevoxx PL 2018 - Microservices in action at the Dutch National Police
Devoxx PL 2018 - Microservices in action at the Dutch National Police
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystem
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and Microservices
 
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
Lessons From HPE: From Batch To Streaming For 20 Billion Sensors With Lightbe...
 
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum PainKafka Summit SF 2017 - Running Kafka for Maximum Pain
Kafka Summit SF 2017 - Running Kafka for Maximum Pain
 
Building FoundationDB
Building FoundationDBBuilding FoundationDB
Building FoundationDB
 
FoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACIDFoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACID
 
Load balancing theory and practice
Load balancing theory and practiceLoad balancing theory and practice
Load balancing theory and practice
 
How To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQLHow To Tell if Your Business Needs NoSQL
How To Tell if Your Business Needs NoSQL
 
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDBScylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
Scylla Summit 2022: Multi-cloud State for k8s: Anthos and ScyllaDB
 
6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I6/18/14 Billing & Payments Engineering Meetup I
6/18/14 Billing & Payments Engineering Meetup I
 
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native ApplicationsThe Reactive Principles: Eight Tenets For Building Cloud Native Applications
The Reactive Principles: Eight Tenets For Building Cloud Native Applications
 
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public CloudScylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
Scylla Summit 2022: ScyllaDB Cloud: Simplifying Deployment to the Public Cloud
 
Low-Latency Data Processing in the Era of Serverless @DevoxxUA
Low-Latency Data Processing in the Era of Serverless @DevoxxUALow-Latency Data Processing in the Era of Serverless @DevoxxUA
Low-Latency Data Processing in the Era of Serverless @DevoxxUA
 
Cassandra-as-a-Service
Cassandra-as-a-ServiceCassandra-as-a-Service
Cassandra-as-a-Service
 
Asynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per secondAsynchronous design with Spring and RTI: 1M events per second
Asynchronous design with Spring and RTI: 1M events per second
 
Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Cassandra Summit 2014: Launching PlayStation 4 with Apache CassandraCassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
 
How to Set Up ApsaraDB for RDS on Alibaba Cloud
How to Set Up ApsaraDB for RDS on Alibaba CloudHow to Set Up ApsaraDB for RDS on Alibaba Cloud
How to Set Up ApsaraDB for RDS on Alibaba Cloud
 
AliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core FeaturesAliCloud Object Storage Service (OSS) Core Features
AliCloud Object Storage Service (OSS) Core Features
 

Viewers also liked

Viewers also liked (13)

AWS Cost Allocation Using Tags And Linked Accounts
AWS Cost Allocation Using Tags And Linked AccountsAWS Cost Allocation Using Tags And Linked Accounts
AWS Cost Allocation Using Tags And Linked Accounts
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Pill Camera ppt
Pill Camera pptPill Camera ppt
Pill Camera ppt
 
PILL CAMERA
PILL CAMERAPILL CAMERA
PILL CAMERA
 
Consequence driven decomposition of large-scale power system security analysis
Consequence driven decomposition of large-scale power system security analysisConsequence driven decomposition of large-scale power system security analysis
Consequence driven decomposition of large-scale power system security analysis
 
Pill camera ppt
Pill camera pptPill camera ppt
Pill camera ppt
 
Pill camera presentation
Pill camera presentationPill camera presentation
Pill camera presentation
 
What is big data?
What is big data?What is big data?
What is big data?
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Similar to Big Data Spain 2012 Conference Architecture Case Study

Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]AppFirst
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatQiming Teng
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenParticular Software
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scaleOvais Tariq
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...Grid Dynamics
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - GeektalkMalisa Ncube
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Gridsjlorenzocima
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud applicationNoam Sheffer
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB OverviewAndrew Liu
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the CloudCraig Dickson
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestHakka Labs
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestKrishna Gade
 
Share point disaster avoidance architecture for large scale enterprises
Share point disaster avoidance architecture for large scale enterprisesShare point disaster avoidance architecture for large scale enterprises
Share point disaster avoidance architecture for large scale enterprisesSentri
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...Amazon Web Services
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the CloudDevOpsGroup
 

Similar to Big Data Spain 2012 Conference Architecture Case Study (20)

Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]
 
Suning OpenStack Cloud and Heat
Suning OpenStack Cloud and HeatSuning OpenStack Cloud and Heat
Suning OpenStack Cloud and Heat
 
Enterprise Journey to the Cloud
Enterprise Journey to the CloudEnterprise Journey to the Cloud
Enterprise Journey to the Cloud
 
The impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves GoelevenThe impact of cloud NSBCon NY by Yves Goeleven
The impact of cloud NSBCon NY by Yves Goeleven
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 
Cloud Computing - Geektalk
Cloud Computing - GeektalkCloud Computing - Geektalk
Cloud Computing - Geektalk
 
Development of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data GridsDevelopment of concurrent services using In-Memory Data Grids
Development of concurrent services using In-Memory Data Grids
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB Overview
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the Cloud
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
 
Share point disaster avoidance architecture for large scale enterprises
Share point disaster avoidance architecture for large scale enterprisesShare point disaster avoidance architecture for large scale enterprises
Share point disaster avoidance architecture for large scale enterprises
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the Cloud
 

More from Big Data Spain

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data Spain
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Big Data Spain
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017Big Data Spain
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Big Data Spain
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Big Data Spain
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Big Data Spain
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Big Data Spain
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Big Data Spain
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Recently uploaded

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Recently uploaded (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Big Data Spain 2012 Conference Architecture Case Study

  • 1. Big Data Spain 2012 The International Big Data Conference in Spain Madrid, 16th Nov 2012 ETSI Telecomunicacion UPM www.bigdataspain.org
  • 2. Architecture for Scale A Case Study
  • 3. Who am I? • CTO and Co-Founder of AppFirst • Application Virtualization o UNIX server applications o Solaris 2.6 applications on Solaris 10 • Real-time Operating Systems o Hubble Space Telescope o Under Wing Armaments o Medical Instruments • Launch Processing System o NASA Kennedy Space Center o Hardware and Software Design of Ground-based Launch Control Systems
  • 4. AppFirst Collects, Aggregates and Correlates Information from Production Applications • NYC based software start-up • Application o Aggregate & summarize data from 10ks of remote servers o Provide information for web apps and APIs • A Few Metrics o 45k to 50k summaries per minute o GBs per remote server per day o TBs of new data daily o Query & retrieve information in < 100 MS o Data store for up to 1 year
  • 6. Design for Scale •Micro scale oApplication Components •Macro scale oThe Entire Service
  • 7. Micro Scale: Data Processing Requirements: •Process a constant stream of data o3 snapshots per minute, per remote server •Create summaries in real-time oup to 1 minute behind wall clock time •Provide query results in < 100 MS
  • 8. Micro Scale: Efficiency We found that: •Summaries of the data were needed in order to keep queries < 100 MS oServer oProcess oProcess sets oTopology •Time series needed for each summary type oMinute oHour oDay We tried: •Flat files •Network file systems •Distributed file systems •Relational databases •NoSQL key-value store •Memory based SQL databases •Distributed shared memory
  • 9. Micro Scale: We learned the hard way Tape is Dead Disk is Tape Flash is Disk RAM Locality is King Jim Gray Microsoft December 2006
  • 10. Micro Scale: Solution Aggregation: •HPC pipeline processing model •RAM based data model •Queues as message bus •Stateless processing •Adaptive control •Queries are fully abstracted Horizontal scale may require that you revisit your design
  • 11. Micro Scale We all know we need to scale horizontally Cluster Stateless •Any data processing with any time constraint •Use components that cluster •Processes can be run on any server •Don’t do backups, use replication •Processes can be migrated •Redis, memcached, RabbitMQ, Hbase can be clustered •Multiple processes can be added as load varies •Postgresql & MySQl don’t really cluster •All data stored in distributed shared memory •Message passing between components •Send keys and not data
  • 12. Macro Scale: Application Capacity Load: •Most significant load impact from remote servers •User interaction, APIs, and queries do not load the system as much as remote servers •Support 100, 1,000, 10,000, 100,000 remote servers Will a design that supports 10,000 remote servers scale to support 100,000 remote servers?
  • 13. Infinite Scale •Paralyzes the design team •But... you don’t want to say no to the business •Fosters bad behavior •The whole purpose is to add users •Unrealistic expectations •When the business brings a customer with •Developers forced to take unrealistic action 10,000 servers you want to say; bring it on
  • 14. Macro Scale: Capacity We started with a snapshot: •Supported 1000 remote servers •Micro scale results made it possible to scale out •fairly flexible application component design •Scale out to 10,000 remote servers •This is a financial calculation •Scaled out in linear fashion •Data processing •Storage •Started in linear fashion then determined actual requirements
  • 15. Macro Scale Solution: The Pod Pod 0 Pod 1 Pod Architecture: •Segmented infrastructure along the lines of load sources •Create infrastructure to support specific load •Instantiate additional infrastructure with additional load •When a pod gets to 85-90% capacity spin out a new pod •Capacity of a pod is a financial calculation •Scale within a pod in 1000 server increments •Need to automate the deployment of a pod
  • 16. Adaptive Control The Pod Rocks Metrics are king •You can’t react fast enough •Isolated •Scale out •Distributed •Business metrics •Scale back •Located where needed •Application metrics •Migrate •Behind the firewall Time Series Data Don’t trust the data •Issues relate to a specific time •Clocks are skewed •Complete state information for any given minute •Encodings fail •Don’t know what info is needed before a problem •Save all bad data & replay occurs; all data every minute •Think defensive
  • 17. Conclusions •Stateless Data oKey to horizontal scale •Disk is tape oRAM based design is critical, not optional •Cluster oUse components that cluster, not just master/salve •Design for infinite scale does not work •Pod approach is an answer for infinite scale
  • 18. Thank You! Donn Rochette donn@appfirst.com www.appfirst.com