ROBIN
SYSTEMS
Build your own Data Pipeline with Docker
Adeesh Fulay & Giri Kesavan
ROBIN FREE COMMUNITY EDITION
CONFIDENTIAL – RESTRICTED DISTRIBUTION
https://robinsystems.com/get-robin/
Free Forever (Robin Licenses)
Up to 5 EC2 Instances
Auto-deploy to AWS using ‘gorobin’ tool
Use pre-defined bundles or bring your own app
jon@robinsystems.com
adeesh@robinsystems.com
LET’S PLAY BUZZWORD BINGO!!
[Image: word cloud of predictive-analytics terms. Source: https://www.shutterstock.com/image-illustration/word-cloud-predictive-analytics-related-tags-218879485]
EXAMPLES FROM EVERYDAY LIFE
INDUSTRY-WIDE USE CASES
DATA PIPELINE
› A data pipeline is an automated process that executes at regular intervals to ingest, cleanse, transform, and/or aggregate an incoming feed of data and generate an output dataset in a format suitable for downstream processing, with no manual intervention.
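The stages named in the definition above can be sketched as a chain of small functions. This is a minimal illustration only: the function names, the sample feed, and the choice of "count records" as the aggregation are assumptions, not something taken from the deck.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def ingest(lines: Iterable[str]) -> Iterator[str]:
    """Ingest: yield raw records from the incoming feed."""
    for line in lines:
        yield line


def cleanse(records: Iterable[str]) -> Iterator[str]:
    """Cleanse: strip surrounding whitespace and drop empty records."""
    for record in records:
        record = record.strip()
        if record:
            yield record


def transform(records: Iterable[str]) -> Iterator[str]:
    """Transform: normalize each record to lower case."""
    for record in records:
        yield record.lower()


def aggregate(records: Iterable[str]) -> Dict[str, int]:
    """Aggregate: count how often each record occurs."""
    return dict(Counter(records))


def run_pipeline(feed: Iterable[str]) -> Dict[str, int]:
    """Chain the stages so one call processes a whole batch unattended."""
    return aggregate(transform(cleanse(ingest(feed))))


if __name__ == "__main__":
    # Hypothetical sample feed; a real feed would come from a file or stream.
    print(run_pipeline(["Login ", "", "login", "LOGOUT"]))
```

The "regular interval" part is not shown here; in practice a scheduler would call `run_pipeline` on each new batch.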
[Chart: interest in the term "data pipeline" (Worldwide, Last 5 Years)]
KEY CHARACTERISTIC – CHOICE
WHAT IS ROBIN?
› Container-based Compute Plane
› Integrated Scale-Out Block Storage
› Integrated Networking
› Works Anywhere: BareMetal, VM or Cloud
› With All Apps: No changes to apps or workflows
[Diagram: Storage Nodes, Compute Nodes, and Converged Nodes]
WHAT IS ROBIN?
› Big Data, NoSQL, RDBMS, Other Custom Apps
› Application Aware Workflow Manager
› Upgrade, Clone, Deploy, Share, Scale
› Access Control, High Availability, QoS Control, Placement Control, Security
DATA PIPELINE ON DOCKER??
EASY EXAMPLE: TWITTER STREAMING USING ELK
[Diagram: two Master Nodes and two Data Nodes]
DEMO 1
› Tweet now with the word ‘robinsystems’ in it
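The deck does not show how the tweets were collected, so the following is only a sketch of the data flow under stated assumptions. It filters tweet-like dictionaries for a keyword and formats the hits as a newline-delimited body for the Elasticsearch `_bulk` API. The `text` field, the `tweets` index name, and both function names are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def matching_tweets(tweets: Iterable[Dict], keyword: str) -> List[Dict]:
    """Keep tweets whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [t for t in tweets if needle in t.get("text", "").lower()]


def to_bulk_payload(docs: Iterable[Dict], index: str) -> str:
    """Build a newline-delimited body for the Elasticsearch _bulk API.

    Each document is preceded by an action line, and every line,
    including the last one, ends with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample tweets.
    sample = [{"text": "Trying out RobinSystems today"}, {"text": "unrelated"}]
    print(to_bulk_payload(matching_tweets(sample, "robinsystems"), "tweets"))
```

The resulting string would be sent as the body of a POST to the cluster's `_bulk` endpoint; the HTTP call is left out to keep the sketch free of network dependencies.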
HOW WE DID IT?
• 3 EC2 instances, 6 EBS volumes
• 6 Docker containers, 3 independent images
• OVS bridge for cross-host networking
• Private IP for each container
• Virtual volumes mounted on each container
• Ports mapped for ES and Kibana for external access
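Two of the items above, volume mounts and port mapping, correspond to standard `docker run` flags (`-v` and `-p`). The sketch below only builds the argument list and is not how Robin itself deploys containers. The image names and host paths are hypothetical; 9200 and 5601 are the default Elasticsearch and Kibana ports. The OVS bridge and per-container private IPs are not covered.

```python
from typing import Dict, List, Optional


def docker_run_args(name: str, image: str, volume: str, mount: str,
                    ports: Optional[Dict[int, int]] = None) -> List[str]:
    """Build a `docker run` argument list for one detached container."""
    # -v mounts a host path (here standing in for a virtual volume).
    args = ["docker", "run", "-d", "--name", name, "-v", f"{volume}:{mount}"]
    # -p publishes a container port on the host for external access.
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    return args + [image]


if __name__ == "__main__":
    # Hypothetical image names and paths, for illustration only.
    es = docker_run_args("es-data-1", "example/elasticsearch",
                         "/mnt/vol1", "/data", {9200: 9200})
    kibana = docker_run_args("kibana-1", "example/kibana",
                             "/mnt/vol2", "/data", {5601: 5601})
    print(" ".join(es))
    print(" ".join(kibana))
```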
[Diagram: EC2 M4.2xl instances with EBS volumes. The Robin Storage Plane provides virtual volumes and the Robin Compute Plane runs containers, with an OVS bridge on each instance. Labels: primary private IPs, secondary private IPs.]
REAL-WORLD EXAMPLE: SECURITY ANALYSIS
[Diagram components: Stream Sources, HDP, HDFS, Ranger (AuthZ), SSL, Kerberos (AD), Data at Rest (Encryption), KNIME-spark gateway. Legend: Data Store, Security Layer, Other Services.]
HOW WE DID IT?
• 30 physical servers
• 2 multitenant clusters: Dev (800 TB) and InfoSec (1.5 PB)
• 100+ Docker containers
• ~1 service per container
• OVS bridge for cross-host networking
• Routable IP address for each container
• Virtual volumes mounted on each container
[Diagram: physical servers with HDDs form a storage pool (virtual volumes) and a compute pool (containers), with an OVS bridge on each server, hosting 2 multitenant clusters.]
BENEFITS
› Rapid Deploy
  › Deploy time/cluster = 40 mins (Originally 2 weeks)
› No need to size hardware by app
› Decouple compute and storage
  › HDP recommends keeping only 48-96 TB per data node
› Improved server and storage utilization (~40%)
› Enforce data locality for performance
› Multitenancy for any application with performance isolation
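A short worked example shows why the per-node storage guideline matters when compute and storage are coupled. It takes the 1.5 PB InfoSec cluster and the 48-96 TB per data node range from the slides. Treating 1 PB as 1000 TB and ignoring replication overhead are simplifying assumptions.

```python
import math


def data_nodes_needed(dataset_tb: float, tb_per_node: float) -> int:
    """Minimum number of data nodes needed to hold the dataset."""
    return math.ceil(dataset_tb / tb_per_node)


INFOSEC_TB = 1500  # 1.5 PB, assuming 1 PB = 1000 TB and no replication

if __name__ == "__main__":
    print(data_nodes_needed(INFOSEC_TB, 96))  # 16 nodes at 96 TB per node
    print(data_nodes_needed(INFOSEC_TB, 48))  # 32 nodes at 48 TB per node
```

Under these assumptions, capacity alone would call for 16 to 32 data nodes, regardless of how much compute the workload needs.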
DEMO 2
› How do I provide developers access to data?
› Can I run multiple pipelines on the same setup without compromising performance?
› When data sets and workloads grow, can I avoid under-provisioning?
› How do you handle spikes and growth?
› How do I quickly deploy my entire pipeline?
DATA PIPELINE - CHALLENGES
DATA PIPELINE IS A CLUSTER OF MULTIPLE CLUSTERED APPLICATIONS EXPECTED TO WORK IN UNISON
› How do I avoid under- or over-provisioning resources?
› How do I quickly deploy my entire pipeline?
› How do you handle spikes and growth?
› How do I provide developers access to data?
› Can I run multiple pipelines on the same setup without compromising performance?
CHALLENGES WITH MANAGING DATA PIPELINES
› Exploding Data Volume: 8 Billion
  • 24 feeds into Elasticsearch
  • 8 billion security events per day
  • 53 billion documents
› Poor Agility: 1 Week
  • Week+ to provision clusters
  • 10hrs+ to take a snapshot
› High Cost: $3 Million
  • Real-time traffic too much for VMs on commodity hardware
  • Expensive all-flash servers to meet performance needs
  • $3M+ hardware spend and growing
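To put the daily event volume in perspective, it can be converted to an average per-second rate. This is plain arithmetic on the 8 billion events per day quoted above; real traffic is likely bursty, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: int) -> float:
    """Average ingest rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = events_per_second(8_000_000_000)
    print(f"{rate:,.0f} events/sec on average")  # 92,593 events/sec on average
```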
CHALLENGES WITH MANAGING DATA PIPELINES
“Getting Big Data projects to Production is a challenge … Only 15% of businesses reported deploying their Big Data project to Production”
– Gartner Big Data Survey, Oct 2016
ROBIN BENEFITS
Let Applications Drive Infrastructure
Big Data, NoSQL, RDBMS, Other Custom Apps
› Instant Access
  • Deploy Entire Pipelines in Minutes
  • Instant Sharing – No Data Copy
  • Test-before-Commit
› Lower Complexity
  • Same Workflow across Apps
  • SLA Guarantees
  • Dynamic Scaling
› Lower Cost
  • More Apps on Same Resources
  • Self-Service Dev, Automation for Ops
  • Faster Time to Market
Simplify Ops 10x Lower TCO 2x 3x Faster Projects
www.robinsystems.com info@robinsystems.com
THANK YOU

Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Data Pipeline with Docker on AWS

  • 1. ROBIN SYSTEMS Build your own Data Pipeline with Docker Adeesh Fulay & Giri Kesavan
  • 2. ROBIN FREE COMMUNITY EDITION CONFIDENTIAL – RESTRICTED DISTRIBUTION https://robinsystems.com/get-robin/ • Free Forever (Robin Licenses) • Up to 5 EC2 Instances • Auto-deploy to AWS using ‘gorobin’ tool • Use pre-defined bundles or bring your own app • jon@robinsystems.com • adeesh@robinsystems.com
  • 3. LET’S PLAY BUZZWORD BINGO!! https://www.shutterstock.com/image-illustration/word-cloud-predictive-analytics-related-tags-218879485
  • 4. EXAMPLES FROM EVERYDAY LIFE
  • 5. INDUSTRY-WIDE USE CASES
  • 6. DATA PIPELINE › A data pipeline is an automated process that executes at regular intervals to ingest, cleanse, transform and/or aggregate an incoming feed of data, generating an output dataset in a format suitable for downstream processing, with no manual intervention. [Chart: “data pipeline” (Worldwide – Last 5 Years)]
  • 7. KEY CHARACTERISTIC – CHOICE
  • 8. WHAT IS ROBIN? • Container-based Compute Plane • Integrated Scale-Out Block Storage • Integrated Networking • Works Anywhere: BareMetal, VM or Cloud • With All Apps: No changes to apps or workflows [Diagram labels: Storage Node, Compute Node, Storage Node, Compute Node, Converged Node, Converged Node]
  • 9. WHAT IS ROBIN? • Big Data, NoSQL, RDBMS, Other, Custom Apps • Application Aware Workflow Manager: Upgrade, Clone, Deploy, Share, Scale • Access Control, High Availability, QoS Control, Placement Control, Security
  • 10. DATA PIPELINE ON DOCKER??
  • 11. EASY EXAMPLE: TWITTER STREAMING USING ELK [Diagram labels: Master Node, Data Node, Data Node, Master Node]
  • 12. DEMO 1 › Tweet now with the word ‘robinsystems’ in it
  • 13. HOW WE DID IT? • 3 EC2 Instances, 6 EBS volumes • 6 Docker containers, 3 independent images • OVS bridge for cross-host networking • Private IP for each container • Virtual volumes mounted on each container • Ports mapped for ES and Kibana for external access [Diagram labels: EC2 M4.2xl instances, EBS volumes, Robin Storage Plane, Robin Compute Plane, OVS Bridge, Primary Private IPs, Secondary Private IPs, Virtual Volumes, Containers]
  • 14. REAL-WORLD EXAMPLE: SECURITY ANALYSIS [Diagram labels: Stream Sources, Data Store, Security Layer, Other Services, KNIME-spark gateway, Data at Rest (Encryption), Kerberos (AD), HDP, HDFS, Ranger (AuthZ), SSL]
  • 15. HOW WE DID IT? • 30 physical servers • 2 multitenant clusters: Dev (800 TB) and InfoSec (1.5 PB) • 100+ Docker containers • ~1 service per container • OVS bridge for cross-host networking • Routable IP address for each container • Virtual volumes mounted on each container [Diagram labels: Physical Servers, HDDs, Storage Pool, Compute Pool, OVS Bridge, Virtual Volumes, Containers]
  • 16. BENEFITS › Rapid Deploy › Deploy time/cluster = 40 mins (Originally 2 weeks) › No need to size hardware by App › Decouple compute and storage › HDP recommends keeping only 48-96 TB per data node › Improved server and storage utilization (~40%) › Enforce data locality for performance › Multitenancy for any application with performance isolation
  • 17. DEMO 2
  • 18. DATA PIPELINE - CHALLENGES: A DATA PIPELINE IS A CLUSTER OF MULTIPLE CLUSTERED APPLICATIONS EXPECTED TO WORK IN UNISON • How do I quickly deploy my entire pipeline? • How do you handle spikes and growth? • How do I provide developers access to data? • Can I run multiple pipelines on the same setup without compromising performance? • When data sets and workloads grow, can I avoid under-provisioning? • How do I avoid under- or over-provisioning resources?
  • 19. CHALLENGES WITH MANAGING DATA PIPELINES • Exploding Data Volume (8 Billion): 24 feeds into ElasticSearch; 8 billion security events per day; 53 billion documents • Poor Agility (1 Week): Week+ to provision clusters; 10hrs+ to take a snapshot • High Cost ($3 Million): Real-time traffic too much for VMs on commodity hardware; expensive all-flash servers to meet performance needs; $3M+ hardware spend and growing
  • 20. CHALLENGES WITH MANAGING DATA PIPELINES “Getting Big Data projects to Production is a challenge … Only 15% of businesses reported deploying their Big Data project to Production” – Gartner Big Data Survey, Oct 2016
  • 21. ROBIN BENEFITS: Let Applications Drive Infrastructure • Big Data, NoSQL, RDBMS, Other, Custom Apps • Instant Access, Lower Complexity, Lower Cost • Deploy Entire Pipelines in Minutes • Same Workflow across Apps • More Apps on Same Resources • Instant Sharing – No Data Copy • SLA Guarantees • Self-Service Dev, Automation for Ops • Test-before-Commit • Dynamic Scaling • Faster Time to Market • Simplify Ops, 10x, Lower TCO, 2x, 3x, Faster Projects
  • 22. ROBIN FREE COMMUNITY EDITION https://robinsystems.com/get-robin/ • Free Forever (Robin Licenses) • Up to 5 EC2 Instances • Auto-deploy to AWS using ‘gorobin’ tool • Use pre-defined bundles or bring your own app • jon@robinsystems.com • adeesh@robinsystems.com
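
Slide 6 defines a data pipeline as a chain of ingest, cleanse, transform and aggregate steps over an incoming feed. The stages can be sketched in plain Python as below. This is only an illustration of the definition, not the ELK setup shown in the demos: the `user,text` feed format, the function names, and the sample records are all assumptions made for the example, and the keyword `robinsystems` is borrowed from the Demo 1 slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def ingest(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Parse 'user,text' lines into records (hypothetical feed format)."""
    for line in raw_lines:
        user, _, text = line.partition(",")
        yield {"user": user, "text": text}


def cleanse(records: Iterable[dict]) -> Iterator[dict]:
    """Trim whitespace and drop records with an empty user or text."""
    for record in records:
        user, text = record["user"].strip(), record["text"].strip()
        if user and text:
            yield {"user": user, "text": text}


def transform(records: Iterable[dict], keyword: str) -> Iterator[dict]:
    """Lower-case the text and flag records that mention the keyword."""
    for record in records:
        text = record["text"].lower()
        yield {"user": record["user"], "text": text, "match": keyword in text}


def aggregate(records: Iterable[dict]) -> Dict[str, int]:
    """Count keyword matches per user: the output dataset."""
    counts: Dict[str, int] = defaultdict(int)
    for record in records:
        if record["match"]:
            counts[record["user"]] += 1
    return dict(counts)


def run_pipeline(raw_lines: Iterable[str], keyword: str = "robinsystems") -> Dict[str, int]:
    """Chain the four stages; generators keep the feed streaming."""
    return aggregate(transform(cleanse(ingest(raw_lines)), keyword))


if __name__ == "__main__":
    # Made-up sample feed, for illustration only.
    feed = [
        "alice, Trying RobinSystems today",
        "bob,hello world",
        ",orphan text",
        "alice,robinsystems demo",
        "carol,   ",
    ]
    print(run_pipeline(feed))
```

To match the "regular interval, no manual intervention" part of the definition, a script like this would typically be triggered by a scheduler rather than run by hand; in the deck's demos each stage is instead a separate clustered application running in its own containers.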