Real-Time Data Processing Pipeline &
Visualization with Docker, Spark, Kafka
and Cassandra
Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
Personal Information
• Roberto Gandolfo Hashioka
• @rogaha (GitHub) and @rhashioka (Twitter)
• Finance -> Software Engineer
• Growth & Data Engineer at Docker
Summary
• Background / Motivation
• Project Goals
• How to build it?
• DEMO
Background
• Gather data from multiple sources and process it in "real time"
• Transform raw data into meaningful, useful information that supports more effective
decision-making
• Provide more visibility into trends on: 1) user behavior 2) feature engagement 3) opportunities
for future investments
• Data transparency and standardization
Project Goals
• Create a data processing pipeline that can handle a large number of events per second
• Automate the development environment: Docker Compose.
• Automate remote machine management: Docker for AWS / Docker Machine.
• Reduce time to market (new features) and time to development (new hires).
Project / Language Stack
How to build it?
• Step 1: Install Docker for Mac/Win and dockerize all the applications
link: https://www.docker.com/products/docker
Dockerfile example
-----------------------------------------------------------------------------------------------------------
FROM ubuntu:14.04
MAINTAINER Roberto Hashioka (roberto@docker.com)
RUN apt-get update && apt-get install -y nginx
RUN echo "Hello World! #TIAD" > /usr/share/nginx/html/index.html
EXPOSE 80
# Run nginx in the foreground so the container keeps running under `docker run -d`
CMD ["nginx", "-g", "daemon off;"]
------------------------------------------------------------------------------------------------------------
$ docker build -t rogaha/web_demotiad2016 .
$ docker run -d -p 80:80 --name web_demotiad2016 rogaha/web_demotiad2016
How to build it?
• Step 2: Define your services stack with a docker-compose file
Docker Compose (docker-compose.yml)
version: "2"
services:
  web:
    build: .
    command: python app.py
    ports:
      - "5000:5000"
    volumes:
      - .:/code
    links:
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  redis:
    image: redis:latest
    command: redis-server --appendonly yes
How to build it?
• Step 3: Test the applications locally from your laptop using containers
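One way to check a locally running container is a small smoke test that polls the service until it answers. The sketch below is not from the talk: the function name `wait_for_http`, the default timeout, and the polling interval are assumptions, and it uses only the Python standard library.

```python
import time
import urllib.request


def wait_for_http(url, timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns the response body as text, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1) as response:
                if response.status == 200:
                    return response.read().decode("utf-8")
        except OSError as error:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: the service is not ready yet.
            last_error = error
        time.sleep(interval)
    raise TimeoutError(
        "no HTTP 200 from %s within %ss (last error: %r)" % (url, timeout, last_error)
    )
```

With the Compose stack from Step 2 running, calling `wait_for_http("http://localhost:5000/")` would block until the `web` service responds and then return the page body. The port comes from the Compose file; the path `/` is an assumption about the app.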
How to build it?
• Step 4: Provision your remote servers and deploy your containers
How to build it?
• Step 5: Scale your services with Docker Swarm
DEMO
source code: https://github.com/rogaha/data-processing-pipeline
Open Source Projects Used
• Docker (https://github.com/docker/docker)
• An open platform for distributed applications for developers and sysadmins
• Apache Spark / Spark SQL (https://github.com/apache/spark)
• A fast, in-memory data processing engine. Spark SQL lets you query structured data as a resilient distributed dataset (RDD)
• Apache Kafka (https://github.com/apache/kafka)
• A fast and scalable pub-sub messaging service
• Apache ZooKeeper (https://github.com/apache/zookeeper)
• A distributed configuration service, synchronization service, and naming registry for large distributed systems
• Apache Cassandra (https://github.com/apache/cassandra)
• A scalable, highly available, distributed wide-column NoSQL database
• D3 (https://github.com/mbostock/d3)
• A JavaScript visualization library for HTML and SVG.
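To make the division of labor among these projects more concrete, here is a minimal in-memory sketch of the flow the stack implies. Events arrive on a stream (Kafka's role), are counted per time window (the kind of aggregation Spark could run), and are flattened into rows keyed by window and event type (one plausible layout for Cassandra, which D3 could then chart). Nothing here is taken from the linked repository: the event fields, the 10-second tumbling window, and all function names are assumptions, and plain Python stands in for the real client libraries.

```python
from collections import Counter, namedtuple

# Hypothetical event shape; the deck does not show the real schema.
Event = namedtuple("Event", ["timestamp", "user_id", "event_type"])

WINDOW_SECONDS = 10  # assumed tumbling-window size


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def aggregate(events, size=WINDOW_SECONDS):
    """Count events per (window start, event type)."""
    counts = Counter()
    for event in events:
        counts[(window_start(event.timestamp, size), event.event_type)] += 1
    return counts


def to_rows(counts):
    """Flatten counts into sorted rows, one per (window, event type) key."""
    return [
        {"window_start": start, "event_type": kind, "count": n}
        for (start, kind), n in sorted(counts.items())
    ]


# Stand-in for messages read from a stream; timestamps are in seconds.
events = [
    Event(100, "u1", "login"),
    Event(103, "u2", "login"),
    Event(109, "u1", "pull"),
    Event(112, "u3", "login"),
]
rows = to_rows(aggregate(events))
for row in rows:
    print(row)
```

Keying each row by window start and event type keeps the aggregation idempotent for a given batch: re-running it over the same events produces the same rows, which is convenient when results are upserted into a store.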
Thanks!
Questions?
@rhashioka
