There’s a lot of buzz around the many DevOps tools being thrown around, and it can be difficult to break through the noise. We plan to share our success story of what to do (and not to do) while powering your software with the most acclaimed DevOps technologies. From provisioning clusters with Kubernetes to scaling the product for a global user base; from streaming live data using Kafka/Spark to consolidating it in Athena; from monitoring with Kibana to continuously integrating and deploying with CircleCI, we promise you a smooth ride. Come hear our journey of moving a monolith to elastic infrastructure.
5. OUR PROBLEM STATEMENT
What our client’s business looks like
➡ A leading manufacturing company, headquartered in Italy
➡ Close to 20 plants globally, each factory with its own physical data storage
➡ Undergoing a digital transformation, with the intent to get better insights into the process using data
➡ Streaming large volumes of data
6. OUR PROBLEM STATEMENT
What constraints did we have
➡ Extensibility and availability of products in 20 plants
➡ Growing number of products and data
➡ Keeping a low operational and maintenance cost
➡ Staying cloud agnostic
➡ Building a proof of concept before an org-wide roll out
7. OUR PROBLEM STATEMENT
What technology challenges were we dealing with, across the factories and HQ
1. APPLICATION INFRASTRUCTURE
2. DATA STREAMING
3. QUERYING SERVICE
4. CONTINUOUS DEPLOYMENT
10. 1. APPLICATION INFRASTRUCTURE
How we made the choice
✓ Open source
✓ Provides primitives for modern applications:
✓ auto scaling of services (see the sketch after this list)
✓ automated rollouts and rollbacks
✓ service auto-discovery
✓ self-healing
✓ Zero downtime
✓ Cloud-agnostic deployment
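The slides don’t show how the auto scaling above was configured. As a minimal sketch, assuming a Deployment named factory-app and the official Kubernetes Python client, a HorizontalPodAutoscaler can scale the service on CPU usage; every name and threshold here is illustrative, not from the talk.

```python
# Minimal sketch: scale a (hypothetical) "factory-app" Deployment on CPU usage
# using the official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # use the cluster credentials provisioned for the project
autoscaling = client.AutoscalingV1Api()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="factory-app-hpa"),  # hypothetical name
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="factory-app",
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,  # scale out when average CPU exceeds 70%
    ),
)
autoscaling.create_namespaced_horizontal_pod_autoscaler(namespace="default", body=hpa)
```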
11. 1. APPLICATION INFRASTRUCTURE
How we implemented it
➡ Created the basic infrastructure on AWS
➡ Provisioned the Kubernetes cluster on top of it
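The deck doesn’t name the provisioning tools behind those two steps, so the sketch below only shows what runs on the cluster afterwards: a Deployment created with the official Kubernetes Python client, which gives the self-healing replicas and rolling (zero-downtime) updates listed on the previous slide. The app name, image and namespace are hypothetical.

```python
# Minimal sketch: a Deployment with three self-healing replicas and a rolling
# update strategy, created via the official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="factory-app"),  # hypothetical name
    spec=client.V1DeploymentSpec(
        replicas=3,  # Kubernetes recreates pods that die: self-healing
        strategy=client.V1DeploymentStrategy(type="RollingUpdate"),  # zero-downtime rollouts
        selector=client.V1LabelSelector(match_labels={"app": "factory-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "factory-app"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="factory-app",
                    image="example.registry/factory-app:1.0",  # hypothetical image
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ]),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```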
14. 2. DATA STREAMING
How we made the choice
The managed AWS alternatives we evaluated:
➡ Low cost
➡ Inbuilt notification feature
➡ Does not maintain a persistent checkpoint
➡ No infra maintenance
➡ To retain data for more than 24 hours, you need to pay an additional cost
➡ Slow compared to Kafka due to its replication factor over multiple zones
➡ High cost
➡ AWS specific
Kafka, which we chose:
➡ Open source
➡ Fast, durable, scalable and very high throughput
➡ Lower cost
➡ Durable logs that allow us to replay messages (see the sketch below)
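The replay point deserves a concrete illustration. A minimal sketch, assuming a kafka-python consumer and an illustrative topic and broker address: because Kafka’s log is durable, a consumer can simply be rewound to the earliest offset and re-process every retained message.

```python
# Minimal sketch: replay all retained messages of a topic from the start of the log.
from kafka import KafkaConsumer, TopicPartition  # kafka-python

consumer = KafkaConsumer(
    bootstrap_servers="kafka:9092",  # assumed broker address
    enable_auto_commit=False,
    consumer_timeout_ms=5000,        # stop iterating once we are caught up
)

partition = TopicPartition("factory-events", 0)  # assumed topic name
consumer.assign([partition])
consumer.seek_to_beginning(partition)            # rewind: replay from the oldest offset

for message in consumer:
    # Re-process every historical event; the durable log makes this repeatable.
    print(message.offset, message.value)
```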
15. 2. DATA STREAMING
How we implemented it
We used the Confluent Platform, which leverages the features of Kafka.
➡ Used the official Docker images
➡ Deployed on Kubernetes
➡ Added a persistent volume
➡ Used Spark Streaming to do the manipulation of the data (see the sketch below)
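The final-picture slide later shows Structured Streaming producing Parquet files on S3 for Athena. As a minimal sketch of that step (reading straight from Kafka for brevity; the topic, schema, broker and bucket names are all assumptions, and the spark-sql-kafka connector must be on the classpath), a PySpark job could look like this.

```python
# Minimal sketch: read events from Kafka and write partitioned Parquet to S3.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("factory-stream").getOrCreate()

# Hypothetical event schema, for illustration only.
schema = StructType([
    StructField("country", StringType()),
    StructField("plant", StringType()),
    StructField("evs_start_date", StringType()),
    StructField("sensor_value", DoubleType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker address
    .option("subscribe", "factory-events")            # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/events/")             # assumed bucket
    .option("checkpointLocation", "s3a://example-bucket/chk/")  # required for recovery
    .partitionBy("country", "plant", "evs_start_date")          # mirrors the Athena partitions
    .start()
)
query.awaitTermination()
```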
19. 3. QUERYING SERVICE
Why we made the choice
➡ Low infrastructure cost
➡ Built on the Presto SQL query engine
➡ You only pay for the queries you run
➡ No ETL needed; uses structured data stored on S3 as objects
➡ Supports CSV, Parquet, JSON and other structured file formats
Parquet files on S3 → data queried
21. 3. QUERYING SERVICE
How an Athena query works
The data on S3 is laid out as a partition hierarchy, with the Parquet files sitting in the leaf partitions:
country=USA / country=China / country=India / country=Europe
  plant=Location 1 / plant=Location 2 / plant=Location 3 / ... / plant=Location n
    evs_start_date=2018-10-28 06:00 / evs_start_date=2018-10-29 06:00 / ... / evs_start_date=2018-11-30 06:00
      file1.parquet, file2.parquet, file3.parquet, ..., filen.parquet
A query that filters on the partition columns only scans the matching partition:
Select * from <table> where country='Europe' and plant='Location3' and evs_start_date='2018-10-29 06:00'
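To show how such a query can be issued programmatically, here is a minimal sketch using boto3; the database, table, region and result bucket are assumptions, not taken from the talk.

```python
# Minimal sketch: submit the partitioned query above to Athena and poll for the result.
import time
import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # assumed region

QUERY = """
SELECT * FROM factory_events
WHERE country = 'Europe'
  AND plant = 'Location3'
  AND evs_start_date = '2018-10-29 06:00'
"""  # table name is hypothetical

def run_query():
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "factory_db"},  # assumed database
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes; only the matching partition is scanned.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        return athena.get_query_results(QueryExecutionId=query_id)
    raise RuntimeError(f"Athena query ended in state {state}")

if __name__ == "__main__":
    print(run_query()["ResultSet"]["Rows"][:5])
```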
24. 4. CONTINUOUS DEPLOYMENT
How we made the choice
➡ Better on-premise support
➡ Competitive enterprise plan pricing
➡ Better starting cost for enterprises
28. THE FINAL PICTURE
How it ended up looking
Data pipeline: Multiple Data Sources → (1) Kafka → (2) S3 (Avro files) → (3) Structured Streaming → (4) S3 (Parquet files) → (5) Athena
Logging: Log Files → Logstash → Elasticsearch → Kibana
Deployment: CircleCI deploys the application onto the Kubernetes cluster
30. OUR LEARNINGS
Limitations through our journey
Current architecture limitations:
➡ Athena by default supports 20 concurrent DDL and 20 concurrent DML queries
➡ Athena loads the data from scratch for each request instead of caching it
➡ The small size of the Parquet files made queries very slow, since Athena scans through all the files
AWS Athena is a good tool for data analysis, but it wasn’t suitable for our use case, where we wanted to handle many more concurrent user requests.
31. OUR LEARNINGS
Improvements through our journey
Revised pipeline: Source → Kafka → RDS → App
➡ Developed a custom Kafka producer and consumer (see the sketch below)
➡ The Kafka producer processes the data before it is dumped into RDS
➡ Reduced the amount of data stored in RDS
➡ Used indexing to make the queries run faster
➡ Moved away from the concurrency issues
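A minimal sketch of the consumer side of this revised pipeline, with the topic, table and RDS endpoint all assumed: read the already-processed events from Kafka and persist a reduced record set into PostgreSQL on RDS, with an index on the columns the app filters on.

```python
# Minimal sketch: consume processed events from Kafka and persist them into RDS.
import json
import psycopg2
from kafka import KafkaConsumer  # kafka-python

conn = psycopg2.connect(
    host="example-rds.eu-west-1.rds.amazonaws.com",  # assumed RDS endpoint
    dbname="factory", user="app", password="secret",
)

# Index on the columns the application filters on keeps its queries fast.
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            plant TEXT,
            evs_start_date TIMESTAMP,
            sensor_value DOUBLE PRECISION
        );
        CREATE INDEX IF NOT EXISTS idx_events_plant_date
            ON events (plant, evs_start_date);
    """)
    conn.commit()

consumer = KafkaConsumer(
    "factory-events-processed",      # assumed topic of pre-processed data
    bootstrap_servers="kafka:9092",  # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO events (plant, evs_start_date, sensor_value) VALUES (%s, %s, %s)",
            (event["plant"], event["evs_start_date"], event["sensor_value"]),
        )
    conn.commit()
```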
32. OUR LEARNINGS
Testing data sanity on pipelines
Validate the incoming data and check for (a sketch of such a check follows this list):
➡ any holes / missing values
➡ the quality of the data, looking for bad values
➡ faster notification when streaming and refresh jobs fail
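As a minimal sketch of such a check (the column names, thresholds and partition path are assumptions), a small pandas script can validate a freshly written Parquet partition and fail fast so the pipeline can notify the team.

```python
# Minimal sketch: sanity-check one Parquet partition for holes and bad values.
import pandas as pd

REQUIRED_COLUMNS = ["plant", "evs_start_date", "sensor_value"]  # assumed columns

def check_partition(path):
    """Return a list of human-readable problems found in one Parquet partition."""
    problems = []
    df = pd.read_parquet(path)

    # Holes / missing values
    for column in REQUIRED_COLUMNS:
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif df[column].isna().any():
            problems.append(f"{int(df[column].isna().sum())} null values in {column}")

    # Bad values (illustrative range check)
    if "sensor_value" in df.columns and (df["sensor_value"] < 0).any():
        problems.append("negative sensor_value readings found")

    return problems

if __name__ == "__main__":
    issues = check_partition("events/plant=Location3/")  # assumed partition path
    if issues:
        # Fail fast so the pipeline can alert the team immediately.
        raise SystemExit("Data sanity failed:\n" + "\n".join(issues))
```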
33. OUR LEARNINGS
Test coverage on the pipeline
Build → Run Front-end Unit Tests → Run Backend Unit Tests → Deploy to Test → Run UI Tests → Run API Smoke Tests → Deploy to QA → Run Data Sanity Tests → Deploy to Prod → Run Data Sanity Tests
35. THE FUTURE
What we plan to do next
CHAOS MONKEY: Test the ecosystem for resilience and its self-healing mechanism
INFRASTRUCTURE IMPROVEMENTS: Helm to manage Kubernetes applications; moving towards Amazon EKS
NOTIFY FAILURES: Send appropriate alerts as well as logs to the concerned group