Designing for Operability and Manageability
Gaurav Bahrani
CTO, MeTripping
Shanker Balan
Managing Consultant, sysCredence
Introduction
● Gaurav Bahrani, CTO, MeTripping
○ Building an intelligent search engine for travel
○ Expertise in building large scale distributed systems
■ SQL, NoSQL, Big Data
■ Database engines
■ Fault-tolerant systems
○ ex-VPE Cloud Lending Solutions (Fin-tech startup), ex-Yahoo, ex-MS, ex-HP
● Shanker Balan, Freelance DevOps Consultant
○ Infrastructure & Cloud
○ DevOps Consulting For Startups
■ Infibeam, Instamojo, Logistimo, Widas, Quintype, dAlchemy IOT
○ ex-InMobi, ex-Yahoo
Agenda
1. MeTripping - Introduction
2. Operability & Manageability Challenges
3. Design & Architecture Best Practices
4. Q & A
MeTripping - Introduction (1)
MeTripping - Introduction (2)
Architecture
Challenges
● Scale and performance
● Varying user traffic
● Data integration with tens of data providers - different formats and SLAs
[Figure labels: dynamic data, static data]
Operability / Manageability Challenges
● Infrastructure & Environment
● Build / Release Process
● Metrics & Availability
● Scaling & Cost Management
● Security & Compliance
● Team Structure
Infrastructure & Environment
● OS Standardisation
○ Latest LTS Releases / Minimal Container OS
○ Minimal Docker Images (Alpine / Atomic)
● Package Management
○ Tarball Installation vs. Package Repos
○ Adopt Docker
● Config Management
○ Hand Manage
○ Ansible vs. Chef vs. Puppet
● Service Management
○ Manual start / stop of services
○ Supervisor vs. Systemd
Build & Release Process
● Build on laptops
● Using IDE For Deployment
● Manually copying artifacts to remote servers
● Version Management
Metrics & Availability
● Health Checks & External Service Availability
○ Site24x7 / UptimeRobot / Gomez
● Server Health Monitoring
○ CloudWatch, DataDog, Nagios, Sensu etc
● Application Performance Monitoring
○ Istio / Hystrix
○ New Relic, AppDynamics, Elastic APM, Stackdriver
○ CloudWatch, Sysdig
● Logs (ELK)
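External uptime checkers need something to poll. A minimal sketch of a health-check endpoint follows, using only the Python standard library. The `/healthz` path, port 8080 and the two probes are assumptions made for illustration and are not taken from the slides.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_health(dependencies):
    """Run each dependency probe and return (http_status, report)."""
    report = {}
    for name, probe in dependencies.items():
        try:
            report[name] = "ok" if probe() else "failing"
        except Exception as exc:  # a probe that raises counts as a failure
            report[name] = "error: %s" % exc
    healthy = all(state == "ok" for state in report.values())
    return (200 if healthy else 503), report


# Hypothetical probes; real ones would ping the database, cache, and so on.
DEPENDENCIES = {
    "database": lambda: True,
    "cache": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, report = check_health(DEPENDENCIES)
        body = json.dumps(report).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block and serve the health endpoint (port is an assumed default)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Returning 503 when any dependency fails lets an uptime checker or load balancer tell a degraded service apart from one that is simply reachable.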
Security & Compliance
● Secure Coding Guidelines
○ OWASP Top 10
○ Follow Industry Best Practices (PCI, HIPAA)
● Access Controls
○ Central User Management
○ Do not use shared accounts
○ Follow least privilege model
● Restrict Network Access
○ Use both Public & Private Networks
○ Restrict login access only to trusted networks
○ Protect Admin Pages with Google SSO + .htaccess
Application Availability and Scalability
● Resource allocation issues
○ Compute
■ Using old generation servers
■ Using “burstable” instances for production
■ Using high CPU instances without looking at actual CPU utilisation
○ Storage
■ Using magnetic storage
■ Under-provisioning / over-provisioning of storage
■ Provisioned IOPS with Databases
■ Using ephemeral storage
○ Network
■ Ephemeral IPs for Internet facing servers
■ SSL Termination on Application (Apache / Nginx)
■ Nginx / Apache as Application Load Balancers
■ Serving static assets from application
■ Mapping domains to Load Balancer IPs
Managing Costs
● Use less SaaS & PaaS
○ Binpack with Docker
○ Run local MySQL, ElasticSearch, Kafka, ELK etc
● Separate Accounts For BUs & Environments
○ Non Prod Environments (staging, dev etc)
○ Prod Environments
● Shutdown Non Prod Environments when not in use
● Housekeep regularly
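Shutting down non-prod environments can be driven by a simple schedule rule. The sketch below only decides whether an environment should be running; the environment names and the 08:00 to 20:00 weekday window are assumptions, and the actual stop / start call to the cloud provider is left out.

```python
import datetime


def should_run(environment, now, work_start=8, work_end=20):
    """Return True if the environment should be up at the given time.

    Prod always runs; every other environment runs only on weekdays
    between work_start (inclusive) and work_end (exclusive), in hours.
    """
    if environment == "prod":
        return True
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and work_start <= now.hour < work_end
```

A scheduled job could call `should_run("staging", datetime.datetime.now())` and stop or start instances accordingly.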
Team Structure
● DevOps is hardest to hire (and retain)
● Training freshers in DevOps is time consuming
● What works well
○ Make Engineering Self Sufficient With Operations (Dev+Ops)
■ Make monitoring and deployment self-service
○ Use Infrastructure As Code tools (Terraform)
○ Rotate oncall within the Dev Team
● Have a shared team to manage Infra
○ Account management
○ IT Stuff
○ Backup / Restore etc
Design & Architecture Best Practices
● System instrumentation - Systems and application monitoring
● Web-services architecture
● System standardisation (Docker)
○ Consistent environments
○ Simplified builds / releases
○ Scalable architecture
● Data systems best practices
○ Design for scale and performance
System Instrumentation - Systems / application monitoring
● Application monitoring setup is a “must-have” requirement for all applications
○ Helps identify system and application deficiencies
○ Helps identify problems proactively
○ Results in efficient (performance and cost effective) systems
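As one possible illustration of application-level instrumentation, the sketch below records call latency and error counts with a decorator. The in-memory store, the metric name and the `rank_hotels` function are hypothetical; a real setup would forward the samples to one of the monitoring tools listed earlier.

```python
import functools
import math
import time
from collections import defaultdict

# In-memory stores, for illustration only.
LATENCIES_MS = defaultdict(list)
ERRORS = defaultdict(int)


def instrumented(name):
    """Decorator that records latency and error count under `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                ERRORS[name] += 1
                raise
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                LATENCIES_MS[name].append(elapsed_ms)
        return wrapper
    return decorator


def summary(name):
    """Return call count, error count, max and nearest-rank p95 latency."""
    samples = sorted(LATENCIES_MS[name])
    if not samples:
        return {"count": 0, "errors": ERRORS[name]}
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "count": len(samples),
        "errors": ERRORS[name],
        "max_ms": samples[-1],
        "p95_ms": samples[p95_index],
    }


@instrumented("rank_hotels")
def rank_hotels(scores):
    # Hypothetical application function being measured.
    return sorted(scores, reverse=True)
```

Tracking a percentile rather than only an average makes slow outliers visible, which is usually where deficiencies show up first.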
Web-services architecture
● Create web-services and not a “spider-web” of services
● Create fewer “power packed” services vs. many, many “simplistic” services
○ Push down complex data relationships into application code / database
● Create separate services for different data response times
○ Keep web-services for data stored in redis / memcached / elasticsearch separate from web-services for data from RDBMS
● Use tools such as Postman and Swagger to author and document web-services
[Figure: architecture diagram with components Elasticsearch, Postgres / Mongo, Web Crawler, Hadoop / Spark, Middle Tier, Redis]
System standardisation (1)
● Standard AMI for all systems
System standardisation (2)
● Minimalistic “coreos” hosts; manage configuration as infrastructure code with Terraform
System standardisation (3)
● Standard base Docker image for all containers
○ OS: Ubuntu 16.04
○ Python: 3.4
○ Setup non-system user
System standardisation (4)
● Separate Git repository for build and configurations
○ MeTrippingDeloyment has docker compose ymls for build and deployment settings for dev / stage / prod environments
○ .env files contain environment settings (sourced in by docker-compose)
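With one .env file per environment, a missing key in one of them is an easy mistake to make. The sketch below parses simple `KEY=VALUE` text and reports keys that some environments define and others lack. The key names in the usage note are hypothetical, and the parser covers only plain lines and `#` comments, not the full docker-compose .env syntax.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not KEY=VALUE: %r" % raw)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(envs):
    """Map each environment name to the keys it lacks but others define."""
    all_keys = set()
    for settings in envs.values():
        all_keys.update(settings)
    gaps = {}
    for name, settings in envs.items():
        absent = sorted(all_keys - set(settings))
        if absent:
            gaps[name] = absent
    return gaps
```

For example, if a staging file defines `RANKER_PORT` and `LOG_LEVEL` while the prod file defines only `RANKER_PORT`, `missing_keys` reports `LOG_LEVEL` as absent from prod. A check like this could run in the build before deployment.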
System standardisation (5)
● Build: docker-compose.sh -f docker-compose-common.yml -rv v1 -rt 2018.03.19 build mt-ranker-build
● Deploy: docker-compose.sh -f docker-compose-staging.yml -rv v1 -rt 2018.03.19 up -d mt-ranker
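The two commands above differ only in compose file, action and service, so they can be generated from one release identifier. The sketch below assembles the argument list. It assumes that `-rv` carries a release version and `-rt` a date-based release tag, which is an interpretation of the slide rather than documented behaviour of the `docker-compose.sh` wrapper.

```python
import datetime


def compose_command(compose_file, release_version, release_date, action, service):
    """Build the argument list for the docker-compose.sh wrapper.

    `action` is a list such as ["build"] or ["up", "-d"].
    """
    tag = release_date.strftime("%Y.%m.%d")  # e.g. 2018.03.19
    return (
        ["docker-compose.sh", "-f", compose_file,
         "-rv", release_version, "-rt", tag]
        + list(action)
        + [service]
    )
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues, and deriving the tag from a date object keeps the build and deploy steps on the same release.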
Data Systems Best Practices
● Embrace hybrid (SQL + NoSQL + Big Data) system design
○ Store transaction data in RDBMS
■ Consider data partitioning
■ Move archive data to Big Data systems with Long Term Storage Backend
○ Store dimension / non-transaction data in NoSQL
■ MongoDB vs. CouchDB vs. Elasticsearch / Solr
○ Move complex data joins to backend data pipelines
○ Simplify star schema
● System design considerations
○ Use “non-constrained” CPUs
○ Use SSDs for data
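The partitioning and archiving bullets can be made concrete with a small sketch of month-based partitioning for transaction data. The `transactions_YYYY_MM` naming and the 12-month retention window are assumptions chosen for illustration.

```python
import datetime


def partition_for(timestamp):
    """Name of the monthly partition that a transaction belongs to."""
    return "transactions_%04d_%02d" % (timestamp.year, timestamp.month)


def is_archivable(partition_month, today, keep_months=12):
    """True if the partition is old enough to move to long-term storage."""
    age_in_months = (
        (today.year - partition_month.year) * 12
        + (today.month - partition_month.month)
    )
    return age_in_months >= keep_months
```

Partitioning by time keeps the RDBMS working set small, and whole partitions can be moved to the Big Data store without row-by-row deletes.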
Summary
● Code -> Build -> Deploy -> Manage -> Burn, Burn, Burn -> Re-Design -> Re-Code -> Re-Build -> Re-Deploy -> Burn, Burn
vs.
● Design -> Code -> Build -> Deploy -> Manage -> Burn Less
Q & A
Thank You!
Gaurav (gaurav@metripping.com), Shanker (shanker@syscredence.com)

More Related Content

What's hot

Webinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDBWebinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDB
Severalnines
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Severalnines
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Severalnines
 
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebula Project
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControlWebinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQL
MariaDB plc
 
How to power microservices with MariaDB
How to power microservices with MariaDBHow to power microservices with MariaDB
How to power microservices with MariaDB
MariaDB plc
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
MariaDB plc
 
Getting started in the cloud for developers
Getting started in the cloud for developersGetting started in the cloud for developers
Getting started in the cloud for developers
MariaDB plc
 
Netflix Data Benchmark @ HPTS 2017
Netflix Data Benchmark @ HPTS 2017Netflix Data Benchmark @ HPTS 2017
Netflix Data Benchmark @ HPTS 2017
Ioannis Papapanagiotou
 
CCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDBCCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDB
MariaDB plc
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
MariaDB plc
 
How Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBHow Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDB
MariaDB plc
 
TiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup GroupTiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup Group
Morgan Tocker
 
CEPH DAY BERLIN - WELCOME
CEPH DAY BERLIN - WELCOME CEPH DAY BERLIN - WELCOME
CEPH DAY BERLIN - WELCOME
Ceph Community
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
MariaDB plc
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebula Project
 
Introducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live FrankfurtIntroducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live Frankfurt
Morgan Tocker
 
2020 07-30 elastic agent + ingest management
2020 07-30 elastic agent + ingest management2020 07-30 elastic agent + ingest management
2020 07-30 elastic agent + ingest management
Daliya Spasova
 
SJTU Summary report
SJTU Summary reportSJTU Summary report
SJTU Summary report
Yves Chan
 

What's hot (20)

Webinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDBWebinar slides: How to Migrate from Oracle DB to MariaDB
Webinar slides: How to Migrate from Oracle DB to MariaDB
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
 
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDBWebinar slides: Migrating to Galera Cluster for MySQL and MariaDB
Webinar slides: Migrating to Galera Cluster for MySQL and MariaDB
 
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
OpenNebulaconf2017EU: OpenNebula 5.4 and Beyond by Tino Vázquez and Ruben S. ...
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControlWebinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQL
 
How to power microservices with MariaDB
How to power microservices with MariaDBHow to power microservices with MariaDB
How to power microservices with MariaDB
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
Getting started in the cloud for developers
Getting started in the cloud for developersGetting started in the cloud for developers
Getting started in the cloud for developers
 
Netflix Data Benchmark @ HPTS 2017
Netflix Data Benchmark @ HPTS 2017Netflix Data Benchmark @ HPTS 2017
Netflix Data Benchmark @ HPTS 2017
 
CCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDBCCV: migrating our payment processing system to MariaDB
CCV: migrating our payment processing system to MariaDB
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
How Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDBHow Pixid dropped Oracle and went hybrid with MariaDB
How Pixid dropped Oracle and went hybrid with MariaDB
 
TiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup GroupTiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup Group
 
CEPH DAY BERLIN - WELCOME
CEPH DAY BERLIN - WELCOME CEPH DAY BERLIN - WELCOME
CEPH DAY BERLIN - WELCOME
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
 
Introducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live FrankfurtIntroducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live Frankfurt
 
2020 07-30 elastic agent + ingest management
2020 07-30 elastic agent + ingest management2020 07-30 elastic agent + ingest management
2020 07-30 elastic agent + ingest management
 
SJTU Summary report
SJTU Summary reportSJTU Summary report
SJTU Summary report
 

Similar to Designing for operability and managability

introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro services
Spyros Lambrinidis
 
Modern Elastic Datacenter Architecture
Modern Elastic Datacenter ArchitectureModern Elastic Datacenter Architecture
Modern Elastic Datacenter Architecture
Weston Bassler
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
Ruslan Meshenberg
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
Amihay Zer-Kavod
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023
Nelson Calero
 
Introducing TiDB Operator
Introducing TiDB OperatorIntroducing TiDB Operator
Introducing TiDB Operator
Kevin Xu
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
Nicolas Brousse
 
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons LearntLast Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
Mark Grebler
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
aspyker
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016
Sharma Podila
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
Kriangkrai Chaonithi
 
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Andrejs Prokopjevs
 
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Demi Ben-Ari
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data Scientists
Shawn Zhu
 
Best Practices with Sitecore
Best Practices with SitecoreBest Practices with Sitecore
Best Practices with Sitecore
Anant Corporation
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
Sunil Govindan
 
Next gen software operations models in the cloud
Next gen software operations models in the cloudNext gen software operations models in the cloud
Next gen software operations models in the cloud
Aarno Aukia
 
Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]
Kevin Xu
 

Similar to Designing for operability and managability (20)

introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro services
 
Modern Elastic Datacenter Architecture
Modern Elastic Datacenter ArchitectureModern Elastic Datacenter Architecture
Modern Elastic Datacenter Architecture
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Data Platform in the Cloud
Data Platform in the CloudData Platform in the Cloud
Data Platform in the Cloud
 
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-AriThinking DevOps in the era of the Cloud - Demi Ben-Ari
Thinking DevOps in the era of the Cloud - Demi Ben-Ari
 
Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023Database automation guide - Oracle Community Tour LATAM 2023
Database automation guide - Oracle Community Tour LATAM 2023
 
Introducing TiDB Operator
Introducing TiDB OperatorIntroducing TiDB Operator
Introducing TiDB Operator
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons LearntLast Conference 2017: Big Data in a Production Environment: Lessons Learnt
Last Conference 2017: Big Data in a Production Environment: Lessons Learnt
 
Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016Netflix Container Scheduling and Execution - QCon New York 2016
Netflix Container Scheduling and Execution - QCon New York 2016
 
Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016Scheduling a fuller house - Talk at QCon NY 2016
Scheduling a fuller house - Talk at QCon NY 2016
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
Oracle EBS Journey to the Cloud - What is New in 2022 (UKOUG Breakthrough 22 ...
 
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-AriThinking DevOps in the Era of the Cloud - Demi Ben-Ari
Thinking DevOps in the Era of the Cloud - Demi Ben-Ari
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data Scientists
 
Best Practices with Sitecore
Best Practices with SitecoreBest Practices with Sitecore
Best Practices with Sitecore
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Big Data on Cloud Native Platform
Big Data on Cloud Native PlatformBig Data on Cloud Native Platform
Big Data on Cloud Native Platform
 
Next gen software operations models in the cloud
Next gen software operations models in the cloudNext gen software operations models in the cloud
Next gen software operations models in the cloud
 
Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]
 

Recently uploaded

ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 

Recently uploaded (20)

ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring

Designing for Operability and Manageability

  • 1. Designing for Operability and Manageability
    Gaurav Bahrani, CTO
    Shanker Balan, Managing Consultant, sysCredence
  • 2. Introduction
    ● Gaurav Bahrani, CTO, MeTripping
      ○ Building intelligent search engine for travel
      ○ Expertise in building large scale distributed systems
        ■ SQL, NoSQL, Big Data
        ■ Database engines
        ■ Fault-tolerant systems
      ○ ex-VPE Cloud Lending Solutions (Fin-tech startup), ex-Yahoo, ex-MS, ex-HP
    ● Shanker Balan, Freelance DevOps Consultant
      ○ Infrastructure & Cloud
      ○ DevOps Consulting For Startups
        ■ Infibeam, Instamojo, Logistimo, Widas, Quintype, dAlchemy IOT
      ○ ex-InMobi, ex-Yahoo
  • 3. Agenda
    1. MeTripping - Introduction
    2. Operability & Manageability Challenges
    3. Design & Architecture Best Practices
    4. Q & A
  • 5. MeTripping - Introduction (2)
    Architecture
    [Diagram labels: dynamic data, static data]
    Challenges
    ● Scale and performance
    ● Varying user traffic
    ● Data integration with tens of data providers, each with different formats and SLAs
  • 6. Operability / Manageability Challenges
    ● Infrastructure & Environment
    ● Build / Release Process
    ● Metrics & Availability
    ● Scaling & Cost Management
    ● Security & Compliance
    ● Team Structure
  • 7. Infrastructure & Environment
    ● OS Standardisation
      ○ Latest LTS Releases / Minimal Container OS
      ○ Minimal Docker Images (Alpine / Atomic)
    ● Package Management
      ○ Tarball Installation vs. Package Repos
      ○ Adopt Docker
    ● Config Management
      ○ Hand Manage
      ○ Ansible vs. Chef vs. Puppet
    ● Service Management
      ○ Manual start / stop of services
      ○ Supervisor vs. Systemd
  • 8. Build & Release Process
    ● Build on laptops
    ● Using IDE For Deployment
    ● Hand Manage artifacts to remote servers
    ● Version Management
  • 9. Metrics & Availability
    ● Health Checks & External Service Availability
      ○ Site24x7 / Uptime Robot / Gomez
    ● Server Health Monitoring
      ○ CloudWatch, Datadog, Nagios, Sensu etc.
    ● Application Performance Monitoring
      ○ Istio / Hystrix
      ○ New Relic, AppDynamics, Elastic APM, Stackdriver
      ○ CloudWatch, Sysdig
    ● Logs (ELK)
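
The health-check item on this slide can be illustrated with a minimal sketch. It assumes a service that exposes one summary of its dependency probes for an external monitor to poll; the probe names `check_database` and `check_cache` are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each named probe and summarise the results.

    A probe that raises is reported as failed rather than crashing the check.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}


# Hypothetical probes; real ones would ping the database, cache, and so on.
def check_database() -> bool:
    return True


def check_cache() -> bool:
    raise ConnectionError("cache unreachable")


report = run_health_checks({"database": check_database, "cache": check_cache})
```

Returning a per-dependency breakdown alongside the overall status lets an uptime monitor alert on the summary while an operator reads the detail.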
  • 10. Security & Compliance
    ● Secure Coding Guidelines
      ○ OWASP Top 10
      ○ Follow Industry Best Practices (PCI, HIPAA)
    ● Access Controls
      ○ Central User Management
      ○ Do not use shared accounts
      ○ Follow least privilege model
    ● Restrict Network Access
      ○ Use both Public & Private Networks
      ○ Restrict login access only to trusted networks
      ○ Protect Admin Pages with Google SSO + .htaccess
  • 11. Application Availability and Scalability
    ● Resource allocation issues
      ○ Compute
        ■ Using old generation servers
        ■ Using “burstable” instances for production
        ■ Using high CPU instances without looking at actual CPU utilisation
      ○ Storage
        ■ Using magnetic storage
        ■ Under-provisioning / over-provisioning of storage
        ■ Provisioned IOPS with Databases
        ■ Using ephemeral storage
      ○ Network
        ■ Ephemeral IPs for Internet facing servers
        ■ SSL Termination on Application (Apache / Nginx)
        ■ Nginx / Apache as Application Load Balancers
        ■ Serving static assets from application
        ■ Mapping domains to Load Balancer IPs
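
The point about choosing instance types without checking actual CPU utilisation can be sketched as a simple sizing check over utilisation samples. This is an illustration only: the 30% and 80% thresholds and the function names are assumptions, not figures from the talk.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sizing_hint(cpu_percent: List[float], low: float = 30.0, high: float = 80.0) -> str:
    """Classify an instance from its 95th-percentile CPU utilisation.

    The low/high thresholds are illustrative assumptions.
    """
    p95 = percentile(cpu_percent, 95)
    if p95 < low:
        return "over-provisioned"
    if p95 > high:
        return "under-provisioned"
    return "ok"
```

Using a high percentile rather than the mean keeps short bursts from being averaged away before a resize decision is made.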
  • 12. Managing Costs
    ● Use less SaaS & PaaS
      ○ Binpack with Docker
      ○ Run local MySQL, Elasticsearch, Kafka, ELK etc.
    ● Separate Accounts For BUs & Environments
      ○ Non Prod Environments (staging, dev etc.)
      ○ Prod Environments
    ● Shut down Non Prod Environments when not in use
    ● Housekeep regularly
  • 13. Team Structure
    ● DevOps is hardest to hire (and retain)
    ● Training freshers in DevOps is time-consuming
    ● What works well
      ○ Make Engineering Self Sufficient With Operations (Dev+Ops)
        ■ Make monitoring and deployment self-service
      ○ Use Infrastructure As Code tools (Terraform)
      ○ Rotate on-call within the Dev Team
    ● Have a shared team to manage Infra
      ○ Account management
      ○ IT Stuff
      ○ Backup / Restore etc.
  • 14. Design & Architecture Best Practices
    ● System instrumentation - Systems and application monitoring
    ● Web-services architecture
    ● System standardisation (Docker)
      ○ Consistent environments
      ○ Simplified builds / releases
      ○ Scalable architecture
    ● Data systems best practices
      ○ Design for scale and performance
  • 15. System Instrumentation - Systems / application monitoring
    ● Application monitoring setup is a “must-have” requirement for all applications
      ○ Helps identify system and application deficiencies
      ○ Helps identify problems proactively
      ○ Results in efficient (performant and cost-effective) systems
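
As an illustration of application-level instrumentation, the sketch below records call counts, error counts, and cumulative latency per operation. The in-memory `METRICS` store and the `search` function are hypothetical stand-ins; a real setup would forward these numbers to one of the monitoring tools named earlier in the deck.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory store standing in for a real metrics backend (an assumption).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(name):
    """Decorator that records calls, errors, and elapsed time under `name`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[name]["errors"] += 1
                raise
            finally:
                # Runs on both success and failure, so every call is counted.
                METRICS[name]["calls"] += 1
                METRICS[name]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator


@instrumented("search")
def search(query):
    """Hypothetical application function used only to exercise the decorator."""
    if not query:
        raise ValueError("empty query")
    return [query.upper()]
```

Wrapping functions with a decorator keeps the measurement code out of the business logic, which makes it cheap to instrument every entry point.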
  • 16. Web-services architecture
    ● Create web-services and not a “spider-web” of services
    ● Create fewer “power packed” services vs. many, many “simplistic” services
      ○ Push down complex data relationships into application code / database
    ● Create separate services for different data response times
      ○ Keep web-services for data stored in Redis / Memcached / Elasticsearch separate from web-services for data from an RDBMS
    ● Use tools such as Postman and Swagger to author and document web-services
    [Diagram labels: Elasticsearch, Postgres / Mongo, Web Crawler, Hadoop / Spark, Middle Tier, Redis]
  • 17. System standardisation (1) ● Standard AMI for all systems
  • 18. System standardisation (2) ● Use a minimalistic “CoreOS” base and manage configuration as infrastructure code with Terraform
  • 19. System standardisation (3)
    ● Standard base Docker image for all containers
      ○ OS: Ubuntu 16.04
      ○ Python: 3.4
      ○ Set up a non-system user
  • 20. System standardisation (4)
    ● Separate Git repository for build and configurations
      ○ MeTrippingDeloyment holds the docker-compose YAML files with build and deployment settings for the dev / stage / prod environments
      ○ .env files contain environment settings (sourced in by docker-compose)
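
To make the `.env` idea concrete, here is a minimal sketch of reading `KEY=VALUE` settings and layering a per-environment file over shared defaults. It is a simplified parser, not docker-compose's own implementation; the variable names and the layering of a common file under an environment file are assumptions for illustration.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without '='
        settings[key.strip()] = value.strip()
    return settings


def settings_for(common: str, override: str) -> dict:
    """Start from shared defaults, then apply environment-specific values."""
    merged = parse_env(common)
    merged.update(parse_env(override))
    return merged


# Hypothetical file contents; the real repository's settings are not shown in the deck.
COMMON_ENV = """
# shared defaults
LOG_LEVEL=info
RANKER_PORT=8080
"""

STAGING_ENV = """
LOG_LEVEL=debug
"""

staging = settings_for(COMMON_ENV, STAGING_ENV)
```

Keeping these files in a dedicated repository, as the slide describes, means an environment's configuration can be reviewed and versioned separately from application code.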
  • 21. System standardisation (5)
    ● Build: docker-compose.sh -f docker-compose-common.yml -rv v1 -rt 2018.03.19 build mt-ranker-build
    ● Deploy: docker-compose.sh -f docker-compose-staging.yml -rv v1 -rt 2018.03.19 up -d mt-ranker
  • 22. Data Systems Best Practices
    ● Embrace hybrid (SQL + NoSQL + Big Data) system design
      ○ Store transaction data in RDBMS
        ■ Consider data partitioning
        ■ Move archive data to Big Data systems with Long Term Storage Backend
      ○ Store dimension / non-transaction data in NoSQL
        ■ MongoDB vs. CouchDB vs. Elasticsearch / Solr
      ○ Move complex data joins to backend data pipelines
      ○ Simplify star schema
    ● System design considerations
      ○ Use “non-constrained” CPUs
      ○ Use SSDs for data
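
The partitioning and archiving bullets can be sketched as follows, assuming monthly partitions of a transaction table and a fixed retention window. The table name, the naming scheme, and the 12-month default are hypothetical choices, not details from the talk.

```python
from datetime import date


def partition_name(table: str, day: date) -> str:
    """Name of the monthly partition that holds rows dated `day`."""
    return f"{table}_{day.year:04d}_{day.month:02d}"


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def should_archive(partition_month: date, today: date, keep_months: int = 12) -> bool:
    """True when a partition is older than the retention window (assumed 12 months)."""
    return months_between(partition_month, today) > keep_months
```

With time-based partitions, moving archive data out of the RDBMS becomes a whole-partition operation instead of a row-by-row delete.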
  • 23. Summary
    ● Code -> Build -> Deploy -> Manage -> Burn, Burn, Burn -> Re-Design -> Re-Code -> Re-Build -> Re-Deploy -> Burn, Burn
    vs.
    ● Design -> Code -> Build -> Deploy -> Manage -> Burn Less
  • 24. Q & A
  • 25. Thank You! Gaurav (gaurav@metripping.com), Shanker (shanker@syscredence.com)