Flipkart Website Architecture

      Mistakes & Learnings

          Siddhartha Reddy
          Architect, Flipkart
June 2007
November 2007
December 2012
www.flipkart.com
• Started in 2007
• Current Architecture from mid 2010
• Evolution of the architecture presented as…

  Issue[1]   RCA[2]   Actions   Learnings

• [1] Issue: Website is “slow”
• [2] RCA = Root Cause Analysis
Surviving & reacting to the environment

INFANCY (2007 – MID-2010)
Website is “slow”!
RCA
• Why?
  – MySQL queries taking too long
• Why?
  – Too many queries
  – Many slow queries
  – Queries locking tables
• Why?
  – Capacity
• Hmm…
Fixing it
• Get beefier servers (the obvious)
• Separate master_db, slave_db
  – Writes go to master_db
  – Reads from slave_db
  – Critical reads from master_db
[Diagram: before, a single MySQL server takes both reads and writes; after, writes go to the MySQL master, which replicates to a MySQL slave that serves reads]
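A minimal sketch of the read/write split described above, assuming a hypothetical QueryRouter helper and using plain strings in place of real database connections. It only shows the routing decision: writes and critical reads go to the master, other reads go to a slave.

```python
import random


class QueryRouter:
    """Picks a master or a slave for each query (hypothetical helper)."""

    def __init__(self, master, slaves):
        self.master = master          # handles all writes and critical reads
        self.slaves = list(slaves)    # handle ordinary reads

    def pick(self, sql, critical=False):
        """Return the connection that should execute `sql`."""
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read or critical or not self.slaves:
            return self.master
        return random.choice(self.slaves)


router = QueryRouter(master="master_db", slaves=["slave_db"])
print(router.pick("SELECT * FROM products"))                # slave_db
print(router.pick("UPDATE orders SET status = 'paid'"))     # master_db
print(router.pick("SELECT * FROM orders", critical=True))   # master_db
```

Reads are marked critical when they cannot tolerate replication lag, for example reading back a row that was just written.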
Learning from it
• Scale out database reads by distributing load
  across systems
• Isolate database writes from reads
  – Writes are (usually) more critical
Website is “slow”!
    (Again)
RCA
• Why?
  – MySQL queries taking too long (on slave_db)
• Why?
  – Too many queries
  – Many slow queries
• Why?
  – Queries from analytics / reporting and other
    backend jobs
• Urm…
Fixing it
• Analytics / reporting DB (archival_db)
    – Use MyISAM — optimized for reads
    – Additional indexes for quicker reporting
[Diagram: before, website writes go to the MySQL master, which replicates to one MySQL slave serving both website reads and analytics reads; after, the master replicates to Slave 1 for website reads and to Slave 2 for analytics reads]
Learning from it
• Isolate the databases being used for serving
  website traffic from those being used for
  analytics/reporting
• Isolate systems being used by production
  website from those being used for background
  processing
Learning the basics

BABY (2010 – 2011)
Website is “slow”!
RCA
• Why?
• How?
  – Instrumentation
RCA - 1
• Why?
     – Logging a lot
     – PHP processes blocking on writing logs
[Diagram: Request1 -> Process1, Request2 -> Process2 and Request3 -> Process3 all write to a single log file; while one process is writing, the others are waiting]
RCA - 2
• Why?
  – Service Oriented Architecture (SOA)
  – Too many calls to remote services per request
     • Creating fresh connection for each call
     • All the calls are made in serial order


   Receive request -> Connect to Service1 -> Request Service1 -> Connect Service2 -> Request Service2 -> Send response
RCA - 3
• Why?
  – Configurability
  – Fetch a lot of “config” from database for serving
    each request
     Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 -> Send response
RCA – 1,2,3
• Why?
  – Logging a lot
  – SOA
  – Configurability
• Why?
  – PHP’s process model
• Argh!
Fixing it
• fk-w3-agent
  – Simple Java “middleware” daemon
  – Deployed on each web server
  – PHP communicates to it through local socket
  – Hosts pluggable “handlers”
fk-w3-agent: LoggingHandler

[Diagram: before, Process1, Process2 and Process3 each write directly to the log file; after, they hand log entries to fk-w3-agent, which writes them to the log file async / buffered]
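A minimal sketch of the async / buffered idea, written with Python's standard logging queue classes rather than the Java agent the slides describe. The ListHandler is a hypothetical stand-in for the slow file writer; a logging.FileHandler would take its place in practice.

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue()  # in-memory buffer between request code and the writer

# Request-handling code only enqueues records; it never touches the file.
logger = logging.getLogger("web")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))
logger.propagate = False

collected = []


class ListHandler(logging.Handler):
    """Stand-in for the slow log-file writer."""

    def emit(self, record):
        collected.append(record.getMessage())


# A single background thread drains the queue and does the slow write.
listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()
for i in range(3):
    logger.info("request %d served", i)
listener.stop()  # processes what is already queued, then joins the thread
print(collected)
```

The request path pays only for an in-memory enqueue, so one slow disk write no longer stalls every process that wants to log.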
fk-w3-agent: ServiceHandler(s)
[Diagram: before, Receive request -> Connect to Service1 -> Request Service1 -> Connect Service2 -> Request Service2 -> Send response; after, Receive request -> Call fk-w3-agent -> Send response, with fk-w3-agent talking to Service1 and Service2]
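A rough illustration of why moving service calls out of the serial request path helps, assuming the agent issues the calls concurrently from long-lived workers (the slides do not spell this out). The call_service function is a hypothetical stand-in that sleeps instead of using the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service(name, delay=0.1):
    """Stand-in for a remote call; sleeps instead of using the network."""
    time.sleep(delay)
    return f"{name}: ok"


def call_serially(names):
    return [call_service(n) for n in names]


def call_in_parallel(names, pool):
    # A long-lived pool plays the role of the agent's reusable workers.
    return list(pool.map(call_service, names))


services = ["Service1", "Service2"]
with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.perf_counter()
    serial = call_serially(services)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = call_in_parallel(services, pool)
    parallel_time = time.perf_counter() - start

print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

Serial latency is roughly the sum of the calls, while parallel latency is roughly that of the slowest call.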
fk-w3-agent: ConfigHandler
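[Diagram: before, Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 -> Send response, with each fetch hitting the database; after, Receive request -> Fetch all config from fk-w3-agent -> Send response, with fk-w3-agent polling the database and caching the result]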
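A minimal sketch of the poll-and-cache pattern, assuming a hypothetical ConfigCache class and a dict standing in for the config tables. Requests read a local snapshot; only the periodic refresh touches the database.

```python
import threading


class ConfigCache:
    """Keeps a local snapshot of config, refreshed by polling (illustrative)."""

    def __init__(self, load_all):
        self._load_all = load_all      # callable returning {key: value}
        self._lock = threading.Lock()
        self._snapshot = {}
        self.refresh()

    def refresh(self):
        """One poll: fetch everything, then swap the snapshot under the lock."""
        fresh = dict(self._load_all())
        with self._lock:
            self._snapshot = fresh

    def get_all(self):
        """What a request handler calls; never touches the database."""
        with self._lock:
            return dict(self._snapshot)


database = {"Config1": "a", "Config2": "b"}   # stand-in for the config tables
cache = ConfigCache(lambda: database)

database["Config1"] = "changed"
before = cache.get_all()["Config1"]   # still the cached value
cache.refresh()                       # the agent would do this on a timer
after = cache.get_all()["Config1"]
print(before, after)                  # a changed
```

The trade-off is that a config change becomes visible only after the next poll, so the polling interval bounds how stale a request can be.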
Learning from it
• PHP — good for frontend and templating
  – Gives a lot of agility
  – Limiting process model
     • Hurdle for high performance
• Java — stability and performance
• Horses for courses
Website is “slow”!
    (Again)
RCA
• Why?
  – PHP processes taking up too much time
  – PHP processes taking up too much CPU
• Why?
  – Product info deserialization taking up time/CPU
  – View construction taking up time/CPU
Fixing it
• Caching!
• Cache fully constructed pages
  – For a few minutes
  – Only for highly trafficked pages (Homepage)
• Cache PHP serialized Product objects
  – ~20 million objects
  – Memcache
• Yeah! But…
  – Add caching => add complexity
Caching: Complications (1)
• “Caching fully constructed pages”
• But parts of pages still need to be dynamic
     • Example: Logged-in user’s name
• Impossible to do effective bucket testing
     • Or at least makes it prohibitively complex
Caching: Complications (2)
• “Caching PHP serialized Product objects”
• Without caching:
    getProductInfo() -> Fetch from CMS
• With caching, cache hit:
    getProductInfo() -> Fetch from Cache
• With caching, cache miss:
    getProductInfo() -> Fetch from Cache -> Fetch from CMS -> Set in Cache
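The three paths above can be sketched as a read-through lookup. Dicts stand in for Memcache and the CMS, and fetch_from_cms is a hypothetical helper that counts how often the CMS is hit.

```python
cache = {}                                   # stand-in for Memcache
cms = {"p1": {"title": "Book"}}              # stand-in for the CMS
cms_fetches = 0


def fetch_from_cms(product_id):
    global cms_fetches
    cms_fetches += 1
    return cms[product_id]


def get_product_info(product_id):
    product = cache.get(product_id)
    if product is None:                       # cache miss
        product = fetch_from_cms(product_id)
        cache[product_id] = product           # set in cache for next time
    return product


first = get_product_info("p1")    # miss: goes to the CMS
second = get_product_info("p1")   # hit: served from the cache
print(first, second, cms_fetches)
```

Every caller now carries the miss-handling logic, which is part of the added complexity the slide points at.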
Caching: Complications (3)
• TTL: ∞ (i.e. no invalidation)
• Pro-actively repopulate products in the cache
  – Receive “notifications” about product updates
     • Notification Server — pushes notifications raised by
       CMS
• Use a persistent, distributed cache
  – Memcache => Membase, Couchbase
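A minimal sketch of notification-driven repopulation, assuming a hypothetical on_product_updated handler that the notification server would invoke. Entries never expire; they are overwritten when the CMS announces a change.

```python
cache = {}                               # stand-in for the persistent cache
cms = {"p1": {"price": 100}}             # stand-in for the CMS


def on_product_updated(product_id):
    """Handle an update notification: overwrite the entry, never delete it."""
    cache[product_id] = dict(cms[product_id])


def get_product_info(product_id):
    return cache.get(product_id)


on_product_updated("p1")                 # initial population
cms["p1"]["price"] = 90                  # the CMS changes the product...
stale = get_product_info("p1")["price"]  # ...the cache still holds the old value
on_product_updated("p1")                 # ...until the notification is handled
fresh = get_product_info("p1")["price"]
print(stale, fresh)                      # 100 90
```

With no TTL, freshness depends entirely on notifications being delivered, so a lost notification leaves an entry stale indefinitely.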
Learning from it
• Caching is a powerful tool for performance
  optimization
• Caching adds complexities
  – Reduced by keeping cache close to data source
  – Think deeply about TTL, invalidation
• Use caching to go from “acceptable
  performance” to “awesome performance”
  – Don’t rely on it to get to “acceptable
    performance”
Growing up

KID (2012)
Website is “slow”!
RCA
• Why?
  – Search-service is slow (or Reviews-service is slow
    or Recommendations-service is slow)
• But why is the rest of the website slow?
  – Requests to the slow service are blocking
    processing threads
• Eh?!
Let’s do some math
• Let’s say
   – Mean (or median) response time: 100 ms
   – 8-core server
   – All requests are CPU bound
• Throughput: 80 requests per second (rps)
• Let’s also say
   – 95th Percentile response time: 1000 ms
       • Call them “bad requests”
• 4 bad requests in a second
   – Throughput down to 44 rps
• 8 bad requests in a second?
   – Throughput down to 8 rps
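The arithmetic above can be reproduced with a small model. It assumes, as the slide does, that every request is CPU bound and that each bad request occupies one core for its full 1000 ms; the throughput function is illustrative only.

```python
def throughput(cores, normal_ms, bad_ms, bad_per_second):
    """Requests completed per second on a CPU-bound server.

    Each bad request is assumed to occupy one core for bad_ms; the
    remaining core-time is spent on normal requests.
    """
    core_ms = cores * 1000                       # core-milliseconds per second
    bad_cost = min(bad_per_second * bad_ms, core_ms)
    normal = (core_ms - bad_cost) // normal_ms
    return normal + bad_cost // bad_ms


for bad in (0, 4, 8):
    print(bad, throughput(8, 100, 1000, bad))
# 0 80
# 4 44
# 8 8
```

Four bad requests tie up four of the eight cores for the whole second, leaving 4 x 10 = 40 normal requests plus the 4 bad ones.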
Fixing it
• Aggressive timeouts for all service calls
  – Isolate impact of a slow service
     • only to pages that depend on it
• Very aggressive timeouts for non-critical
  services
  – Example: Recommendations
     • On a Product page, Search results page etc.
     • Not on My Recommendations page
• Load non-critical parts of pages through AJAX
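A minimal sketch of an aggressive timeout with a fallback for a non-critical service. The slow_recommendations function and the 50 ms budget are assumptions for illustration, not values from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=4)


def slow_recommendations():
    time.sleep(0.5)                 # stand-in for a slow remote service
    return ["item1", "item2"]


def call_with_timeout(fn, timeout_s, fallback):
    """Wait at most timeout_s for fn; otherwise return the fallback."""
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        # The worker thread keeps running here; a real client would set the
        # timeout on the socket so the call itself is abandoned.
        return fallback


start = time.perf_counter()
recs = call_with_timeout(slow_recommendations, 0.05, fallback=[])
elapsed = time.perf_counter() - start
print(recs, f"{elapsed:.2f}s")
pool.shutdown(wait=True)
```

The page renders without recommendations instead of waiting for them, which keeps the slow service from holding up the rest of the request.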
Learning from it
• Isolate the impact of poorly performing
  services / systems
• Isolate the required from the good-to-have
Website is “slow”!
    (Again)
RCA
• Why?
  – Load average of web servers has spiked
• Why?
  – Requests per second has spiked
     • From 1000 rps to 1500 rps
• Why?
  – Large number of notifications of product
    information updates
Fixing it
• Separate cluster for receiving product info
  update notifications from the cluster that
  serves users
• Admission control: Don’t let a system receive
  more requests than it can handle
  – Throttling
• Batch the notifications
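A minimal sketch of admission control and batching, assuming a hypothetical AdmissionController that caps in-flight requests and a batch helper that groups notifications. The limits used are arbitrary.

```python
import threading


class AdmissionController:
    """Rejects work beyond a fixed number of in-flight requests."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_admit(self):
        """Return True if the request may proceed, False if it is throttled."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Call when an admitted request finishes."""
        self._slots.release()


def batch(items, size):
    """Group notifications so each request carries up to `size` of them."""
    return [items[i:i + size] for i in range(0, len(items), size)]


gate = AdmissionController(max_in_flight=2)
decisions = [gate.try_admit() for _ in range(3)]   # the third is turned away
gate.release()                                     # one request finishes
after_release = gate.try_admit()
batches = batch(list(range(5)), size=2)
print(decisions, after_release, batches)
# [True, True, False] True [[0, 1], [2, 3], [4]]
```

Rejecting early keeps the requests that are admitted fast, and batching cuts the request count for the same number of updates.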
Learning from it
• Isolate the systems serving internal requests
  from those serving production traffic
• Admission control to ensure that a system is
  isolated from the over-enthusiasm of a client
• Look at the granularity at which we’re working
Increasing complexity

TEENAGER
THANK YOU
Mistake?
• Sub-optimal decision
  – Not all information/scenarios considered
  – Insufficient information
  – Built for a different scenario
• Due to focus on “functional” aspects
• A mistake is a mistake
  – … in retrospect

More Related Content

What's hot

Atomic Design powered by React @ AbemaTV
Atomic Design powered by React @ AbemaTVAtomic Design powered by React @ AbemaTV
Atomic Design powered by React @ AbemaTVYusuke Goto
 
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~Rakuten Group, Inc.
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~NTT DATA OSS Professional Services
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonTimothy Spann
 
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...Jean-Paul Azar
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理NTT DATA Technology & Innovation
 
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)NTT DATA Technology & Innovation
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and KafkaN Masahiro
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingRobert Metzger
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!Guido Schmutz
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...HostedbyConfluent
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
カッコいい SharePoint モダンサイトを作ろう
カッコいい SharePoint モダンサイトを作ろうカッコいい SharePoint モダンサイトを作ろう
カッコいい SharePoint モダンサイトを作ろうHirofumi Ota
 

What's hot (20)

Atomic Design powered by React @ AbemaTV
Atomic Design powered by React @ AbemaTVAtomic Design powered by React @ AbemaTV
Atomic Design powered by React @ AbemaTV
 
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~
実用段階に入ったOpenStack ~ もうすぐ絶滅するというPrivate Cloudの多様性について ~
 
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
Apache Kafkaって本当に大丈夫?~故障検証のオーバービューと興味深い挙動の紹介~
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Building Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with PythonBuilding Modern Data Streaming Apps with Python
Building Modern Data Streaming Apps with Python
 
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
Kafka MirrorMaker: Disaster Recovery, Scaling Reads, Isolate Mission Critical...
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
え、まって。その並列分散処理、Kafkaのしくみでもできるの? Apache Kafkaの機能を利用した大規模ストリームデータの並列分散処理
 
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)
AlloyDBを触ってみた!(第33回PostgreSQLアンカンファレンス@オンライン 発表資料)
 
Fluentd and Kafka
Fluentd and KafkaFluentd and Kafka
Fluentd and Kafka
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
IOS/IOS-XE 運用管理機能アップデート
IOS/IOS-XE 運用管理機能アップデートIOS/IOS-XE 運用管理機能アップデート
IOS/IOS-XE 運用管理機能アップデート
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...
Automated Apache Kafka Mocking and Testing with AsyncAPI | Hugo Guerrero, Red...
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
カッコいい SharePoint モダンサイトを作ろう
カッコいい SharePoint モダンサイトを作ろうカッコいい SharePoint モダンサイトを作ろう
カッコいい SharePoint モダンサイトを作ろう
 

Viewers also liked

Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web appsDirecti Group
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...slashn
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White BreadGaurav Lochan
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkartPankaj Kaushal
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web Appscothis
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjeeslashn
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...slashn
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...slashn
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...slashn
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketingslashn
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBbackslash451
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singhslashn
 
Soa design pattern
Soa design patternSoa design pattern
Soa design patternLap Doan
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMMilan49
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMtigerjayadev
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...Robert Mederer
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 

Viewers also liked (20)

How Flipkart scales PHP
How Flipkart scales PHPHow Flipkart scales PHP
How Flipkart scales PHP
 
Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web apps
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White Bread
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkart
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Flipkart
FlipkartFlipkart
Flipkart
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web App
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketing
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDB
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
 
Soa design pattern
Soa design patternSoa design pattern
Soa design pattern
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 

Similar to Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows AzureDamir Dobric
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...SQLExpert.pl
 
Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Yogi Kulkarni
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...HostedbyConfluent
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionJBUG London
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionC2B2 Consulting
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NETDavid Giard
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackSadayuki Furuhashi
 


Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

  • 1. Flipkart Website Architecture Mistakes & Learnings Siddhartha Reddy Architect, Flipkart
  • 5. www.flipkart.com • Started in 2007 • Current Architecture from mid 2010 • Evolution of the architecture presented as: Issue[1] -> RCA[2] -> Actions -> Learnings • [1] Issue: Website is “slow” • [2] RCA = Root Cause Analysis
  • 6. Surviving & reacting to the environment INFANCY (2007 – MID-2010)
  • 8. RCA • Why? – MySQL queries taking too long • Why? – Too many queries – Many slow queries – Queries locking tables • Why? – Capacity • Hmm…
  • 9. Fixing it • Get beefier servers (the obvious) • Separate master_db, slave_db – Writes go to master_db – Reads from slave_db – Critical reads from master_db [Diagram: before, a single MySQL server handles both reads and writes; after, a MySQL master takes writes and replicates to a MySQL slave that serves reads]
  • 10. Learning from it • Scale out database reads by distributing load across systems • Isolate database writes from reads – Writes are (usually) more critical
  • 12. RCA • Why? – MySQL queries taking too long (on slave_db) • Why? – Too many queries – Many slow queries • Why? – Queries from analytics / reporting and other backend jobs • Urm…
  • 13. Fixing it • Analytics / reporting DB (archival_db) – Use MyISAM (optimized for reads) – Additional indexes for quicker reporting [Diagram: before, the MySQL master takes website writes and replicates to a MySQL slave serving website reads; after, the master replicates to Slave 1 (website reads) and Slave 2 (analytics reads)]
  • 14. Learning from it • Isolate the databases being used for serving website traffic from those being used for analytics/reporting • Isolate systems being used by production website from those being used for background processing
  • 15. Learning the basics BABY (2010 – 2011)
  • 17. RCA • Why? • How? – Instrumentation
  • 18. RCA - 1 • Why? – Logging a lot – PHP processes blocking on writing logs [Diagram: Request1 -> Process1, Request2 -> Process2, Request3 -> Process3; one process is writing to the shared log file while the other two wait]
  • 19. RCA - 2 • Why? – Service Oriented Architecture (SOA) – Too many calls to remote services per request • Creating fresh connection for each call • All the calls are made in serial order [Diagram: Receive request -> Connect to Service1 -> Request Service1 -> Connect to Service2 -> Request Service2 -> Send response]
  • 20. RCA - 3 • Why? – Configurability – Fetch a lot of “config” from database for serving each request [Diagram: Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 -> Send response]
  • 21. RCA – 1,2,3 • Why? – Logging a lot – SOA – Configurability • Why? – PHP’s process model • Argh!
  • 22. Fixing it • fk-w3-agent – Simple Java “middleware” daemon – Deployed on each web server – PHP communicates to it through local socket – Hosts pluggable “handlers”
  • 23. fk-w3-agent: LoggingHandler [Diagram: before, Processes 1-3 each write directly to the log file; after, they hand log entries to fk-w3-agent, which writes to the log file asynchronously with buffering]
  • 24. fk-w3-agent: ServiceHandler(s) [Diagram: before, Receive request -> Connect to Service1 -> Request Service1 -> Connect to Service2 -> Request Service2 -> Send response; after, Receive request -> Call fk-w3-agent -> Send response, with fk-w3-agent talking to Service1 and Service2]
  • 25. fk-w3-agent: ConfigHandler [Diagram: before, Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 (each from the database) -> Send response; after, Receive request -> Fetch all config from fk-w3-agent -> Send response, with fk-w3-agent polling the database and caching the config]
  • 26. Learning from it • PHP — good for frontend and templating – Gives a lot of agility – Limiting process model • Hurdle for high performance • Java — stability and performance • Horses for courses
  • 28. RCA • Why? – PHP processes taking up too much time – PHP processes taking up too much CPU • Why? – Product info deserialization taking up time/CPU – View construction taking up time/CPU
  • 29. Fixing it • Caching! • Cache fully constructed pages – For a few minutes – Only for highly trafficked pages (Homepage) • Cache PHP serialized Product objects – ~20 million objects – Memcache • Yeah! But… – Add caching => add complexity
  • 30. Caching: Complications (1) • “Caching fully constructed pages” • But parts of pages still need to be dynamic • Example: Logged-in user’s name • Impossible to do effective bucket testing • Or at least makes it prohibitively complex
  • 31. Caching: Complications (2) • “Caching PHP serialized Product objects” • Without caching: getProductInfo() -> Fetch from CMS • With caching, cache hit: getProductInfo() -> Fetch from Cache • With caching, cache miss: getProductInfo() -> Fetch from Cache -> Fetch from CMS -> Set in Cache
  • 32. Caching: Complications (3) • TTL: ∞ (i.e. no invalidation) • Pro-actively repopulate products in the cache – Receive “notifications” about product updates • Notification Server — pushes notifications raised by CMS • Use a persistent, distributed cache – Memcache => Membase, Couchbase
  • 33. Learning from it • Caching is a powerful tool for performance optimization • Caching adds complexities – Reduced by keeping cache close to data source – Think deeply about TTL, invalidation • Use caching to go from “acceptable performance” to “awesome performance” – Don’t rely on it to get to “acceptable performance”
  • 36. RCA • Why? – Search-service is slow (or Reviews-service is slow or Recommendations-service is slow) • But why is rest of website slow? – Requests to the slow service are blocking processing threads • Eh?!
  • 37. Let’s do some math • Let’s say – Mean (or median) response time: 100 ms – 8-core server – All requests are CPU bound • Throughput: 80 requests per second (rps) • Let’s also say – 95th Percentile response time: 1000 ms • Call them “bad requests” • 4 bad requests in a second – Throughput down to 44 rps • 8 bad requests in a second? – Throughput down to 8 rps
  • 38. Fixing it • Aggressive timeouts for all service calls – Isolate impact of a slow service • only to pages that depend on it • Very aggressive timeouts for non-critical services – Example: Recommendations • On a Product page, Search results page etc. • Not on My Recommendations page • Load non-critical parts of pages through AJAX
  • 39. Learning from it • Isolate the impact of poorly performing services / systems • Isolate the required from the good-to-have
  • 41. RCA • Why? – Load average of web servers has spiked • Why? – Requests per second has spiked • From 1000 rps to 1500 rps • Why? – Large number of notifications of product information updates
  • 42. Fixing it • Separate cluster for receiving product info update notifications from the cluster that serves users • Admission control: Don’t let a system receive more requests than it can handle – Throttling • Batch the notifications
  • 43. Learning from it • Isolate the systems serving internal requests from those serving production traffic • Admission control to ensure that a system is isolated from the over-enthusiasm of a client • Look at the granularity at which we’re working
  • 47. Mistake? • Sub-optimal decision – Not all information/scenarios considered – Insufficient information – Built for a different scenario • Due to focus on “functional” aspects • A mistake is a mistake – … in retrospect
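
The arithmetic on slide 37 ("Let's do some math") can be reproduced with a small sketch. It assumes, as the slide does, that every request is CPU bound, that a normal request occupies one core for the mean response time, and that each "bad request" holds one core for the whole second. The function name `throughput_rps` and its parameters are illustrative and do not come from the talk.

```python
def throughput_rps(cores: int, mean_ms: int, bad_requests: int) -> int:
    """Requests completed in one second on a CPU-bound server.

    Assumptions (hypothetical model matching slide 37):
    - a normal request occupies one core for `mean_ms` milliseconds;
    - each bad request occupies one core for the entire second;
    - there can be no more bad requests in flight than there are cores.
    """
    if not 0 <= bad_requests <= cores:
        raise ValueError("bad_requests must be between 0 and the core count")

    normal_per_core = 1000 // mean_ms      # normal requests one free core finishes per second
    free_cores = cores - bad_requests      # cores not held up by a bad request
    return free_cores * normal_per_core + bad_requests


if __name__ == "__main__":
    for bad in (0, 4, 8):
        print(f"{bad} bad requests/s -> {throughput_rps(8, 100, bad)} rps")
```

With the slide's numbers (8 cores, 100 ms mean), this gives 80 rps with no bad requests, 44 rps with 4, and 8 rps with 8, which is why a single slow service can drag down the whole site unless its calls are cut off by timeouts.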

Editor's Notes

  1. “This has basically given us lots of opportunities to make mistakes. And make mistakes we did.”
  2. Website Architecture diagram goes here
  3. No