Building a Highly Scalable, Open Source Twitter Clone
 Dan Diephouse (dan@netzooid.com)
  Paul Brown (prb@mult.ifario.us)
Motivation
★   Wide (and growing) variety of
    non-relational databases.
    (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk)


★   Twitter application model
    presents interesting
    challenges of scope and
    scale.
    (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
Storage Metaphors
★   Key/Value Store
    Opaque values; fast and simple.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt
    ★   Dynomite — http://bit.ly/12AYmf
    ★   Redis — http://bit.ly/LBtCh
    ★   Tokyo Tyrant — http://bit.ly/oU4uV
    ★   Voldemort – http://bit.ly/oU4uV
Key/Value
    [Figure: a two-column Key | Value table with keys 1, 2, and 3]
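A key/value store only understands keys. Each value is an opaque blob that the store never inspects. Below is a minimal in-memory sketch of that contract. The class name `KeyValueStore` and its methods are hypothetical stand-ins, not Voldemort's or any other store's client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a key/value store (not a real client API).
// The store never looks inside a value: it only maps a key to opaque bytes.
class KeyValueStore {
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String key, byte[] value) {
        data.put(key, value.clone());   // defensive copy; the bytes stay opaque
    }

    Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    void delete(String key) {
        data.remove(key);
    }
}
```

Because the store cannot query inside values, every access path the application needs must be encoded in a key.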
Storage Metaphors
★   Document-Oriented
    Unstructured content; rich queries.
★   Examples:
    ★   CouchDB — http://bit.ly/JAgUM
    ★   MongoDB — http://bit.ly/HDDOV
    ★   SOLR — http://bit.ly/q4gyi
    ★   XML databases...
Document-Oriented
ID=“dan-tweet-1”,
TEXT=“hello world”

ID=“dan-tweet-2”,
TEXT=“Twirp!”,
IN-REPLY-TO=“paul-tweet-5”
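A document store can query fields inside a document. For example, it can find every tweet whose IN-REPLY-TO is "paul-tweet-5". The sketch below models documents as plain `Map`s and answers that query with a full scan. This is an illustrative assumption: real document stores such as CouchDB or MongoDB use views or indexes instead of scanning, and `DocumentQuery` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: documents are field -> value maps, as in the example above.
class DocumentQuery {
    // Returns the IDs of documents whose field equals the given value (full scan, no index).
    static List<String> findBy(List<Map<String, String>> docs, String field, String value) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {   // documents missing the field are skipped
                ids.add(doc.get("ID"));
            }
        }
        return ids;
    }
}
```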
Storage Metaphors
★   Column-Oriented
    Organized in columns; easily scanned.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt
    ★   BigTable — http://bit.ly/QqMYA
        (available within App Engine)
    ★   HBase — http://bit.ly/Zck7F
    ★   SimpleDB — http://bit.ly/toh0P
        (Typica library for Java — http://bit.ly/22kxZ4)
Column-Oriented
Row layout:
    Name     Date        Tweet Text
    Bob      20090506    Eating dinner.
    Dan      20090507    Is it Friday yet?
    Dan      20090506    Beer me!
    Ralph    20090508    My bum itches.

Column layout (each column kept in its own storage):
    Index  Name      Index  Date        Index  Tweet Text
    0      Bob       0      20090506    0      Eating dinner.
    1      Dan       1      20090507    1      Is it Friday yet?
    2      Dan       2      20090506    2      Beer me!
    3      Ralph     3      20090508    3      My bum itches.
           Storage          Storage            Storage
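The column layout above can be sketched with one array per column, where index i across the arrays is one logical row. A query that touches a single column, such as counting tweets per date, scans only that array. This is a toy illustration using the table's sample data. `ColumnStore` is a hypothetical name and does not reflect how Bigtable, HBase, or Cassandra lay out data on disk.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy column store: one array per column, same index = same row.
class ColumnStore {
    final String[] name = {"Bob", "Dan", "Dan", "Ralph"};
    final String[] date = {"20090506", "20090507", "20090506", "20090508"};
    final String[] text = {"Eating dinner.", "Is it Friday yet?", "Beer me!", "My bum itches."};

    // Counts tweets per date by scanning only the Date column.
    Map<String, Integer> countByDate() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String d : date) {
            counts.merge(d, 1, Integer::sum);
        }
        return counts;
    }

    // Reassembles one logical row by reading the same index from every column.
    String[] row(int i) {
        return new String[] {name[i], date[i], text[i]};
    }
}
```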
Every Store is Special.

★   Lots of different little tweaks
    to the storage model.
★   Widely varying levels of
    maturity.
★   Growing communities.
★   Limited (but growing) tooling,
    libraries, and production
    adoption.
Reliability Through Replication
★   Consistent hashing to assign
    keys to partitions.
★   Partitions replicated on
    multiple nodes for
    redundancy.
★   Minimum number of successful
    replica writes to consider a write
    complete.
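The three replication ideas above can be sketched together. Nodes and keys are hashed onto the same ring. A key's replicas are the first N distinct nodes found walking clockwise from the key's position. A write counts as complete once a minimum number of replicas acknowledge it. This is a simplified sketch under stated assumptions: it uses CRC32 as the hash, one ring position per node (no virtual nodes), and the hypothetical names `HashRing`, `preferenceList`, and `writeComplete`. It is not Voldemort's actual routing code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical consistent-hash ring: nodes and keys share one hash circle.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        ring.put(hash(node), node);   // one position per node in this sketch
    }

    // Walks clockwise from the key's position and returns the first n distinct nodes.
    List<String> preferenceList(String key, int n) {
        List<String> result = new ArrayList<>();
        for (String node : ring.tailMap(hash(key)).values()) {
            if (result.size() == n) return result;
            if (!result.contains(node)) result.add(node);
        }
        for (String node : ring.values()) {   // wrap around the circle
            if (result.size() == n) return result;
            if (!result.contains(node)) result.add(node);
        }
        return result;
    }

    // A write is complete once at least requiredWrites replicas acknowledged it.
    static boolean writeComplete(List<Boolean> acks, int requiredWrites) {
        int ok = 0;
        for (boolean ack : acks) {
            if (ack) ok++;
        }
        return ok >= requiredWrites;
    }
}
```

Adding a node moves only the keys between it and its counter-clockwise neighbour. That locality is the reason to prefer consistent hashing over `hash(key) % nodeCount`.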
Reliability Through Replication
    [Figure: a Client issuing PUT (k,v)]
Web UI
http://tat1.datapr0n.com:8080
Stores
★   Tweets
    Individual tweets.

★   Friends’ Timeline
    Fixed-length timelines.

★   Users
    Info and followers.

★   Command Queue
    Actions to perform (tweet, follow, etc.).
Data
★   Command (Java serialization)
    Keyed by node name, increasing ID.
★   Tweets (Java serialization)
    Keyed by username, increasing ID.
★   FriendsTimeline (Java serialization)
    Keyed by username.
    List of date, tweet ID.
★   Users (Java serialization)
    Keyed by username.
    Followers (list), Followed (list), last tweet ID.
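Every store value above is a Java-serialized object written as opaque bytes. The sketch below shows what a Users value and a FriendsTimeline entry might look like, plus the serialization round trip. The class and field names, the "name:id" key format, and the helper methods are all hypothetical. They mirror the slide's description, not the bigbird source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value classes mirroring the Data slide.
class DataModel {
    // Users store value, keyed by username.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> followers = new ArrayList<>();
        final List<String> followed = new ArrayList<>();
        long lastTweetId;
    }

    // One FriendsTimeline entry: a date plus the key of the tweet.
    static class TimelineEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final long date;
        final String tweetKey;

        TimelineEntry(long date, String tweetKey) {
            this.date = date;
            this.tweetKey = tweetKey;
        }
    }

    // Assumed key layout for Tweets and Command: a name plus an increasing ID.
    static String key(String name, long id) {
        return name + ":" + id;
    }

    // Java serialization to the opaque bytes a key/value store holds.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java serialization is convenient, but it ties the stored bytes to the class definitions. Changing a class later requires care with `serialVersionUID`.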
Life of a Tweet, Part I
1. User tweets (“Beer me.”).
2. Find next tweet ID for user.
3. Store “tweet for user” command.
[Diagram: the Web Tier and the Users, Commands, Friends Timeline, and Tweets stores, with arrows numbered to match the steps.]
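The web tier's part of the flow can be sketched with in-memory maps standing in for the Users and Commands stores. The web tier assigns the user's next tweet ID, then enqueues a command keyed by node name and an increasing ID. It does no fan-out itself. Several details are assumptions: the string-encoded command, the "node:seq" key, and writing the new ID back to the Users map. The original serializes commands with Java serialization. `WebTier` and `postTweet` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical web-tier handler; maps stand in for the Users and Commands stores.
class WebTier {
    final Map<String, Long> lastTweetId = new HashMap<>();   // Users: username -> last tweet ID
    final Map<String, String> commands = new HashMap<>();    // Commands: "node:seq" -> command
    private final String nodeName;
    private long nextCommandSeq = 0;

    WebTier(String nodeName) {
        this.nodeName = nodeName;
    }

    // Steps 1-3: accept a tweet, assign the next tweet ID, enqueue a command.
    String postTweet(String user, String text) {
        long tweetId = lastTweetId.getOrDefault(user, 0L) + 1;   // step 2
        lastTweetId.put(user, tweetId);                           // assumed: reserve the ID
        String commandKey = nodeName + ":" + (nextCommandSeq++);  // node name + increasing ID
        commands.put(commandKey, "TWEET " + user + " " + tweetId + " " + text);   // step 3
        return commandKey;
    }
}
```

Deferring the expensive fan-out to a queue keeps the tweet request itself to a small, fixed number of store operations.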
Life of a Tweet, Part II
1. Read next command.
2. Store tweet in user’s timeline (Tweets).
3. Store tweet ID in friends’ timelines. (Requires *many* operations.)
4. DELETE command.
[Diagram: example tweet “Where's Demi with my beer?!?”; the Web Tier and the Users, Commands, Friends Timeline, and Tweets stores, with arrows numbered to match the steps.]
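The background half can be sketched as a command processor that follows the four steps. Every write is safe to repeat, and the command is deleted only at the end. If the process crashes midway, it can simply re-run the command. This sketch makes several assumptions. In-memory maps stand in for the stores. Fan-out goes to followers only. Timeline entries hold tweet keys rather than (date, tweet ID) pairs. Timelines are trimmed to a fixed length. `CommandProcessor` and `TweetCommand` are hypothetical names.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical command processor; maps stand in for the Voldemort stores.
class CommandProcessor {
    static final class TweetCommand {
        final String user;
        final long tweetId;
        final String text;

        TweetCommand(String user, long tweetId, String text) {
            this.user = user;
            this.tweetId = tweetId;
            this.text = text;
        }
    }

    final TreeMap<Long, TweetCommand> commands = new TreeMap<>();        // this node's queue
    final Map<String, String> tweets = new HashMap<>();                  // "user:id" -> text
    final Map<String, List<String>> followers = new HashMap<>();         // user -> followers
    final Map<String, LinkedList<String>> friendsTimeline = new HashMap<>();
    private final int timelineLength;

    CommandProcessor(int timelineLength) {
        this.timelineLength = timelineLength;
    }

    // Processes one command; returns false when the queue is empty.
    boolean processNext() {
        Map.Entry<Long, TweetCommand> next = commands.firstEntry();   // 1. read next command
        if (next == null) return false;
        TweetCommand c = next.getValue();
        String tweetKey = c.user + ":" + c.tweetId;
        tweets.put(tweetKey, c.text);                                 // 2. overwrite is harmless
        for (String f : followers.getOrDefault(c.user, List.of())) {  // 3. one write per follower
            LinkedList<String> tl = friendsTimeline.computeIfAbsent(f, k -> new LinkedList<>());
            if (!tl.contains(tweetKey)) {         // a replayed command adds no duplicate
                tl.addFirst(tweetKey);
                while (tl.size() > timelineLength) tl.removeLast();   // fixed-length timeline
            }
        }
        commands.remove(next.getKey());                               // 4. DELETE command last
        return true;
    }
}
```

Step 3 loops over every follower, so one tweet from a popular user turns into that many timeline writes.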
Some Patterns
★   “Sequences” are implemented as a race for
    a non-colliding key: try the next ID and, if it
    is already taken, move on to the one after.
★   “Joins” are common keys or
    keys referenced from values.
★   “Transactions” are idempotent
    operations with DELETE at the
    end.
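The "sequence" pattern can be sketched as follows. Start from the last ID you know about and try to claim `user:id` with a write that succeeds only if the key is absent. On a collision, move to the next ID. Here a `ConcurrentMap.putIfAbsent` stands in for a store-level conditional write. That is an assumption: Voldemort expresses conditional updates through versioned puts rather than this call. `RaceSequence` and `store` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical "race for a non-colliding key" sequence over a Tweets-like map.
class RaceSequence {
    final ConcurrentMap<String, String> tweets = new ConcurrentHashMap<>();

    // Claims the first free ID after lastSeen; returns the ID that was won.
    long store(String user, String text, long lastSeen) {
        long candidate = lastSeen + 1;
        while (tweets.putIfAbsent(user + ":" + candidate, text) != null) {
            candidate++;   // someone else already holds this ID; race for the next one
        }
        return candidate;
    }
}
```

A stale `lastSeen` only costs extra probes, never a duplicate ID, because a key can be claimed only once.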
Operations
★   Deploy to Amazon EC2
    ★   2 nodes for Voldemort
    ★   2 nodes for Tomcat
    ★   1 node for Cacti
★   All “small” instances with the RightScale CentOS
    5.2 image.
★   Minor inconvenience: an EBS volume is needed for
    Cacti’s MySQL database.
    (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
Deployment
★   Lots of choices for automated rollout
    (Chef, Capistrano, etc.)
★   Took simplest path — Maven build, Ant
    (scp/ssh and property substitution
    tasks), and bash scripts.
    # Provision each Voldemort node (vn1, vn2) with the setup-v-node Ant target.
    for i in vn1 vn2; do
      ant -Dnode="${i}" setup-v-node
    done

★   Takes ~30 seconds to provision a Tomcat
    or Voldemort node.
Dashboarding
★   As above, lots of choices
    (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.)


★   Cacti as simplest choice.
    yum install -y cacti

★   Vanilla SNMP on nodes for host
    data.
★   Minimal extensions to Voldemort
    for stats in Cacti-friendly
    format.
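Cacti's script-based data input methods read one line of space-separated `name:value` pairs. A stats extension therefore mostly needs to render its counters in that shape. The sketch below does only the formatting. The counter names are hypothetical: the slides do not say which Voldemort statistics were exported or how the authors' extension was wired in.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical formatter: renders counters as Cacti "name:value name:value" output.
class CactiStats {
    static String format(Map<String, Long> stats) {
        StringJoiner line = new StringJoiner(" ");
        for (Map.Entry<String, Long> e : new TreeMap<>(stats).entrySet()) {   // stable field order
            line.add(e.getKey() + ":" + e.getValue());
        }
        return line.toString();
    }
}
```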
Dashboarding
Performance
★   270 req/sec for getFriendsTimeline against
    web tier.
    ★   21 GETs on Voldemort stores per request
        to pull data.
    ★   Roughly 5600 req/sec for Voldemort (270 × 21)
        is similar to performance reported at the NoSQL
        meetup (20k req/sec) when adjusted for hardware.
    ★   Cache on the web tier could make this
        faster...
★   Some hassles when hammering individual keys
    with rapid updates.
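Each getFriendsTimeline call fans out to about 21 store GETs. A web-tier cache that answers repeat requests locally therefore removes that whole fan-out for every hit. Below is a minimal LRU sketch built on `LinkedHashMap` in access order. The capacity, the loader function standing in for the store reads, and the name `TimelineCache` are assumptions. The sketch is single-threaded and does not invalidate entries when new tweets arrive, so a real version would need both.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache in front of the timeline reads (not thread-safe).
class TimelineCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;   // stands in for the ~21 store GETs per timeline
    int loads = 0;                         // how many times the stores were actually hit

    TimelineCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {   // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;   // evict the least recently used entry
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }
}
```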
Take Aways
★   Linked-list representation deserves some thought
    (and experiments).
    Dynomite + Osmos (http://bit.ly/BYMdW)

★   Additional use cases (search, rich API, replies,
    direct messages, etc.) might alter design.
★   BigTable/HBase approach deserves another look.
★   Source code is available; come and git it.

    http://github.com/prb/bigbird

    git://github.com/prb/bigbird.git
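The linked-list idea in the take-aways is only named, not specified. One possible reading is to store each timeline entry as its own key/value record that points at the previous entry, plus a small "head" record per timeline. Adding a tweet then becomes two small writes instead of rewriting a whole serialized list. The trade-off is one GET per entry read. The sketch below is an assumption on that basis, not the design the authors or Osmos used. `LinkedTimeline`, `push`, and `read` are hypothetical names, and each timeline needs its own entry keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical linked-list timeline over key/value records.
class LinkedTimeline {
    static final class Entry {
        final String text;
        final String prevKey;   // null marks the oldest entry

        Entry(String text, String prevKey) {
            this.text = text;
            this.prevKey = prevKey;
        }
    }

    final Map<String, Entry> entries = new HashMap<>();   // entry key -> record
    final Map<String, String> heads = new HashMap<>();    // timeline owner -> newest entry key

    void push(String owner, String entryKey, String text) {
        entries.put(entryKey, new Entry(text, heads.get(owner)));   // write entry first
        heads.put(owner, entryKey);                                  // then move the head
    }

    // Follows prevKey pointers from the head, one lookup per entry.
    List<String> read(String owner, int limit) {
        List<String> out = new ArrayList<>();
        String key = heads.get(owner);
        while (key != null && out.size() < limit) {
            Entry e = entries.get(key);
            out.add(e.text);
            key = e.prevKey;
        }
        return out;
    }
}
```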
Coordinates
★   Dan Diephouse (@dandiep)
    dan@netzooid.com
    http://netzooid.com
★   Paul Brown (@paulrbrown)
    prb@mult.ifario.us
    http://mult.ifario.us/a
