SlideShare a Scribd company logo
1 of 24
Download to read offline
Building a Highly
    Scalable, Open
Source Twitter Clone
 Dan Diephouse (dan@netzooid.com)
  Paul Brown (prb@mult.ifario.us)
Motivation
★   Wide (and growing) variety of
    non-relational databases.
    (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk)


★   Twitter application model
    presents interesting
    challenges of scope and
    scale.
    (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
Storage Metaphors
★   Key/Value Store
    Opaque values; fast and simple.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt
    ★   Dynomite — http://bit.ly/12AYmf
    ★   Redis — http://bit.ly/LBtCh
    ★   Tokyo Tyrant — http://bit.ly/oU4uV
    ★   Voldemort – http://bit.ly/oU4uV
Key/Value
Key Value

1




2




3
Storage Metaphors
★   Document-Oriented
    Unstructured content; rich queries.
★   Examples:
    ★   CouchDB — http://bit.ly/JAgUM
    ★   MongoDB — http://bit.ly/HDDOV
    ★   SOLR — http://bit.ly/q4gyi
    ★   XML databases...
Document-Oriented
ID=“dan-tweet-1”,
TEXT=“hello world”

ID=dan-tweet-2,
TEXT=“Twirp!”,
IN-REPLY-TO=“paul-tweet-5”
Storage Metaphors
★   Column-Oriented

    Organized in columns; easily scanned.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt

    ★   BigTable — http://bit.ly/QqMYA
        (available within AppEngine)


    ★   HBase — http://bit.ly/Zck7F
    ★   SimpleDB — http://bit.ly/toh0P
        (Typica library for Java — http://bit.ly/22kxZ4)
Column-Oriented
    Name          Date                  Tweet Text
    Bob           20090506              Eating dinner.
    Dan           20090507              Is it Friday yet?
    Dan           20090506              Beer me!
    Ralph         20090508              My bum itches.



Index   Name         Index   Date         Index   Tweet Text
0       Bob          0       20090506     0       Eating dinner.
1       Dan          1       20090507     1       Is it Friday yet?
2       Dan          2       20090506     2       Beer me!
3       Ralph        3       20090508     3       My bum itches.

        Storage          Storage                  Storage
Every Store is Special.

★   Lots of different little tweaks
    to the storage model.
★   Widely varying levels of
    maturity.
★   Growing communities.
★   Limited (but growing) tooling,
    libraries, and production
    adoption.
Reliability Through
        Replication
★   Consistent hashing to assign
    keys to partitions.
★   Partitions replicated on
    multiple nodes for
    redundancy.
★   Minimum number of successful
    reads to consider a write
    complete.
Reliability Through
         Replication
    PUT (k,v)

Client
Web UI
http://tat1.datapr0n.com:8080
Stores
★   Tweets
    Individual tweets.

★   Friends’ Timeline
    Fixed-length timelines.

★   Users
    Info and followers.

★   Command Queue
    Actions to perform (tweet, follow, etc.).
Data
★   Command (Java serialization)
    Keyed by node name, increasing ID.
★   Tweets (Java serialization)
    Keyed by user name, increasing ID.
★   FriendsTimeline (Java serialization)
    Keyed by username.
    List of date, tweet ID.
★   Users (Java serialization)
    Keyed by username.
    Followers (list), Followed (list), last tweet ID.
Life of a Tweet, Part I
                 1

                     Beer me.                    Users
1.User tweets.                             2


2.Find next
  tweet ID for                             3   Commands




                                Web Tier
  user.
3.Store “tweet                                  Friends
                                                Timeline

  for user”
  command.
                                                Tweets
Life of a Tweet, Part II
                         Where's
1. Read next command.   Demi with
                                                          Users
                        my beer?!?
2. Store tweet in
   user’s timeline                              1
   (Tweets).                                            Commands

                                                    4




                                     Web Tier
3. Store tweet ID in
   friends’
                                                    3
   timelines.                                            Friends
                                                         Timeline
   (Requires *many*
   operations.)
                                                    2

4. DELETE command.                                       Tweets
Some Patterns
★   “Sequences” are implemented
    as race-for-non-collision.
★   “Joins” are common keys or
    keys referenced from values.
★   “Transactions” are idempotent
    operations with DELETE at the
    end.
Operations
★   Deploy to Amazon EC2
    ★   2 nodes for Voldemort
    ★   2 nodes for Tomcat
    ★   1 node for Cacti
★   All “small” instances w/RightScale CentOS
    5.2 image.
★   Minor inconvenience of “EBS” volume for
    MySQL for Cacti.
    (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
Deployment
★   Lots of choices for automated rollout
    (Chef, Capistrano, etc.)
★   Took simplest path — Maven build, Ant
    (scp/ssh and property substitution
    tasks), and bash scripts.
    for i in vn1 vn2; do

      ant -Dnode=${i} setup-v-node

    done

★   Takes ~30 seconds to provision a Tomcat
    or Voldemort node.
Dashboarding
★   As above, lots of choices
    (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.)


★   Cacti as simplest choice.
    yum install -y cacti

★   Vanilla SNMP on nodes for host
    data.
★   Minimal extensions to Voldemort
    for stats in Cacti-friendly
    format.
Dashboarding
Performance
★   270 req/sec for getFriendsTimeline against
    web tier.
    ★   21 GETs on V stores to pull data.
    ★   5600 req/sec for V is similar to
        performance reported at NoSQL meetup (20k
        req/sec) when adjusted for hardware.
    ★   Cache on the web tier could make this
        faster...
★   Some hassles when hammering individual keys
    with rapid updates.
Take Aways
★   Linked-list representation deserves some thought
    (and experiments).
    Dynomite + Osmos (http://bit.ly/BYMdW)

★   Additional use cases (search, rich API, replies,
    direct messages, etc.) might alter design.
★   BigTable/HBase approach deserves another look.
★   Source code is available; come and git it.

    http://github.com/prb/bigbird

    git://github.com/prb/bigbird.git
Coordinates
★   Dan Diephouse (@dandiep)
    dan@netzooid.com
    http://netzooid.com
★   Paul Brown (@paulrbrown)
    prb@mult.ifario.us
    http://mult.ifario.us/a

More Related Content

What's hot

Blockchain & the IoT
Blockchain & the IoTBlockchain & the IoT
Blockchain & the IoTMat Keep
 
Hardware trojan detection technique using side channel analysis for hardware ...
Hardware trojan detection technique using side channel analysis for hardware ...Hardware trojan detection technique using side channel analysis for hardware ...
Hardware trojan detection technique using side channel analysis for hardware ...Ashish Maurya
 
從開發人員角度十分鐘理解區塊鏈技術
從開發人員角度十分鐘理解區塊鏈技術從開發人員角度十分鐘理解區塊鏈技術
從開發人員角度十分鐘理解區塊鏈技術Will Huang
 
블록체인 이해와 활용
블록체인 이해와 활용블록체인 이해와 활용
블록체인 이해와 활용Seung-Woo Kang
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerceAndrei Lopatenko
 
Node-REDのノード開発容易化ツール Node generator
Node-REDのノード開発容易化ツールNode generatorNode-REDのノード開発容易化ツールNode generator
Node-REDのノード開発容易化ツール Node generatorBMXUG
 
Hokkaido.cap#9 無線LANのスニッフィング
Hokkaido.cap#9 無線LANのスニッフィングHokkaido.cap#9 無線LANのスニッフィング
Hokkaido.cap#9 無線LANのスニッフィングPanda Yamaki
 
文字コードのお話
文字コードのお話文字コードのお話
文字コードのお話Shunji Konishi
 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed SystemChau Thanh
 
Sparc t4 4 system technical overview
Sparc t4 4 system technical overviewSparc t4 4 system technical overview
Sparc t4 4 system technical overviewsolarisyougood
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra Nikiforos Botis
 
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...doboncho
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Blockchain Technology & Capstone Project
Blockchain Technology & Capstone ProjectBlockchain Technology & Capstone Project
Blockchain Technology & Capstone ProjectKevin Zheng
 
이것이 레디스다.
이것이 레디스다.이것이 레디스다.
이것이 레디스다.Kris Jeong
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL DatabasesBADR
 
無線LANデバイスについて(kernelレベル)
無線LANデバイスについて(kernelレベル) 無線LANデバイスについて(kernelレベル)
無線LANデバイスについて(kernelレベル) Yuki Uchikoba
 

What's hot (20)

Blockchain & the IoT
Blockchain & the IoTBlockchain & the IoT
Blockchain & the IoT
 
Hardware trojan detection technique using side channel analysis for hardware ...
Hardware trojan detection technique using side channel analysis for hardware ...Hardware trojan detection technique using side channel analysis for hardware ...
Hardware trojan detection technique using side channel analysis for hardware ...
 
從開發人員角度十分鐘理解區塊鏈技術
從開發人員角度十分鐘理解區塊鏈技術從開發人員角度十分鐘理解區塊鏈技術
從開發人員角度十分鐘理解區塊鏈技術
 
블록체인 이해와 활용
블록체인 이해와 활용블록체인 이해와 활용
블록체인 이해와 활용
 
Machine Learning for retail and ecommerce
Machine Learning for retail and ecommerceMachine Learning for retail and ecommerce
Machine Learning for retail and ecommerce
 
Node-REDのノード開発容易化ツール Node generator
Node-REDのノード開発容易化ツールNode generatorNode-REDのノード開発容易化ツールNode generator
Node-REDのノード開発容易化ツール Node generator
 
Hokkaido.cap#9 無線LANのスニッフィング
Hokkaido.cap#9 無線LANのスニッフィングHokkaido.cap#9 無線LANのスニッフィング
Hokkaido.cap#9 無線LANのスニッフィング
 
文字コードのお話
文字コードのお話文字コードのお話
文字コードのお話
 
Blockchain in E-Commerce
Blockchain in E-CommerceBlockchain in E-Commerce
Blockchain in E-Commerce
 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed System
 
Sparc t4 4 system technical overview
Sparc t4 4 system technical overviewSparc t4 4 system technical overview
Sparc t4 4 system technical overview
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Presentation of Apache Cassandra
Presentation of Apache Cassandra Presentation of Apache Cassandra
Presentation of Apache Cassandra
 
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...
「HDR広色域映像のための色再現性を考慮した色域トーンマッピング」スライド Color Gamut Tone Mapping Considering Ac...
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Bit flipping attack on aes cbc - ashutosh ahelleya
Bit flipping attack on aes cbc -	ashutosh ahelleyaBit flipping attack on aes cbc -	ashutosh ahelleya
Bit flipping attack on aes cbc - ashutosh ahelleya
 
Blockchain Technology & Capstone Project
Blockchain Technology & Capstone ProjectBlockchain Technology & Capstone Project
Blockchain Technology & Capstone Project
 
이것이 레디스다.
이것이 레디스다.이것이 레디스다.
이것이 레디스다.
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
無線LANデバイスについて(kernelレベル)
無線LANデバイスについて(kernelレベル) 無線LANデバイスについて(kernelレベル)
無線LANデバイスについて(kernelレベル)
 

Similar to Building a Highly Scalable, Open Source Twitter Clone

Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeDavid Boike
 
Docker interview Questions-3.pdf
Docker interview Questions-3.pdfDocker interview Questions-3.pdf
Docker interview Questions-3.pdfYogeshwaran R
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMGuillaume Arnaud
 
Advanced WCF Workshop
Advanced WCF WorkshopAdvanced WCF Workshop
Advanced WCF WorkshopIdo Flatow
 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Rich Bowen
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Grand Central Dispatch
Grand Central DispatchGrand Central Dispatch
Grand Central DispatchRobert Brown
 
Real world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalReal world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalHoward Glynn
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterLucidworks
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialTom Croucher
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesPeter Hlavaty
 
Voltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. TorshynVoltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. Torshynvtors
 
Celery: The Distributed Task Queue
Celery: The Distributed Task QueueCelery: The Distributed Task Queue
Celery: The Distributed Task QueueRichard Leland
 
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 BostonScaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 Bostonbenbrowning
 
DCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDocker, Inc.
 
Post Metasploitation
Post MetasploitationPost Metasploitation
Post Metasploitationegypt
 

Similar to Building a Highly Scalable, Open Source Twitter Clone (20)

Modeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught MeModeling Tricks My Relational Database Never Taught Me
Modeling Tricks My Relational Database Never Taught Me
 
IP Multicast on ec2
IP Multicast on ec2IP Multicast on ec2
IP Multicast on ec2
 
Docker interview Questions-3.pdf
Docker interview Questions-3.pdfDocker interview Questions-3.pdf
Docker interview Questions-3.pdf
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoM
 
Advanced WCF Workshop
Advanced WCF WorkshopAdvanced WCF Workshop
Advanced WCF Workshop
 
Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011Apache Wizardry - Ohio Linux 2011
Apache Wizardry - Ohio Linux 2011
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
spdy
spdyspdy
spdy
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Grand Central Dispatch
Grand Central DispatchGrand Central Dispatch
Grand Central Dispatch
 
Real world cloud formation feb 2014 final
Real world cloud formation feb 2014 finalReal world cloud formation feb 2014 final
Real world cloud formation feb 2014 final
 
Search at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, TwitterSearch at Twitter: Presented by Michael Busch, Twitter
Search at Twitter: Presented by Michael Busch, Twitter
 
OSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js TutorialOSCON 2011 - Node.js Tutorial
OSCON 2011 - Node.js Tutorial
 
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytesWindows Kernel Exploitation : This Time Font hunt you down in 4 bytes
Windows Kernel Exploitation : This Time Font hunt you down in 4 bytes
 
Voltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. TorshynVoltdb: Shard It by V. Torshyn
Voltdb: Shard It by V. Torshyn
 
Celery: The Distributed Task Queue
Celery: The Distributed Task QueueCelery: The Distributed Task Queue
Celery: The Distributed Task Queue
 
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 BostonScaling Rails With Torquebox Presented at JUDCon:2011 Boston
Scaling Rails With Torquebox Presented at JUDCon:2011 Boston
 
Demystfying container-networking
Demystfying container-networkingDemystfying container-networking
Demystfying container-networking
 
DCSF19 Containers for Beginners
DCSF19 Containers for BeginnersDCSF19 Containers for Beginners
DCSF19 Containers for Beginners
 
Post Metasploitation
Post MetasploitationPost Metasploitation
Post Metasploitation
 

Recently uploaded

Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceOpsTree solutions
 
CHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopCHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopObject Automation
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
20200723_insight_release_plan
20200723_insight_release_plan20200723_insight_release_plan
20200723_insight_release_planJamie (Taka) Wang
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxChimezie Ogbuji
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
Dragino Technology LoRaWANデバイス、ゲートウェイ ユースケース
Dragino Technology   LoRaWANデバイス、ゲートウェイ ユースケースDragino Technology   LoRaWANデバイス、ゲートウェイ ユースケース
Dragino Technology LoRaWANデバイス、ゲートウェイ ユースケースCRI Japan, Inc.
 
AI-based audio transcription solutions (IDP)
AI-based audio transcription solutions (IDP)AI-based audio transcription solutions (IDP)
AI-based audio transcription solutions (IDP)KapilVaidya4
 
Grade 10 Computer Lesson 5: What is Blogging?
Grade 10 Computer Lesson 5: What is Blogging?Grade 10 Computer Lesson 5: What is Blogging?
Grade 10 Computer Lesson 5: What is Blogging?rosariokimberlyannma
 
ServiceNow Integration with MuleSoft.pptx
ServiceNow Integration with MuleSoft.pptxServiceNow Integration with MuleSoft.pptx
ServiceNow Integration with MuleSoft.pptxshyamraj55
 
LLM Threats: Prompt Injections and Jailbreak Attacks
LLM Threats: Prompt Injections and Jailbreak AttacksLLM Threats: Prompt Injections and Jailbreak Attacks
LLM Threats: Prompt Injections and Jailbreak AttacksThien Q. Tran
 
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...James Anderson
 

Recently uploaded (20)

Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
CHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshopCHIPS Alliance_Object Automation Inc_workshop
CHIPS Alliance_Object Automation Inc_workshop
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
20200723_insight_release_plan
20200723_insight_release_plan20200723_insight_release_plan
20200723_insight_release_plan
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Reference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptxReference Domain Ontologies and Large Medical Language Models.pptx
Reference Domain Ontologies and Large Medical Language Models.pptx
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
Dragino Technology LoRaWANデバイス、ゲートウェイ ユースケース
Dragino Technology   LoRaWANデバイス、ゲートウェイ ユースケースDragino Technology   LoRaWANデバイス、ゲートウェイ ユースケース
Dragino Technology LoRaWANデバイス、ゲートウェイ ユースケース
 
AI-based audio transcription solutions (IDP)
AI-based audio transcription solutions (IDP)AI-based audio transcription solutions (IDP)
AI-based audio transcription solutions (IDP)
 
Grade 10 Computer Lesson 5: What is Blogging?
Grade 10 Computer Lesson 5: What is Blogging?Grade 10 Computer Lesson 5: What is Blogging?
Grade 10 Computer Lesson 5: What is Blogging?
 
ServiceNow Integration with MuleSoft.pptx
ServiceNow Integration with MuleSoft.pptxServiceNow Integration with MuleSoft.pptx
ServiceNow Integration with MuleSoft.pptx
 
LLM Threats: Prompt Injections and Jailbreak Attacks
LLM Threats: Prompt Injections and Jailbreak AttacksLLM Threats: Prompt Injections and Jailbreak Attacks
LLM Threats: Prompt Injections and Jailbreak Attacks
 
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...
GDG Cloud Southlake 31: Santosh Chennuri and Festus Yeboah: Empowering Develo...
 

Building a Highly Scalable, Open Source Twitter Clone

  • 1. Building a Highly Scalable, Open Source Twitter Clone Dan Diephouse (dan@netzooid.com) Paul Brown (prb@mult.ifario.us)
  • 2. Motivation ★ Wide (and growing) variety of non-relational databases. (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk) ★ Twitter application model presents interesting challenges of scope and scale. (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
  • 3. Storage Metaphors ★ Key/Value Store Opaque values; fast and simple. ★ Examples: ★ Cassandra* — http://bit.ly/EdUEt ★ Dynomite — http://bit.ly/12AYmf ★ Redis — http://bit.ly/LBtCh ★ Tokyo Tyrant — http://bit.ly/oU4uV ★ Voldemort – http://bit.ly/oU4uV
  • 5. Storage Metaphors ★ Document-Oriented Unstructured content; rich queries. ★ Examples: ★ CouchDB — http://bit.ly/JAgUM ★ MongoDB — http://bit.ly/HDDOV ★ SOLR — http://bit.ly/q4gyi ★ XML databases...
  • 7. Storage Metaphors ★ Column-Oriented Organized in columns; easily scanned. ★ Examples: ★ Cassandra* — http://bit.ly/EdUEt ★ BigTable — http://bit.ly/QqMYA (available within AppEngine) ★ HBase — http://bit.ly/Zck7F ★ SimpleDB — http://bit.ly/toh0P (Typica library for Java — http://bit.ly/22kxZ4)
  • 8. Column-Oriented Name Date Tweet Text Bob 20090506 Eating dinner. Dan 20090507 Is it Friday yet? Dan 20090506 Beer me! Ralph 20090508 My bum itches. Index Name Index Date Index Tweet Text 0 Bob 0 20090506 0 Eating dinner. 1 Dan 1 20090507 1 Is it Friday yet? 2 Dan 2 20090506 2 Beer me! 3 Ralph 3 20090508 3 My bum itches. Storage Storage Storage
  • 9. Every Store is Special. ★ Lots of different little tweaks to the storage model. ★ Widely varying levels of maturity. ★ Growing communities. ★ Limited (but growing) tooling, libraries, and production adoption.
  • 10. Reliability Through Replication ★ Consistent hashing to assign keys to partitions. ★ Partitions replicated on multiple nodes for redundancy. ★ Minimum number of successful reads to consider a write complete.
  • 11. Reliability Through Replication PUT (k,v) Client
  • 13. Stores ★ Tweets Individual tweets. ★ Friends’ Timeline Fixed-length timelines. ★ Users Info and followers. ★ Command Queue Actions to perform (tweet, follow, etc.).
  • 14. Data ★ Command (Java serialization) Keyed by node name, increasing ID. ★ Tweets (Java serialization) Keyed by user name, increasing ID. ★ FriendsTimeline (Java serialization) Keyed by username. List of date, tweet ID. ★ Users (Java serialization) Keyed by username. Followers (list), Followed (list), last tweet ID.
  • 15. Life of a Tweet, Part I 1 Beer me. Users 1.User tweets. 2 2.Find next tweet ID for 3 Commands Web Tier user. 3.Store “tweet Friends Timeline for user” command. Tweets
  • 16. Life of a Tweet, Part II Where's 1. Read next command. Demi with Users my beer?!? 2. Store tweet in user’s timeline 1 (Tweets). Commands 4 Web Tier 3. Store tweet ID in friends’ 3 timelines. Friends Timeline (Requires *many* operations.) 2 4. DELETE command. Tweets
  • 17. Some Patterns ★ “Sequences” are implemented as race-for-non-collision. ★ “Joins” are common keys or keys referenced from values. ★ “Transactions” are idempotent operations with DELETE at the end.
  • 18. Operations ★ Deploy to Amazon EC2 ★ 2 nodes for Voldemort ★ 2 nodes for Tomcat ★ 1 node for Cacti ★ All “small” instances w/RightScale CentOS 5.2 image. ★ Minor inconvenience of “EBS” volume for MySQL for Cacti. (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
  • 19. Deployment ★ Lots of choices for automated rollout (Chef, Capistrano, etc.) ★ Took simplest path — Maven build, Ant (scp/ssh and property substitution tasks), and bash scripts. for i in vn1 vn2; do ant -Dnode=${i} setup-v-node done ★ Takes ~30 seconds to provision a Tomcat or Voldemort node.
  • 20. Dashboarding ★ As above, lots of choices (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.) ★ Cacti as simplest choice. yum install -y cacti ★ Vanilla SNMP on nodes for host data. ★ Minimal extensions to Voldemort for stats in Cacti-friendly format.
  • 22. Performance ★ 270 req/sec for getFriendsTimeline against web tier. ★ 21 GETs on V stores to pull data. ★ 5600 req/sec for V is similar to performance reported at NoSQL meetup (20k req/sec) when adjusted for hardware. ★ Cache on the web tier could make this faster... ★ Some hassles when hammering individual keys with rapid updates.
  • 23. Take Aways ★ Linked-list representation deserves some thought (and experiments). Dynomite + Osmos (http://bit.ly/BYMdW) ★ Additional use cases (search, rich API, replies, direct messages, etc.) might alter design. ★ BigTable/HBase approach deserves another look. ★ Source code is available; come and git it. http://github.com/prb/bigbird git://github.com/prb/bigbird.git
  • 24. Coordinates ★ Dan Diephouse (@dandiep) dan@netzooid.com http://netzooid.com ★ Paul Brown (@paulrbrown) prb@mult.ifario.us http://mult.ifario.us/a