DISQUS
                         Building Scalable Web Apps



                                 David Cramer
                                    @zeeg




Tuesday, June 21, 2011
Agenda



                •        Terminology
                •        Common bottlenecks


                •        Building a scalable app
                         •   Architecting your database
                         •   Utilizing a Queue
                         •   The importance of an API



Performance vs. Scalability
                         “Performance measures the speed with
                         which a single request can be executed,
                         while scalability measures the ability of a
                         request to maintain its performance under
                         increasing load.”




                               (but we’re not just going to scale your code)


Sharding

                         “Database sharding is a method of
                         horizontally partitioning data by common
                         properties”
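
A minimal sketch of that idea (illustrative only, not from the deck): hash the common property, e.g. user_id, to decide which database a row lives on, so every row for a given user always lands on the same shard.

      # Sketch: route rows to one of N logical databases by user_id.
      # The dict-based "shards" are stand-ins for real database connections.
      NUM_SHARDS = 4
      SHARDS = [dict() for _ in range(NUM_SHARDS)]

      def shard_for(user_id):
          return SHARDS[hash(user_id) % NUM_SHARDS]

      def save_tweet(user_id, tweet):
          shard_for(user_id).setdefault(user_id, []).append(tweet)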




Denormalization
                         “Denormalization is the process of
                         attempting to optimize the performance of
                         a database by adding redundant data or by
                         grouping data.”
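
A small, assumed example of what that can mean with the models used later in this deck: keep a redundant follower_count on a profile row and bump it at write time, instead of running COUNT(*) over relationships on every read. The Profile model here is hypothetical.

      from django.contrib.auth.models import User
      from django.db import models
      from django.db.models import F

      class Profile(models.Model):
          # hypothetical model: follower_count duplicates what a COUNT(*)
          # over Relationship rows would return (redundant, but cheap to read)
          user = models.OneToOneField(User)
          follower_count = models.PositiveIntegerField(default=0)

      def on_follow(from_user, to_user):
          # pay the cost once at write time instead of on every page view
          Profile.objects.filter(user=to_user).update(
              follower_count=F('follower_count') + 1)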




Common Bottlenecks




                •        Database (almost always)
                •        Caching, Invalidation
                •        Lack of metrics, lack of tests




Building Tweeter




Getting Started




                •        Pick a framework: Django, Flask, Pyramid
                •        Package your app; Repeatability
                •        Solve problems
                •        Invest in architecture




Let’s use Django




Django is..




                •        Fast (enough)
                •        Loaded with goodies
                •        Maintained
                •        Tested
                •        Used




Packaging Matters




setup.py

                     #!/usr/bin/env python
                     from setuptools import setup, find_packages

                     setup(
                           name='tweeter',
                          version='0.1',
                          packages=find_packages(),
                          install_requires=[
                               'Django==1.3',
                          ],
                          package_data={
                               'tweeter': [
                                   'static/*.*',
                                   'templates/*.*',
                               ],
                           },
                     )



setup.py (cont.)




                     $   mkvirtualenv tweeter
                     $   git clone git.example.com:tweeter.git
                     $   cd tweeter
                     $   python setup.py develop




setup.py (cont.)




                     ## fabfile.py
                     def setup():
                         run('git clone git.example.com:tweeter.git')
                         run('cd tweeter')
                         run('./bootstrap.sh')



                     ## bootstrap.sh
                     #!/usr/bin/env bash
                     virtualenv env
                     env/bin/python setup.py develop




setup.py (cont.)



                     $ fab web setup

                     setup   executed   on   web1
                     setup   executed   on   web2
                     setup   executed   on   web3
                     setup   executed   on   web4
                     setup   executed   on   web5
                     setup   executed   on   web6
                     setup   executed   on   web7
                     setup   executed   on   web8
                     setup   executed   on   web9
                     setup   executed   on   web10




Database(s) First




Databases




          •     Usually core
          •     Common bottleneck
          •     Hard to change
          •     Tedious to scale

                                               http://www.flickr.com/photos/adesigna/3237575990/




What a tweet “looks” like




Modeling the data


                     from django.db import models

                     class Tweet(models.Model):
                         user    = models.ForeignKey(User)
                         message = models.CharField(max_length=140)
                         date    = models.DateTimeField(auto_now_add=True)
                         parent = models.ForeignKey('self', null=True)




                     class Relationship(models.Model):
                         from_user = models.ForeignKey(User)
                         to_user   = models.ForeignKey(User)



                                    (Remember, bare bones!)


Public Timeline


                     # public timeline
                     SELECT * FROM tweets
                       ORDER BY date DESC
                       LIMIT 100;




                •        Scales to the size of one physical machine
                •        Heavy index, long tail
                •        Easy to cache, invalidate




Following Timeline

                     # tweets from people you follow
                     SELECT t.* FROM tweets AS t
                       JOIN relationships AS r
                         ON r.to_user_id = t.user_id
                       WHERE r.from_user_id = '1'
                       ORDER BY t.date DESC
                       LIMIT 100



                •        No vertical partitions
                •        Heavy index, long tail
                •        “Necessary evil” join
                •        Easy to cache, expensive to invalidate


Materializing Views




                     PUBLIC_TIMELINE = []

                     def on_tweet_creation(tweet):
                          global PUBLIC_TIMELINE

                         PUBLIC_TIMELINE.insert(0, tweet)

                     def get_latest_tweets(num=100):
                         return PUBLIC_TIMELINE[:num]




                                 Disclaimer: don’t try this at home


Introducing Redis


                     class PublicTimeline(object):
                         def __init__(self):
                             self.conn = Redis()
                             self.key = 'timeline:public'

                         def add(self, tweet):
                             score = float(tweet.date.strftime('%s.%m'))
                             self.conn.zadd(self.key, tweet.id, score)

                         def remove(self, tweet):
                             self.conn.zrem(self.key, tweet.id)

                         def list(self, offset=0, limit=-1):
                             tweet_ids = self.conn.zrevrange(self.key, offset, limit)

                             return tweet_ids
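
Later slides call FollowingTimeline.add(user_id, tweet), but that class never appears in the deck; a per-user variant of PublicTimeline might look roughly like this (an assumption, mirroring the style above, with one sorted set per follower):

      from redis import Redis

      class FollowingTimeline(object):
          def __init__(self):
              self.conn = Redis()

          def make_key(self, user_id):
              return 'timeline:following:%s' % user_id

          def add(self, user_id, tweet):
              score = float(tweet.date.strftime('%s.%m'))
              self.conn.zadd(self.make_key(user_id), tweet.id, score)

          def list(self, user_id, offset=0, limit=-1):
              return self.conn.zrevrange(self.make_key(user_id), offset, limit)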




Cleaning Up




                     from datetime import datetime, timedelta

                     class PublicTimeline(object):
                         def truncate(self):
                             # Remove entries older than 30 days
                             d30 = datetime.now() - timedelta(days=30)
                             score = float(d30.strftime('%s.%m'))
                              self.conn.zremrangebyscore(self.key, 0, score)
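
To actually run truncate() on a schedule, one option (a sketch, using the django-celery dependency that shows up later in the deck) is a periodic task driven by celerybeat:

      from celery.schedules import crontab
      from celery.task import periodic_task

      @periodic_task(run_every=crontab(hour=4, minute=0))
      def truncate_public_timeline():
          # trim the materialized view once a day
          PublicTimeline().truncate()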




Scaling Redis




                     from nydus.db import create_cluster

                     class PublicTimeline(object):
                         def __init__(self):
                              # create a cluster of 64 dbs
                             self.conn = create_cluster({
                                 'engine': 'nydus.db.backends.redis.Redis',
                                 'router': 'nydus.db.routers.redis.PartitionRouter',
                                 'hosts': dict((n, {'db': n}) for n in xrange(64)),
                             })




Nydus


                     # create a cluster of Redis connections which
                     # partition reads/writes by key (hash(key) % size)

                     from nydus.db import create_cluster
                     redis = create_cluster({
                         'engine': 'nydus.db.backends.redis.Redis',
                         'router': 'nydus.db...redis.PartitionRouter',
                         'hosts': {
                             0: {'db': 0},
                         }
                     })

                     # maps to a single node
                      res = redis.incr('foo')
                     assert res == 1

                     # executes on all nodes
                      redis.flushdb()



                                        http://github.com/disqus/nydus


Vertical vs. Horizontal




Looking at the Cluster




                         [Diagram: sql-1-master, sql-1-slave; redis-1 holding DB0–DB9]




“Tomorrow’s” Cluster


                         [Diagram: sql-1-master, sql-1-users, sql-1-tweets; redis-1 holding DB0–DB4; redis-2 holding DB5–DB9]




Asynchronous Tasks




In-Process Limitations




                     def on_tweet_creation(tweet):
                         # O(1) for public timeline
                         PublicTimeline.add(tweet)

                         # O(n) for users following author
                         for user_id in tweet.user.followers.all():
                             FollowingTimeline.add(user_id, tweet)

                         # O(1) for profile timeline (my tweets)
                         ProfileTimeline.add(tweet.user_id, tweet)




In-Process Limitations (cont.)




                         # O(n) for users following author
                         # 7 MILLION writes for Ashton Kutcher
                         for user_id in tweet.user.followers.all():
                             FollowingTimeline.add(user_id, tweet)




Introducing Celery




                     #!/usr/bin/env python
                     from setuptools import setup, find_packages

                     setup(
                          install_requires=[
                               'Django==1.3',
                               'django-celery==2.2.4',
                          ],
                           # ...
                     )




Introducing Celery (cont.)




                     @task(exchange='tweet_creation')
                     def on_tweet_creation(tweet_dict):
                         # HACK: not the best idea
                         tweet = Tweet()
                         tweet.__dict__ = tweet_dict

                         # O(n) for users following author
                         for user_id in tweet.user.followers.all():
                             FollowingTimeline.add(user_id, tweet)

                     on_tweet_creation.delay(tweet.__dict__)
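
Since the slide itself flags the __dict__ trick as a hack, a common alternative (a sketch, not from the deck) is to enqueue only the primary key and re-fetch the row inside the worker:

      from celery.task import task

      @task(exchange='tweet_creation')
      def on_tweet_creation(tweet_id):
          # re-fetch in the worker instead of serializing the model instance
          tweet = Tweet.objects.get(pk=tweet_id)

          # O(n) for users following author
          for user_id in tweet.user.followers.all():
              FollowingTimeline.add(user_id, tweet)

      on_tweet_creation.delay(tweet.id)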




Bringing It Together

                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                         if random.randint(0, 9) == 0:
                             return render('fail_whale.html')

                         ids = FollowingTimeline.list(
                             user_id=request.user.id,
                             limit=100,
                         )

                         res = dict((str(t.id), t) for t in 
                                       Tweet.objects.filter(id__in=ids))

                         tweets = []
                         for tweet_id in ids:
                             if tweet_id not in res:
                                 continue
                             tweets.append(res[tweet_id])

                         return render('home.html', {'tweets': tweets})



Build an API




APIs




           •     PublicTimeline.list
           •     redis.zrange
           •     Tweet.objects.all()
           •     example.com/api/tweets/
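
A sketch of that last, HTTP-facing layer (the URL and JSON shape are assumptions, not from the deck); it just wraps the same internal timeline call:

      # views.py (sketch); wire it up at /api/tweets/ in urls.py
      from django.http import HttpResponse
      from django.utils import simplejson

      def api_tweets(request):
          tweets = PublicTimeline().list(limit=100)
          data = [{'id': t.id, 'message': t.message} for t in tweets]
          return HttpResponse(simplejson.dumps(data),
                              content_type='application/json')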




Refactoring

                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                         tweet_ids = FollowingTimeline.list(
                             user_id=request.user.id,
                             limit=100,
                         )




                def home(request):
                    "Shows the latest 100 tweets from your follow stream"

                         tweets = FollowingTimeline.list(
                             user_id=request.user.id,
                             limit=100,
                         )



Refactoring (cont.)




                     from datetime import datetime, timedelta

                     class PublicTimeline(object):
                        def list(self, offset=0, limit=-1):
                            ids = self.conn.zrevrange(self.key, offset, limit)

                            cache = dict((str(t.id), t) for t in 
                                         Tweet.objects.filter(id__in=ids))

                            return filter(None, (cache.get(i) for i in ids))




Optimization in the API

                 class PublicTimeline(object):
                    def list(self, offset=0, limit=-1):
                        ids = self.conn.zrevrange(self.list_key, offset, limit)

                         # pull objects from a hash map (cache) in Redis
                         cache = dict((i, self.conn.get(self.hash_key(i)))
                                      for i in ids)

                         if not all(cache.itervalues()):
                             # fetch missing from database
                             missing = [i for i, c in cache.iteritems() if not c]
                             m_cache = dict((str(t.id), t) for t in 
                                            Tweet.objects.filter(id__in=missing))

                             # push missing back into cache
                             cache.update(m_cache)
                             for i, c in m_cache.iteritems():
                                  self.conn.set(self.hash_key(i), c)

                         # return only results that still exist
                         return filter(None, (cache.get(i) for i in ids))



Optimization in the API (cont.)




                 def list(self, offset=0, limit=-1):
                        ids = self.conn.zrevrange(self.list_key, offset, limit)

                         # pull objects from a hash map (cache) in Redis
                         cache = dict((i, self.conn.get(self.hash_key(i)))
                                      for i in ids)




                                       Store each object in its own key




Optimization in the API (cont.)




                         if not all(cache.itervalues()):
                             # fetch missing from database
                             missing = [i for i, c in cache.iteritems() if not c]
                             m_cache = dict((str(t.id), t) for t in 
                                            Tweet.objects.filter(id__in=missing))




                                 Hit the database for misses




Optimization in the API (cont.)


                                    Store misses back in the cache

                             # push missing back into cache
                             cache.update(m_cache)
                             for i, c in m_cache.iteritems():
                                  self.conn.set(self.hash_key(i), c)

                         # return only results that still exist
                         return filter(None, (cache.get(i) for i in ids))




                          Ignore database misses



(In)validate the Cache




                 class PublicTimeline(object):
                    def add(self, tweet):
                         score = float(tweet.date.strftime('%s.%m'))

                         # add the tweet into the object cache
                         self.conn.set(self.make_key(tweet.id), tweet)

                         # add the tweet to the materialized view
                         self.conn.zadd(self.list_key, tweet.id, score)




(In)validate the Cache




                 class PublicTimeline(object):
                    def remove(self, tweet):
                         # remove the tweet from the materialized view
                          self.conn.zrem(self.list_key, tweet.id)

                         # we COULD remove the tweet from the object cache
                          self.conn.delete(self.make_key(tweet.id))




Wrap Up




Reflection




                •        Use a framework!
                •        Start simple; grow naturally
                •        Scale can lead to performance
                         •   Not the other way around
                •        Consolidate entry points




Reflection (cont.)




                •        100 shards > 10; Rebalancing sucks
                         •   Use VMs
                •        Push to caches, don’t pull
                •        “Denormalize” counters, views (see sketch below)
                •        Queue everything
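
For the counters bullet above, a minimal sketch of the push-not-pull idea (an assumed example, reusing the Redis connection style from earlier slides): bump a redundant counter when the write happens and read it back directly, rather than counting rows on every request.

      from redis import Redis

      conn = Redis()

      def on_follow(from_user_id, to_user_id):
          # push: update the denormalized counter at write time
          conn.incr('user:%s:follower_count' % to_user_id)

      def follower_count(user_id):
          # read the precomputed value instead of COUNT(*)-ing relationships
          return int(conn.get('user:%s:follower_count' % user_id) or 0)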




Food for Thought




                •        Normalize object cache keys
                •        Application triggers directly to queue
                •        Rethink pagination
                •        Build with future-sharding in mind




DISQUS
                           Questions?




                           psst, we’re hiring
                          jobs@disqus.com

