Building Lanyrd

  1. 1. Building Lanyrd Simon Willison BrightonPy, 9th August 2011
  2. 2. Definitive database of professional events and speakers
  3. 3. Definitive database of professional events and speakers • Social event recommendation • Comprehensive speaker profiles • Archive of slides, notes and video
  4. 4. A brief history
  5. 5. Casablanca! August 2010
  6. 6. • Aug 31st, 11:22: Launch! (1 linode) • Aug 31st, 12:41: Unlaunch • Aug 31st, 12:54: Read only mode • Aug 31st, 14:15: DB server (2 linodes) • Sep 1st: Limit 50 on dashboard • Sep 1st: disable-dashboard setting
  7. 7. • Sep 3rd: dConstruct (and Twitter bot) • Sep 4th: TechCrunched (read only :( ) • Sep 5th: 3 large EC2 + 1 RDS • Sep 6th: Downgrade to 3 small EC2
  8. 8. December photo: @niqui
  9. 9. • Dec 8: Calacanis + Scoble at the same time! • Upgrade to next size of RDS • (Sometimes scaling vertically does the job)
  10. 10. • Jan 26th: Solr powered dashboard • Replicated to 2, then 3 servers
  11. 11. [Architecture diagram] • Load balancer (nginx) • HTTP cache (varnish) • 3 app servers (django/mod_wsgi) • Database (MySQL RDS) • Search master + 2 search slaves (solr) • Redis (data structures + message queue) • Logging (MongoDB) • 2 workers (celery)
  12. 12. Solr + Haystack
  13. 13. [Screenshot: Apache Solr project home page, with release news up to "May 2011 - Solr 3.2 Released"]
  14. 14. More Like This • Faceting • Stored (non-indexed) fields • Highlighting • Spelling Suggestions • Boost [Screenshot: Haystack home page. "Search doesn't have to be hard. Haystack lets you write your search code once and choose the search engine you want it to run on." Haystack is BSD licensed and supports Solr, Whoosh and Xapian.]
  15. 15. Model-oriented search • Define (like for your application • Hook up default haystack search views • Write a quick search.html template • Run ./ rebuild_index
  16. 16. [Screenshot: Lanyrd search results for “django”: 3 sessions, with "filter by" facets for type (Sessions), topic (NoSQL, Django, Cassandra) and place (United States, Oregon, Portland, Santa Clara, California)]
  17. 17.

          from haystack import indexes

          class BookIndex(indexes.SearchIndex):
              text = indexes.CharField(document=True, use_template=True)
              speakers = indexes.MultiValueField()
              topics = indexes.MultiValueField()

              def prepare_speakers(self, obj):
                  return [a.user.t_id for a in obj.authors.exclude(
                      user=None
                  ).select_related('user')]

              def prepare_topics(self, obj):
                  return list(obj.topics.values_list('pk', flat=True))
  18. 18. search/indexes/books/book_text.txt

          {{ object.title }}
          {{ object.tagline }}
          {% for author in object.authors.all %}
            {{ author.display_name }}
            {{ author.user.t_screen_name }}
          {% endfor %}
          {% for topic in object.topics.all %}
            {{ topic.name_en }}
          {% endfor %}
  19. 19. Staying fresh • Search engines usually don’t like accepting writes too frequently • RealTimeSearchIndex for low traffic sites • ./ update_index --age=6 (hours) • Uses index.get_updated_field() • Roll your own (message queue or similar...)
  20. 20. Replication • [Diagram: one Solr Master replicating to three Solr Slaves]
  21. 21. Smarter indexing

          class Article(models.Model):
              needs_indexing = models.BooleanField(
                  default=True, db_index=True
              )
              ...

              def save(self, *args, **kwargs):
                  self.needs_indexing = True
                  super(Article, self).save(*args, **kwargs)
  22. 22.

          index = site.get_index(model)
          updated_pks = []
          # Take the next batch of up to 100 objects flagged for indexing
          objects = index.load_all_queryset().filter(
              needs_indexing=True
          )[:100]
          if not objects:
              return
          for object in objects:
              updated_pks.append(object.pk)
              index.update_object(object)
          # Clear the flag only on the objects that were just indexed
          index.load_all_queryset().filter(
              pk__in=updated_pks
          ).update(needs_indexing=False)
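
The batch loop on the two slides above can be tried without Django or Solr. What follows is a minimal sketch under stated assumptions, not the Lanyrd code: `Article` is a plain dataclass rather than a model, `FakeIndex` is a hypothetical stand-in for a Haystack index, and `index_batch` is an illustrative name. It shows the pattern only: take a bounded batch of flagged rows, index them, then clear the flag on exactly those rows.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pk: int
    title: str
    needs_indexing: bool = True  # the real model sets this in save()


class FakeIndex:
    """Hypothetical stand-in for a search index; records what was indexed."""

    def __init__(self):
        self.documents = {}

    def update_object(self, obj):
        self.documents[obj.pk] = obj.title


def index_batch(articles, index, batch_size=100):
    """Index up to batch_size flagged articles and return their pks."""
    batch = [a for a in articles if a.needs_indexing][:batch_size]
    updated_pks = []
    for article in batch:
        index.update_object(article)
        updated_pks.append(article.pk)
    # Clear the flag only on the rows that were actually indexed.
    for article in articles:
        if article.pk in updated_pks:
            article.needs_indexing = False
    return updated_pks


articles = [Article(pk=i, title="Article %d" % i) for i in range(1, 6)]
index = FakeIndex()
first = index_batch(articles, index, batch_size=2)
second = index_batch(articles, index, batch_size=2)
```

Running the function repeatedly (from a periodic task, for example) drains the backlog in fixed-size batches, so a burst of saves never turns into one huge write to the search engine.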
  23. 23. nginx + Solr replication trick

          upstream solrmaster {
              server;
          }
          upstream solrslaves {
              server;
              server;
              server;
          }
          server {
              listen 8983;
              location /solr/update {
                  proxy_pass http://solrmaster;
              }
              location /solr/select {
                  proxy_pass http://solrslaves;
              }
          }
  24. 24. [Screenshot: Lanyrd "Your contacts' calendar" page for simonw: "We've found 182 conferences your Twitter contacts are interested in", with each event showing how many contacts are speaking or tracking]
  25. 25.

          # Original implementation
          twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter
          attendees = Attendee.objects.filter(
              user__t_id__in = twitter_ids
          ).filter(
              conference__start_date__gte =
          )
  26. 26.

          # Current implementation
          twitter_ids = [11134, 223455, 33221, ...]  # fetch from Twitter
          sqs = SearchQuerySet()
          sqs = sqs.models(Conference)
          # join() needs strings, so convert the integer IDs first
          or_string = ' OR '.join(str(t_id) for t_id in twitter_ids)
          sqs = sqs.narrow('attendees:(%s)' % or_string)
  27. 27. Redis
  28. 28. [Screenshot: Redis home page. "Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets." Redis 2.2.10 is the latest stable version.]
  29. 29.

          simonw-follows:{144,21345,12328...}
          europython-attendees:{344,21345,787...}

          contact_ids = redis.sinter(
              'simonw-follows', 'europython-attendees'
          )
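
To see what the `sinter` call returns without a Redis server, the same operation can be mimicked with Python sets. This is only an illustration: the `store` dictionary and the local `sinter` helper are assumptions standing in for Redis, and the IDs are the visible ones from the slide. A key that does not exist behaves like an empty set, as it does for the Redis SINTER command.

```python
# Hypothetical in-memory stand-in for the two Redis sets on the slide.
store = {
    "simonw-follows": {144, 21345, 12328},
    "europython-attendees": {344, 21345, 787},
}


def sinter(store, *keys):
    """Return the intersection of the named sets, like Redis SINTER."""
    members = [store.get(key, set()) for key in keys]
    return set.intersection(*members) if members else set()


contact_ids = sinter(store, "simonw-follows", "europython-attendees")
nobody = sinter(store, "simonw-follows", "no-such-conference-attendees")
```

The intersection is computed inside the data store, so "which of my contacts are attending this event" needs no SQL join across a large follower list.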
  30. 30. [Screenshot: Lanyrd page for EuroPython 2011 (Florence, Italy, 19–26 June 2011), marked "You're speaking at this event", showing 97 people attending, 80 people tracking, 119 speakers, and the topics Django, Plone, Pyramid, Python and Twisted]
  31. 31. Celery
  32. 32. [Screenshot: Celery project home page. "Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well." The recommended message broker is RabbitMQ, with limited support for Redis, Beanstalk, MongoDB, CouchDB and databases. Latest news item: "Celery 2.2 released!"]
  33. 33. Tasks? • Anything that takes more than about 200ms • Updating a search index • Resizing images • Hitting external APIs • Generating reports
  34. 34. Trivial example • Fetch the content of a web page

          import urllib

          from celery.task import task

          @task
          def fetch_url(url):
              return urllib.urlopen(url).read()

          >>> result = fetch_url.delay('')
          >>> html = result.wait()
  35. 35. [Screenshot: Lanyrd session page for "Python and MongoDB tutorial" by Andreas Jung at EuroPython 2011 (20th June 2011, 14:30–18:30 CET), with an "Add coverage to this session" box that accepts a URL to coverage such as videos, slides, podcasts, handouts, sketchnotes, photos etc.]
  36. 36. [Screenshot: "Add coverage" form for the Python and MongoDB tutorial session, with a link title, a type of coverage (Link, Audio, Liveblog, Write-up, Sketch notes, Photos, Slides, Transcript, Notes, Video, Handout) and a coverage preview from SlideShare]
  37. 37. The task itself... • Tries using to find a preview • Fetches the HTTP headers and first 2048 bytes • If HTML, attempts to extract the <title> • If other, gets the file type and size from headers
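
The last three steps of that task can be sketched in plain Python. This is an assumption-laden illustration rather than the Lanyrd task: `describe_link` is a hypothetical name, the headers are passed in as an ordinary dictionary with exact-case keys, and the network fetch is left out so that only the decision logic is shown.

```python
import re

# Matches the contents of the first <title> element in a chunk of HTML.
TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def describe_link(headers, first_bytes):
    """Summarise a link from its HTTP headers and the start of its body."""
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type == "text/html":
        # Only the first 2048 bytes are inspected, as on the slide.
        match = TITLE_RE.search(first_bytes[:2048])
        title = match.group(1).decode("utf-8", "replace").strip() if match else None
        return {"kind": "html", "title": title}
    # Anything else: report the file type and size from the headers.
    return {
        "kind": "other",
        "content_type": content_type,
        "size": headers.get("Content-Length"),
    }


page = describe_link(
    {"Content-Type": "text/html; charset=utf-8"},
    b"<html><head><title> Building Lanyrd </title></head>",
)
pdf = describe_link(
    {"Content-Type": "application/pdf", "Content-Length": "48213"},
    b"%PDF-1.4",
)
```

Reading only the headers and a small prefix keeps the task cheap even when the link points at a large video or slide deck.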
  38. 38. Behind the scenes...

          ar = enhance_link.delay(url)
          poll_url = '/working/%s/' % signed.dumps({
              'task_id': ar.task_id,
              'on_done_url': on_done_url,
          })
          if 'ajax' in request.POST:
              return render_json(request, {
                  'ok': True,
                  'poll_url': poll_url,
              })
          else:
              return HttpResponseRedirect(poll_url)
  39. 39. And when it’s done...

          from celery.backends import default_backend
          ...
          task_id = request.REQUEST.get('id', '')
          result = default_backend.get_result(task_id)
  40. 40. Configuration

          # Carrot / Celery: queue uses Redis
          CARROT_BACKEND = "ghettoq.taproot.Redis"
          BROKER_HOST = ""  # redis server
          BROKER_PORT = 6379
          BROKER_VHOST = "6"

          # Task results stored in memcached, so they can
          # expire automatically
          CELERY_RESULT_BACKEND = "cache"
          CELERY_CACHE_BACKEND = "memcached://;..."
  41. 41. Tricks
  42. 42. Phantom load testing • Deploy a new architecture on a brand new EC2 cluster • Leave your existing site on the old cluster • Invisibly link to the new stack from an <img width=1 height=1> element on your live site (not for very long though) • (sensible alternative: find a way to replay log files)
  43. 43. cache_version
  44. 44. [Screenshot: Lanyrd "Django conferences" topic page, listing upcoming events (EuroPython 2011, DjangoCon US 2011, PyCON FR 2011, PyCon DE 2011) next to coverage counts: 52 videos, 52 slide decks, 3 audio clips, 27 write-ups, 11 handouts, 3 notes]
  45. 45.

          class Conference(models.Model):
              ...
              cache_version = models.IntegerField(default=0)

              def save(self, *args, **kwargs):
                  self.cache_version += 1
                  super(Conference, self).save(*args, **kwargs)

              def touch(self):
                  # Bump the version in the database without a full save()
                  Conference.objects.filter(pk=self.pk).update(
                      cache_version=F('cache_version') + 1
                  )
  46. 46.

          {% cache 36000 conf-topics conference.cache_version %}
          <ul class="tags inline-tags meta">
            {% for topic in conference.topics.all %}
              <li><a href="{{ topic.get_absolute_url }}">{{ topic }}</a></li>
            {% endfor %}
          </ul>
          {% endcache %}
  47. 47. Bulk invalidation

          from django.db.models import F

          topic.conferences.all().update(
              cache_version=F('cache_version') + 1
          )
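
Why bumping `cache_version` invalidates a cached fragment is easiest to see with a toy cache. The sketch below is an illustration under assumptions: a dictionary stands in for memcached, the conference is a plain dictionary, and `cached_topics` and `render_topics` are hypothetical names. It also puts the primary key into the cache key, which is an addition not shown on the slides.

```python
cache = {}        # stands in for memcached
render_count = 0  # counts how often the expensive render actually runs


def render_topics(topics):
    global render_count
    render_count += 1
    return "<ul>%s</ul>" % "".join("<li>%s</li>" % t for t in topics)


def cached_topics(conference):
    """Return the topics fragment, keyed on the conference's cache_version."""
    key = "conf-topics.%d.%d" % (conference["pk"], conference["cache_version"])
    if key not in cache:
        cache[key] = render_topics(conference["topics"])
    return cache[key]


conference = {"pk": 1, "cache_version": 0, "topics": ["Django", "Python"]}
first = cached_topics(conference)
again = cached_topics(conference)    # same key, served from the cache

conference["topics"].append("Twisted")
conference["cache_version"] += 1     # what save() and touch() do on the model
updated = cached_topics(conference)  # new key, so the fragment is re-rendered
```

Nothing is ever deleted from the cache: the old entry simply stops being requested and expires on its own, which is why a single `UPDATE` can invalidate the fragments for many conferences at once.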
  48. 48. Signing
  49. 49. Pass data through an untrusted source with confidence that it hasn't been tampered with
  50. 50. Signing uses • "Unsubscribe" links in emails • ?redirect_to=URL protection • Signed cookies • "You are logged in as simonw" without hitting the database
  51. 51. Signing in Django 1.4

          from django.core import signing

          signing.dumps({"foo": "bar"})
          signing.loads(signed_string)

          response.set_signed_cookie(key, value...)
          request.get_signed_cookie(key)
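
The idea behind `dumps` and `loads` can be sketched with the standard library alone. This is not Django's implementation or its token format: it is a minimal HMAC-SHA256 example, and `SECRET_KEY`, the `payload:signature` layout and the two function bodies are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"not-a-real-secret"  # hypothetical; use a long random value


def dumps(obj, key=SECRET_KEY):
    """Serialise obj to JSON and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(obj, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return (payload + b":" + signature).decode("ascii")


def loads(signed, key=SECRET_KEY):
    """Check the signature and return the object, or raise ValueError."""
    payload, _, signature = signed.encode("ascii").rpartition(b":")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature does not match")
    return json.loads(base64.urlsafe_b64decode(payload).decode("utf-8"))


token = dumps({"foo": "bar"})
original = loads(token)
```

The payload is readable by anyone who holds the token; signing only guarantees that it has not been altered, so secrets should not be placed in it.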
  52. 52. Hashed static asset filenames in S3/CloudFront
  53. 53. global.js → global.ed81d119.js
  54. 54. Benefits • Far-future expiry headers • Cache-Control: max-age=315360000 • Expires: Fri, 18 Jun 2021 06:45:00 -0000 GMT • Guaranteed updated CSS in IE • Deploy new assets in advance of application • Old versions stick around for rollbacks
  55. 55. ./ push_static • Minifies JavaScript and CSS • Renames files to include sha1(contents)[:6] • Pushes all assets to S3
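
The renaming step can be sketched as follows. This is an illustration under assumptions, not the `push_static` command itself: `hashed_name` is a hypothetical helper, the file contents are invented, and minification and the S3 upload are left out.

```python
import hashlib
import os


def hashed_name(filename, contents, length=6):
    """Insert the first `length` hex digits of sha1(contents) before the extension."""
    digest = hashlib.sha1(contents).hexdigest()[:length]
    root, ext = os.path.splitext(filename)
    return "%s.%s%s" % (root, digest, ext)


name_v1 = hashed_name("global.js", b"console.log('v1');")
name_v2 = hashed_name("global.js", b"console.log('v2');")
```

Because the name is derived from the contents, any change to a file produces a new URL, so the far-future cache headers never serve stale assets and the previous version stays available for rollbacks.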
  56. 56. Profiling and debugging production systems
  57. 57. UserBasedExceptionMiddleware

          from django.views.debug import technical_500_response
          import sys

          class UserBasedExceptionMiddleware(object):
              def process_exception(self, request, exception):
                  if request.user.is_superuser:
                      return technical_500_response(request, *sys.exc_info())
  58. 58. mysql-proxy • Very handy lua-customisable proxy for all of your MySQL traffic • Worst documented software ever • log.lua - logs out ALL queries •
  59. 59. django_instrumented • (Unreleased) code I wrote for Lanyrd • Collects various runtime stats about the current request, stashes a profile JSON in memcached • Writes out the profile UUID as part of the HTML • A bookmarklet to view the profile
  60. 60. mongodb logging • Super-fast inserts, log everything! • Capped collections • Structured queries • Ask me about it in a few months
  61. 61. For the future... • Much better profiling, monitoring and alerts • Varnish in front of everything • Replicated MySQL for analytics + upgrades
  62. 62. Questions?
  63. 63. Thank you!