tutorial
  • Scripted-language code is often dominated by the behavior of the script interpreter. No free lunch: you must load classes, etc., to handle framework stuff. Embedding code in pages encourages presentation-driven thinking, in contrast to MVC.
  • SOA: XML, Atom/RSS, etc. for data exchange; REST/CRUD for state management. Everyone wins because (e.g.) you’ll never do maps better than Google. Exposure: a Facebook app can get lots of users.
  • FB plugin: FBML, FBQL
  • The airport's computerized baggage system, which was supposed to reduce flight delays, shorten waiting times at luggage carousels, and save airlines in labor costs, turned into an unmitigated failure. An opening originally scheduled for October 31, 1993 with a single system for all three concourses turned into a February 28, 1995 opening with separate systems for each concourse, with varying degrees of automation. The system's $186 million in original construction costs grew by $1 million per day during months of modifications and repairs. Incoming flights on the airport's B Concourse made very limited use of the system, and only United, DIA's dominant airline, used it for outgoing flights. The 40-year-old company responsible for the design of the automated system, BAE Automated Systems of Carrollton, Texas, at one time responsible for 90% of the baggage systems in the U.S., was acquired in 2002 by G&T Conveyor Company, Inc. The automated baggage system never worked well, and in August 2005, it became public knowledge that United would abandon the system, a decision that would save them $1 million per month in maintenance costs. (From Wikipedia)
  • The first step is thinking of the “code you wish you had”: assume methods that, if they existed, would make the code a perfect match to the user story. Rather than working inside-out (start with building blocks and compose them to provide the desired functionality), go from the user outside-in, to reduce wasted coding; e.g., until you get to the user’s view, you won’t know what you really need. This is especially true of Web apps, since it is easy to see what the user is doing. Both cycles involve taking small steps and listening to the feedback you get from the tools. We start with a failing step (red) in Cucumber (the outer cycle). To get that step to pass, we’ll drop down to RSpec (the inner cycle) and drive out the underlying code at a granular level (red/green/refactor). At each green point in the RSpec cycle, we’ll check the Cucumber cycle. If it is still red, the resulting feedback should guide us to the next action in the RSpec cycle. If it is green, we can jump out to Cucumber, refactor if appropriate, and then repeat the cycle by writing a new failing Cucumber step.
  • Web servers: why separate the asset server from dynamic content? No need to involve the app server at all; you can separately do caching, SSL, etc. Caches: what’s cacheable? How do you avoid staleness? Synergy with asset servers. Replicated DB: when is it easy (“sharding”)? When is it hard (graph)? “Scale makes availability affordable.” What’s different for the programmer: partial failures, data replication/consistency. How much scale: peak vs. average workload; the idea of thinking in percentiles. What metrics would you care about?
  • i.e., understand how the “magic” works in Rails, vs. just using it
  • Why not sync caches across datacenters? Why not sync (vs. replicate) across datacenters? Replication is master-slave (only 1 datacenter accepts writes); what problem arises and how is it fixed?
  • focus on myspace scaling story
  • Classification with feature selection: we could add more metrics (variance, slope). Fits a linear model from metrics to a binary class. Advantages: fast even for a large number of metrics.
  • Is this an app or enabling technology?

Presentation Transcript

  • Web 2.0 Applications: EuroSys 2010 Tutorial. Armando Fox, UC Berkeley Reliable Adaptive Distributed Systems Lab
  • Who I Am
    • Adjunct Prof. at UC Berkeley Computer Science
    • Research
      • 2006-now: applying machine learning to problems of datacenter-scale applications
      • 2001-2006: Recovery-Oriented Computing (ROC)
      • 1996-2000: Mobile computing meets SaaS
    • Teaching: undergraduate Software-as-a-Service/Software Engineering
    • Developer & maintainer of active Web app
    • Know just enough about languages to be dangerous
  • Where I Work: RAD Lab 5-year mission
    • Enable 1 person to develop, deploy, and operate next-generation Internet application at scale
    • Key enabling technology: Statistical machine learning
      • management, scaling, anomaly detection, performance prediction...
    • interdisciplinary: 7 faculty, ~30 PhD students, ~6 undergrads, ~1 sysadmin
    • Engagement with industrial affiliates keeps us honest
  • Goals & Non-Goals
    • Goals
      • New Web 2.0 features, technologies, challenges
      • Web 2.0 & Software Engineering Education
      • Server-centric view, though client highly nontrivial
      • Assumption: basic familiarity with Web 1.0
    • Non-goals
      • Plug our own research (you can read it elsewhere)
      • Teach you to code (plenty of good frameworks, docs)
      • Instead, know the landscape & where to go next
    • Disclaimers
      • My views are mine alone, etc.
      • Specific tools mentioned for sake of example only
  • Key Messages
    • Social Computing & Rich UIs
    • DADO teams (develop, assess, deploy, operate) vs. waterfall
    • Agile, Behavior-Driven Development vs. Big Design Up Front
    • High-productivity tools, languages, frameworks: undergrads deploy ready-to-use apps in ~weeks
    • Cloud computing is a game changer for Web education, research, & business
  • Outline of topics
    • Web 1.0 review & what’s new in 2.0
    • Web 2.0 application frameworks
    • Service-oriented architecture
    • DADO, a new view of software development
    • Deployment
    • Education
    • Research Challenges
  • WEB 1.0 REVIEW & WHAT’S NEW IN 2.0
  • Software-as-a-Service (SaaS) Evolution
    • (Dates are approximate...)
    • 1990: Web 0.9 (physicists using NCSA Mosaic)
    • 1995: Web 1.0 (static & some dynamic content, e-commerce, Netscape)
    • 1997: "Content is King" => "Services are King" (email, search engines, photo sharing...)
    • 2000: Web 2.0 (rich UIs, social computing)
    • 2004: SaaS & SOA (Service Oriented Architectures) (Google Maps, Amazon S3...)
    • 2008: Cloud Computing (pay as you go)
  • The Web is a Client-Server, Request-Reply Architecture
    • HTTP (Hypertext Transfer Protocol), ASCII-based request/reply protocol that runs over TCP
      • HTTPS: variant that first establishes symmetrically-encrypted channel via public-key handshake, so suitable for sensitive info
    • By convention, servers listen on TCP port 80 (HTTP) or 443 (HTTPS)
    • Uniform Resource Identifier (URI) format: scheme, host, port, resource, parameters, fragment
    • http://search.com:80/img/search/file?t=banana&client=firefox#p2
    [Figure: Web browser, DNS server, and Web server connected by “a series of tubes”, with numbered steps 1 and 2]
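The URI components listed above can be pulled apart with Ruby’s standard `uri` library. This is a minimal sketch; the `uri_parts` helper and its hash keys are our own naming, chosen to match the slide’s terms.

```ruby
require 'uri'

# Split a URI into the components named on the slide:
# scheme, host, port, resource (path), parameters (query), fragment.
def uri_parts(str)
  uri = URI.parse(str)
  {
    scheme:   uri.scheme,
    host:     uri.host,
    port:     uri.port,        # defaults to 80 for http, 443 for https
    resource: uri.path,
    # Decode "t=banana&client=firefox" into a Hash of name => value
    params:   URI.decode_www_form(uri.query.to_s).to_h,
    fragment: uri.fragment     # used by the browser; not sent to the server
  }
end

uri_parts('http://search.com:80/img/search/file?t=banana&client=firefox#p2').each do |name, value|
  puts "#{name}: #{value.inspect}"
end
```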
  • A Conversation With a Web Server
    • GET /index.html HTTP/1.0
    • User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.0.35 i686)
    • Host: www.yahoo.com
    • Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    • Accept-Language: en
    • Accept-Charset: iso-8859-1,*,utf-8
    • Server replies:
    • HTTP/1.0 200 OK
    • Content-Length: 16018
    • Set-Cookie: B=2vsconq5p0h2n
    • Content-Type: text/html
    • <html><head><title>Yahoo!</title><base href=http://www.yahoo.com/> …etc.
    • Repeat for embedded content (images, stylesheets, scripts...): <img width=230 height=33 src="http://us.a1.yimg.com/us.yimg.com/a/an/anchor/icons2.gif">
    (Callouts: HTTP method & URI; cookie data, up to 4 KiB; MIME content type)
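The reply above is just text: a status line, header lines, a blank line, then the body. A rough Ruby sketch of how a client could split it apart follows; `parse_http_response` is a hypothetical helper for illustration only (real clients would use a library such as `Net::HTTP`), and it keeps only the last of any repeated header.

```ruby
# Split a raw HTTP/1.0 reply into status, headers, and body.
def parse_http_response(raw)
  head, body = raw.split("\r\n\r\n", 2)            # blank line ends the headers
  status_line, *header_lines = head.split("\r\n")
  version, code, reason = status_line.split(' ', 3)  # e.g. "HTTP/1.0", "200", "OK"
  headers = {}
  header_lines.each do |line|
    name, value = line.split(':', 2)
    # Header names are case-insensitive, so normalize them to lowercase.
    # Repeated headers (e.g. several Set-Cookie lines) overwrite each other here.
    headers[name.strip.downcase] = value.strip
  end
  { version: version, status: code.to_i, reason: reason, headers: headers, body: body.to_s }
end

raw = "HTTP/1.0 200 OK\r\n" \
      "Content-Length: 13\r\n" \
      "Set-Cookie: B=2vsconq5p0h2n\r\n" \
      "Content-Type: text/html\r\n" \
      "\r\n" \
      "<html></html>"
reply = parse_http_response(raw)
puts reply[:status]
puts reply[:headers]['content-type']
```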
  • Cookies
    • On first visit to a server, browser may receive a cookie from server in HTTP header
      • Data is arbitrary (up to 4KB long)
      • typically opaque, interpretation is up to the server
      • usually HMAC’d or encrypted, since client untrusted
    • Browser automatically passes appropriate cookie back to server on each request
      • Server may update cookie value with any response
      • Thus can synthesize concept of “session” using this
    • Many, many uses
      • track user’s ID (canonical use: authentication)
      • track session state (up to 4KB) or a handle to it
      • before cookies, “fat URLs” were used for this in Web 1.0
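Because the client is untrusted, a server typically signs cookie data so tampering can be detected. A minimal sketch of HMAC signing with Ruby’s `openssl` library, assuming a hypothetical `SECRET` and a hypothetical `"data--signature"` cookie layout; note that HMAC authenticates the data but does not hide it (encryption would be needed for that).

```ruby
require 'openssl'

# Hypothetical server-side secret; in practice, load a long random value from config.
SECRET = 'replace-with-a-long-random-secret'

# Produce "data--signature": the client can store and return it, but not forge it.
def sign_cookie(data)
  "#{data}--#{OpenSSL::HMAC.hexdigest('SHA256', SECRET, data)}"
end

# Compare in constant time so timing doesn't reveal how many bytes matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Return the original data if the signature checks out, else nil.
def verify_cookie(cookie)
  data, sep, sig = cookie.to_s.rpartition('--')   # hex signatures never contain '-'
  return nil if sep.empty?
  expected = OpenSSL::HMAC.hexdigest('SHA256', SECRET, data)
  secure_equal?(sig, expected) ? data : nil
end

cookie = sign_cookie('user_id=42')
puts cookie
puts verify_cookie(cookie)
```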
  • XML (eXtensible Markup Language)
    • <?xml version="1.0" encoding="UTF-8"?>
    • <book year="1967">
    • <title>The politics of experience</title>
    • <author>
    • <firstname>Ronald</firstname>
    • <lastname>Laing</lastname>
    • </author>
    • </book>
    • Really a metalanguage for describing hierarchical, semistructured, schema-less data
    • XML Document Type Definition (DTD) specifies structural & content constraints on a particular document type
  • HTML, XHTML & Beyond
    • XHTML: a document conforming to a particular DTD describing a hierarchical collection of HTML elements
      • Variants: Strict, Transitional (a/k/a “loose”, for compatibility with the deteriorating HTML syntax of 1990-95), Frameset
    • <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      • inline (headings, tables, lists...)
      • embedded (images, video, Java applets, JavaScript code...)
      • fill-in forms: text, radio/check buttons, dropdown menus..., marshaling arguments into either URI or request body
    • CSS (Cascading Stylesheets) for presentation
      • Strict XHTML forbids presentational markup
      • Idea: complete separation of appearance from structure
  • Selectors identify specific tag(s)
    • <link rel="stylesheet" href="mystyles.css"/>
    • <div class="pageFrame" id="pageHead"> <h1> Welcome, <span id="custName">Armando</span> </h1> </div>
    • tag name: h1
    • class name: .pageFrame
    • element ID: #pageHead
    • tag name & class: div.pageFrame
    • tag name & id: span#custName
    • descendant relationship: .pageFrame h1, div h1
    • descendant relationship: div #custName
    • child relationship: h1 > #custName
    (Several of these selectors, e.g. .pageFrame, #pageHead, and div.pageFrame, match the outer div above.)
  • CSS Styles apply visual styling based on selectors
    • <link rel="stylesheet" href="mystyles.css"/>
    • <div class="pageFrame" id="pageHead"> <h1> Welcome, <span id="custName">Armando</span> </h1> </div>
    • In mystyles.css ( static asset with MIME type text/css ):
    • div.pageFrame { background-image: url('/banner.gif'); }
    • h1 { font-size: large; float: left; }
    • #custName:hover { background-color: yellow; font-weight: bold; }
    • Style properties include borders, background images, and layout directives (floating, absolute positioning, min/max scaled sizes, etc.)
    • Changing style properties has side effect of re-rendering
    • e.g., change display or visibility property to show/hide elements
  • Dynamic content generation
    • Most Web 1.0 (e-commerce) sites actually run a program that generates the output
    • Originally: templates with embedded code “snippets”
    • Eventually, embedded code became “tail that wagged the dog” and moved out of the Web server
    • Languages/frameworks evolved to capture common tasks
      • Perl, PHP, Python, Ruby on Rails, ASP, ASP.NET, JavaServer Pages, JavaBeans/J2EE, ...
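To make “templates with embedded code snippets” concrete, here is a minimal sketch using Ruby’s standard ERB library (the same template syntax Rails views use). The `SearchPage` class and its template are hypothetical; `<% %>` runs code and `<%= %>` inserts a value, escaped here with ERB’s `h` helper.

```ruby
require 'erb'

class SearchPage
  include ERB::Util   # provides h(), which escapes &, <, >, and quotes

  # HTML with embedded Ruby "snippets"
  TEMPLATE = <<~HTML
    <html><body>
    <h1>Results for <%= h(@term) %></h1>
    <ul>
    <% @items.each do |item| %>
      <li><%= h(item) %></li>
    <% end %>
    </ul>
    </body></html>
  HTML

  def initialize(term, items)
    @term = term
    @items = items
  end

  # Evaluate the template with this object's instance variables in scope.
  def render
    ERB.new(TEMPLATE).result(binding)
  end
end

puts SearchPage.new('white rabbit', ['Alice', 'The Mad Hatter']).render
```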
  • SaaS 3-tier architecture
    • Common Gateway Interface (CGI): allows Web server to run a program
      • Server maps some URI’s to application names
      • App is run, gets handed complete HTTP request including headers
    • “Arguments” embedded in URL with “&” syntax or sent as request body (with POST)
      • http://www.foo.com/search?term=white%20rabbit&show=10&page=1
    • App generates entire response
      • content (HTML? an image? some javascript?)
      • HTTP headers & response code
    • Plug-in modules for Web servers allow long-running CGI programs & link to language interpreters
    • Various frameworks have evolved to capture this common structure
    [Figure: 3-tier architecture: HTTP server, application (app server), persistent storage]
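A CGI program reads the request from its environment and generates the entire response, headers included. A minimal sketch: `cgi_app` is a hypothetical name, and the environment is passed in as a Hash so the example runs without a Web server (a real CGI script would read `ENV` and print to standard output).

```ruby
require 'uri'
require 'erb'

# The server passes request info in environment variables such as
# REQUEST_METHOD and QUERY_STRING, then relays whatever the program emits.
def cgi_app(env)
  params = URI.decode_www_form(env.fetch('QUERY_STRING', '')).to_h  # "%20" -> " "
  term = params.fetch('term', '')
  body = "<html><body>You searched for #{ERB::Util.html_escape(term)}</body></html>"
  # The app generates the whole response: headers, a blank line, then content.
  headers = ['Content-Type: text/html', "Content-Length: #{body.bytesize}"]
  headers.join("\r\n") + "\r\n\r\n" + body
end

# Simulate one request instead of reading the real ENV.
print cgi_app({ 'REQUEST_METHOD' => 'GET',
                'QUERY_STRING' => 'term=white%20rabbit&show=10&page=1' })
```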
  • 3 Tier Deployment
    • HTTP server (“web server”)
      • “fat” (e.g. Apache): support virtual hosts, plugins for multiple languages, URL rewriting, reverse proxying, ...
      • “thin” (nginx, thin, Tomcat, ...): bare-bones machinery to support one language/framework; no frills
    • App server
      • separate server process, front-ended by a “thin” HTTP server
      • or linked to an Apache worker via FastCGI or web server plug-in: mod_perl, mod_php, mod_rails, ...
      • Apache can spawn/quiesce/reap independent processes
    • Persistent storage
      • most commonly RDBMS (MySQL, PostgreSQL, etc.)
      • communicate w/app via proprietary or standardized database “connector” (ODBC, JDBC, ...)
    • Hence LAMP: Linux, Apache, MySQL, PHP/Perl
  • Frameworks
    • Support for more languages: Apache modules (mod_perl, mod_php, mod_rails ...)
      • avoid spawning new process per request
      • typically embed language interpreter in Apache
    • Support for common idioms like sessions
      • Cookie management
      • virtualize connection to database
      • “dispatcher” interactions with front-end HTTP server
    • Early “templating systems” (e.g. PHP) vs. modern “full stack frameworks” (e.g. Rails)
  • Example: Rails, a Ruby-based Model/View/Controller Framework
    [Figure: firefox, apache, CGI or other dispatching, your app on a Ruby interpreter, relational database (mysql or sqlite3); Model, View, and Controller are subclasses of ActiveRecord::Base, ActionView, and ApplicationController respectively]
    • Implemented almost entirely in Ruby
    • Distributed as a Ruby “gem” (collection of related libraries & tools)
    • Connectors for most popular databases
    [Figure: database tables map to models/*.rb; Rails routing dispatches to controllers/*.rb; Rails rendering uses views/*.html.erb]
  • A trip through a Rails app
    • Declarative routes map URLs to actions (methods in a class) and unmarshal parameters from URL or form
    • Actions can set variables that are visible to views
    • Every controller action eventually renders something
      • HTML page: view template with variables expanded
      • Response to AJAX request
      • Error page
    Example: http://.../foo/my_action?x=Howdy
      routes.rb maps it to app/controllers/foo_controller.rb:
        def my_action
          @var = params[:x]
        end
      which renders app/views/foo/my_action.html.erb:
        <p> Hey, <%= @var %> </p>
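The request path above (route, action, instance variable, view) can be mimicked in a few lines of plain Ruby. This is a toy dispatcher, not how Rails is implemented: `ROUTES` and `VIEWS` are hypothetical stand-ins for routes.rb and the view file, and `params` is a plain symbol-keyed Hash.

```ruby
require 'erb'
require 'uri'

# Hypothetical controller mirroring foo_controller.rb above.
class FooController
  attr_reader :params

  def initialize(params)
    @params = params
  end

  def my_action
    @var = params[:x]   # instance variables become visible to the view
  end

  # Render a view template in this controller's context, so @var is in scope.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# Hypothetical stand-ins for routes.rb and app/views/foo/my_action.html.erb
ROUTES = { '/foo/my_action' => [FooController, :my_action] }.freeze
VIEWS  = { [FooController, :my_action] => '<p> Hey, <%= ERB::Util.html_escape(@var) %> </p>' }.freeze

def dispatch(url)
  uri = URI.parse(url)
  controller_class, action = ROUTES.fetch(uri.path)   # KeyError = no route
  params = URI.decode_www_form(uri.query.to_s).to_h.transform_keys(&:to_sym)
  controller = controller_class.new(params)
  controller.public_send(action)                        # run the action...
  controller.render(VIEWS.fetch([controller_class, action]))  # ...then render something
end

puts dispatch('http://localhost/foo/my_action?x=Howdy')
```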
  • ActiveRecord, an object-relational mapping layer
    • class User < ActiveRecord::Base
      • table name inferred from class name
      • columns introspected from database
      • example of convention over configuration
    • # To find by column values:
    • armando = User.find_by_name('fox')
    • armando = User.find_by_name_and_birthdate('fox', Date.parse('May 12, 1968'))
    • armando.birthdate = Date.parse('June 6, 1969')
    • armando.save!
    • # To find only a few, and sort by an attribute
    • old_guys = User.find(:all, :conditions => ["birthdate < ?", Date.new(1980, 1, 1)],
    • :order => "birthdate")
    (Callout: the ? placeholder protects from SQL injection attacks.) [Table: users (id*, name, birthdate)]
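Dynamic finders such as `find_by_name_and_birthdate` are not written by hand; they are synthesized at call time. The sketch below shows the underlying Ruby metaprogramming idea, `method_missing`, on a hypothetical in-memory `InMemoryUser` class. ActiveRecord generates SQL rather than scanning an array, so treat this as an illustration of the mechanism only.

```ruby
require 'date'

class InMemoryUser
  Row = Struct.new(:name, :birthdate)

  def self.rows
    @rows ||= []
  end

  # Intercept calls to undefined class methods such as find_by_name(...).
  def self.method_missing(meth, *args)
    match = meth.to_s.match(/\Afind_by_(\w+)\z/)
    return super unless match

    columns = match[1].split('_and_').map(&:to_sym)  # "name_and_birthdate" => [:name, :birthdate]
    unless columns.size == args.size
      raise ArgumentError, "#{meth} expects #{columns.size} argument(s)"
    end
    rows.find { |row| columns.zip(args).all? { |col, val| row[col] == val } }
  end

  # Keep respond_to? consistent with method_missing.
  def self.respond_to_missing?(meth, include_private = false)
    meth.to_s.start_with?('find_by_') || super
  end
end

InMemoryUser.rows << InMemoryUser::Row.new('fox', Date.new(1968, 5, 12))
p InMemoryUser.find_by_name_and_birthdate('fox', Date.new(1968, 5, 12))
```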
  • ActiveRecord Associations
    • class User < ActiveRecord::Base
    •   has_many :pics
    • end
    • class Pic < ActiveRecord::Base
    •   belongs_to :user
    • end
    • thisuser.pics << Pic.new(...)
    • thisuser.pics.sort_by { |p| p.user.birthdate }
    • SQL view of the association: SELECT * FROM users u JOIN pics p ON u.id = p.user_id;
    [Tables: users (id*, name, description); pics (id*, user_id**, filename)]
  • Multiple joins
    • user has_many :memberships; has_many :groups, :through => :memberships
    • group has_many :memberships; has_many :users, :through => :memberships
    • membership belongs_to :user, belongs_to :group
    • Can now write user.groups , group.users , etc.
    • Separates relationships from storage schema
    [Tables: memberships (user_id**, group_id**, status); groups (id*, name, topic); users (id*, name, description)]
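The `:through` relationship is navigation via the join table: from a user to its memberships, then from each membership’s `group_id` to a group. A plain-Ruby sketch over hypothetical in-memory rows (ActiveRecord would issue a SQL JOIN instead):

```ruby
# Hypothetical in-memory stand-ins for the users, groups, and memberships tables.
UserRow       = Struct.new(:id, :name)
GroupRow      = Struct.new(:id, :name, :topic)
MembershipRow = Struct.new(:user_id, :group_id, :status)

USERS  = [UserRow.new(1, 'fox'), UserRow.new(2, 'alice')].freeze
GROUPS = [GroupRow.new(10, 'Systems', 'datacenters'), GroupRow.new(20, 'Reading', 'fiction')].freeze
MEMBERSHIPS = [MembershipRow.new(1, 10, 'active'),
               MembershipRow.new(2, 10, 'active'),
               MembershipRow.new(2, 20, 'pending')].freeze

# user.groups: user -> memberships (by user_id) -> groups (by group_id)
def groups_of(user)
  ids = MEMBERSHIPS.select { |m| m.user_id == user.id }.map(&:group_id)
  GROUPS.select { |g| ids.include?(g.id) }
end

# group.users: the same join walked in the other direction
def users_of(group)
  ids = MEMBERSHIPS.select { |m| m.group_id == group.id }.map(&:user_id)
  USERS.select { |u| ids.include?(u.id) }
end

puts groups_of(USERS[1]).map(&:name).inspect
```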
  • Rails & Security
    • Application-based attacks on Web 2.0 apps
      • SQL injection (defense: sanitize untrusted user input)
      • Cross-site request forgery (defense: include session authentication token in forms), cross-site scripting (defense: escape untrusted output)
      • Good frameworks help protect against these
    • Infrastructure-based attacks (DDoS, etc.)
      • Your deployment provider matters (more on this later)
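The two framework-level defenses above can be sketched in plain Ruby: a per-session random token embedded in every form (loosely modeled on Rails’ authenticity token) and HTML-escaping of user-supplied text. All method names here are hypothetical, not the Rails API.

```ruby
require 'erb'
require 'securerandom'

# Hypothetical session: a random token generated once per session.
def new_session
  { csrf_token: SecureRandom.hex(32) }
end

# Every rendered form carries the token in a hidden field; user text is
# HTML-escaped so it cannot inject <script> into the page (XSS).
def comment_form(session, draft_text)
  <<~HTML
    <form method="post" action="/comments">
      <input type="hidden" name="authenticity_token" value="#{session[:csrf_token]}">
      <textarea name="body">#{ERB::Util.html_escape(draft_text)}</textarea>
    </form>
  HTML
end

# Constant-time comparison so response timing doesn't leak matching prefixes.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# A forged cross-site POST cannot know the token, so it is rejected.
def authentic_post?(session, params)
  secure_equal?(params['authenticity_token'].to_s, session[:csrf_token].to_s)
end

session = new_session
puts comment_form(session, '<script>alert(1)</script>')
```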
  • What’s new in Web 2.0?
    • Primitive UI => Rich UI
      • enable “desktop-like” interactive Web apps
      • enable browser as universal app platform on cell phones
    • “Mass customize” to consumer => Social computing
      • tagging (Digg), collaborative filtering (Amazon reviews), etc. => primary value from users & their social networks
      • write-heavy workloads (Web 1.0 was read-mostly)
      • lots of short writes with hard-to-capture locality (hard to shard)
    • Libraries => Service-oriented architecture
      • Integrate power of other sites with your own (e.g. mashups that exploit Google Maps; Google Checkout shopping cart/payment)
      • Pay-as-you-go democratization of “services are king”
      • Focus on your core innovation
    • Buy & rack => Pay-as-you-go Cloud Computing
  • Rich Internet Apps (RIAs)
    • Closing gap between desktop & Web
      • Highly responsive UIs that don’t require a server roundtrip per action
      • More flexible drawing/rendering facilities (e.g. sprite-based animation)
      • Implies sophisticated client-side programmability
      • Local storage, so can function when disconnected
    • early example: Google Docs + Google Gears
      • include offline support, local storage, support for video, support for arbitrary drawing, ...
    • currently many technologies: Google Gears, Flash, MS Silverlight...
      • client interpreter must be embedded in browser (plugin, extension, etc.)
      • typically has access to low-level browser state => new security issues
      • N choices for framework * M browsers = N*M security headaches
    • proposed HTML5 may obsolete some of these
  • Rich UI with AJAX (Asynchronous Javascript and XML)
    • Web 1.0 GUI: click → page reload
    • Web 2.0: click → page can update in place
      • also timer-based interactions, drag-and-drop, animations, etc.
    • How is this done?
      • Document Object Model (c.1998, W3C) represents document as a hierarchy of elements
      • JavaScript (c.1995; now ECMAScript) makes DOM available programmatically, allowing modification of page elements after page loaded
      • XMLHttpRequest (c.2000) allows async HTTP transactions decoupled from page reload
      • JavaScript libraries (jQuery, Prototype, script.aculo.us) encapsulate useful abstractions
  • DOM & JavaScript: Document = tree of objects
    • hierarchical object model representing HTML or XML doc
    • Exposed to JavaScript interpreter
      • Inspect DOM element value/attribs
      • Change value/attribs → redisplay or fetch new content from server
    • Every element can be given a unique ID
    • JavaScript code can walk the DOM tree or select specific nodes via provided methods
    <input type="text" name="phone_number" id="phone_number" />
    <script type="text/javascript">
      var phone = document.getElementById('phone_number');
      phone.value = '555-1212';
      phone.disabled = true;
      document.images[0].src = "http://.../some_other_image.jpg";
    </script>
  • JavaScript
    • A browser-embedded scripting language
      • OOP: objects with prototype-based inheritance, first-class functions, closures
      • dynamic: dynamic types, code generation at runtime
      • JS code can be embedded inline into document... <script type="text/javascript"> <!-- // protect older browsers
      • calculate = function() { ... } // -->
      • </script>
      • ...or referenced remotely: <script src="http://evil.com/Pwn.js"></script>
    • Current page DOM available via window, document objects
      • Handlers (callbacks) for UI & timer events can be attached to JS code, either inline or by function name: onClick, onMouseOver,...
      • Changing attributes/values of DOM elements has side-effects, e.g.: <a href="#" onClick="this.innerHTML='Presto!'">Click me</a>
  • AJAX == Asynchronous Javascript And Xml
    • Recipe:
      • attach JS handlers to events on DOM objects
      • in handler, inspect/modify DOM elements and optionally do asynchronous HTTP request to server
      • register callback to receive server response
      • response callback modifies DOM using server-provided info
    • JavaScript as a target language
      • Google Web Toolkit (GWT): compile Java => emit JS
      • Rails: runtime code generation ties app abstractions to JS
  • JavaScript example for AJAX
    • var r = new XMLHttpRequest();
    • r.open("GET", "http://www.example.com", true);
      • last arg true means script should not block (important!)
    • r.send(request_content); // e.g., form fields
    • Callbacks during XHR processing
      • r.onreadystatechange = function() { ... };
      • inspect r.readyState → uninitialized, open, sent, receiving, loaded
    • r.status contains HTTP status of response
    • r.responseText contains response content
    • Libraries like jQuery and Prototype abstract this and provide some cross-browser support
  • Example: AJAX via Rails
    • Embedded Ruby code in HTML template:
    • link_to_remote('Show article', :update => 'article_content', :url => {:action => 'get_article_text', :id => article}, :before => "Element.show('spinner')", :loading => "Element.hide('spinner'); Element.show('stopwatch')", :success => "Element.hide('stopwatch')", 404 => "alert('Article text not found!')", :failure => "alert('Some other error')")
    • Delivered page contains JS that embeds calls to Prototype, defines and dispatches to callback handlers, etc.
    • Simple auto-completion handler:
    • observe_field('student[last_name]',
    • :url => {:controller=>'students', :action=>'lookup_by_lastname'}, :update=>'lastname_completions')
  • Sidebar: It’s Tough Being a Browser
    • Users now expect “Web apps” to include animation, sound, 3D graphics, disconnection, responsive GUI...
      • Browser =~ new OS: manage mutually-untrusting apps (sites)
    Source: Robert O’Callahan (Mozilla.org), Inside Firefox
  • Social Computing
    • Web 1.0: add value via mass customization
      • select content/presentation for you based on best guesses about your interests
      • resource: demographic/analytic data about you
    • Web 2.0: add value via connecting to social network
      • vendor: your friends’ interests are a good indicator of your interests
      • user: value added to existing content == how your friends interact with it
      • resource: your social network
    • From social networking site to social network as a way of structuring applications
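The Web 2.0 premise that “your friends’ interests are a good indicator of your interests” can be sketched as a simple friend-based recommender. This is an illustrative assumption, not any particular site’s algorithm: `recommend`, the friend map, and the likes data are all hypothetical.

```ruby
# Recommend items your friends liked that you haven't, ranked by
# how many friends liked each one (ties broken alphabetically).
def recommend(user, friends_of, likes, limit: 3)
  mine = likes.fetch(user, [])
  counts = Hash.new(0)
  friends_of.fetch(user, []).each do |friend|
    likes.fetch(friend, []).each { |item| counts[item] += 1 unless mine.include?(item) }
  end
  counts.sort_by { |item, n| [-n, item] }.first(limit).map(&:first)
end

friends = { 'ann' => ['bob', 'cy'], 'bob' => ['ann'] }
likes   = { 'ann' => ['maps'], 'bob' => ['chess', 'maps', 'poker'], 'cy' => ['chess'] }
puts recommend('ann', friends, likes).inspect
```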
  • Social Computing
    • Amount of content “created” by each user is small!
      • e.g., digg an article, rate a video, play a Facebook game
    • but still creates lots of short random writes
      • consider “Like” feature on Facebook
      • social graphs naturally hard to partition (though would love to see a paper about this from FB)
    • question for Web 2.0 developers is not whether social computing is part of your app, but how
    • later we will discuss technical architecture of “connecting” an app to social networks
  • SOA
  • Amazon.com: Web 1.0 SOA
    • ~50 “two-pizza” teams of “developer/operators”
    • ~10 operators
      • monitor the whole site
      • page the resolvers on alarm
    • ~1000 resolvers
      • 10-15 per team, 1 on-call 24x7
      • monitor own service, fix problems
    • Over 140 code change commits/month
    • Internal microcosm of service-oriented architecture (as were Yahoo, Google, others)
    P. Bodík et al., Advanced Tools for Operators at Amazon.com, Proc. ICAC 2005. [Figure: web servers calling services A, B, and C, each backed by its own DB]
  • What is SOA?
    • Use other services as RPC servers for your app
    • Web 1.0: large sites organized this way internally
      • Yahoo!, Amazon, Google, ...
      • External “Services” available, but getting them is high-touch: Doubleclick ads, Akamai content distribution
    • Web 2.0: consumer-facing service APIs, typically pay-as-you-go (vs. contractual)
      • Services: Google AdSense, Google Analytics, Amazon CloudFront...
      • Platforms: Facebook, Google Maps, ...
      • Mashups, e.g. housingmaps.com
      • User-composable services, e.g. Yahoo Pipes
  • SOA == RPC
    • Transport: HTTP(S)
    • Data interchange: XML DTD (e.g., RSS), JSON
    • Request protocol:
      • SOAP (Simple Object Access Protocol)
      • JSON-RPC
    • On the horizon: WebHooks (HTTP POST callback, for “push”)
  • JSON-RPC
    • Open connection to designated port on server
    • Send HTTP method & request URI, with MIME type of body set to application/json
    • Then send request body:
    • { "version": "1.1",
    • "method": "confirmFruitPurchase",
    • "id": "194521489",
    • "params": [ [ "apple", "orange", "pear" ], 1.123 ]
    • }
    • Response might be something like this:
    • { "version": "1.1",
    • "result": "done",
    • "error": null,
    • "id": "194521489"
    • }
    • Your client has to handle substantially all errors itself (transport failures, HTTP errors, malformed replies, and a non-null "error" field)
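A client-side sketch of the exchange above using Ruby’s `json` and `net/http` libraries. The helper names and the `JsonRpcError` class are hypothetical; the point is that the caller checks the reply’s `id`, its `error` field, and malformed JSON itself. The network call is shown but not exercised.

```ruby
require 'json'
require 'net/http'
require 'securerandom'

class JsonRpcError < StandardError; end

# Build a request body in the JSON-RPC 1.1 style shown above.
def jsonrpc_request(method, params, id: SecureRandom.uuid)
  JSON.generate({ 'version' => '1.1', 'method' => method, 'id' => id, 'params' => params })
end

# Validate a reply; every failure mode becomes a JsonRpcError.
def jsonrpc_result(response_body, expected_id)
  reply = JSON.parse(response_body)
  raise JsonRpcError, 'reply is not a JSON object' unless reply.is_a?(Hash)
  raise JsonRpcError, "id mismatch: #{reply['id'].inspect}" unless reply['id'] == expected_id
  raise JsonRpcError, "server error: #{reply['error'].inspect}" unless reply['error'].nil?
  reply['result']
rescue JSON::ParserError => e
  raise JsonRpcError, "malformed reply: #{e.message}"
end

# POST the request with body MIME type application/json (not run here).
def jsonrpc_call(url, method, params)
  id = SecureRandom.uuid
  http_reply = Net::HTTP.post(URI(url), jsonrpc_request(method, params, id: id),
                              { 'Content-Type' => 'application/json' })
  raise JsonRpcError, "HTTP #{http_reply.code}" unless http_reply.is_a?(Net::HTTPSuccess)
  jsonrpc_result(http_reply.body, id)
end

puts jsonrpc_request('confirmFruitPurchase', [['apple', 'orange', 'pear'], 1.123], id: '194521489')
```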
  • RSS
    • Request is a regular HTTP GET to a specified URL
    • <?xml version="1.0" encoding="UTF-8"?>
    • <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
    • <channel>
    • <title>Altarena Playhouse Ticket Availability</title>
    • <link>http://www.audience1st.com/altarena/store</link>
    • <description>Altarena Ticket Availability</description>
    • <item>
    • <title>Sylvia - Friday, May 14, 8:00 PM – Buy now</title>
    • <link>http://.../store?showdate_id=347</link>
    • <guid isPermaLink="false">http://.../store?ts=1271058414</guid>
    • </item>
    • <item>
    • ...
    • </item>
    • </channel>
    • </rss>
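Generating such a feed on the server is mostly careful string building plus XML escaping. A minimal sketch, assuming a hypothetical `FeedItem` shape and `rss_feed` helper; the example.com URLs are placeholders, not the feed above.

```ruby
require 'erb'

# Hypothetical shape for one feed entry.
FeedItem = Struct.new(:title, :link, :guid)

# Build an RSS 2.0 document; all text is escaped (&, <, >, quotes).
def rss_feed(title, link, description, items)
  esc = ->(s) { ERB::Util.html_escape(s.to_s) }
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    '  <channel>',
    "    <title>#{esc.(title)}</title>",
    "    <link>#{esc.(link)}</link>",
    "    <description>#{esc.(description)}</description>"
  ]
  items.each do |item|
    lines << '    <item>'
    lines << "      <title>#{esc.(item.title)}</title>"
    lines << "      <link>#{esc.(item.link)}</link>"
    # isPermaLink="false": the guid is an opaque ID, not a URL to follow
    lines << "      <guid isPermaLink=\"false\">#{esc.(item.guid)}</guid>"
    lines << '    </item>'
  end
  lines << '  </channel>' << '</rss>'
  lines.join("\n") + "\n"
end

puts rss_feed('Ticket Availability', 'http://example.com/store', 'Seats still on sale',
              [FeedItem.new('Friday 8:00 PM - Buy now', 'http://example.com/store?showdate_id=347', 'ts-1271058414')])
```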
  • AJAX and SOA
    • AJAX: client → server
      • client makes (async) requests to HTTP server
      • client-side JavaScript upcall receives reply and decides what to do
      • commonly, response includes XHTML/XML to update page, or JavaScript to execute
      • Doesn’t really make sense except in context of client
    • SOA: server → server or client → server
      • one principal makes (sync or async) requests to an HTTP server
      • formerly, principal was a server running some app
      • today, powerful JavaScript clients blur the line
  • Facebook
    • Facebook plug-in apps
    • Facebook platform (“Facebook Connect”)
    [Figure: Facebook plug-in app vs. Facebook Connect, with numbered request flows among the browser, Facebook.com, your app, and FB data; labels include AJAX, SOA, FBQL, html, html+xfbml, REST, “REST via JavaScript & XFBML”, and “HTML IFRAME w/FB content”]
  • Google Maps
    • Your app embeds Javascript-heavy client code (provided by Google)
      • client-side functionality: clear/draw overlays, etc.
      • server-side functionality: fetch new map, rescale, geocoding
    • Attach callbacks (handled by your app) to UI actions
    • Result of callback can trigger additional calls to Google Maps code, which in turn contact GMaps servers
    [Figure: your app serves html+js that interacts with Google Maps, numbered steps 1-4]
  • Mashups: housingmaps.com
  • Two ways to do it...
    • + Client portability
    • +/– Client performance (both app download & JavaScript execution)
    • + Availability of utility libraries for app development
    • – Privacy/trustworthiness of aggregator app
    • – Caching
    [Figure: a “thin” browser client using a Web 2.0 app that combines Craigslist.org and Google Maps, vs. a “fat” browser client]
  • REST (Representational State Transfer) Philosophy
    • Architectural style (not a standard per se):
      • Client-server, Stateless, Cacheability indicated
      • a/k/a post-hoc description of properties that made Web 1.0 successful by constraining SOA interactions
    • In context of SOA for Web 2.0
      • HTTP is transport; HTTP methods (GET, PUT, etc.) are the only commands
      • Reify idea that URI names resource (broadly...)
      • Client has resource → has enough info to request modification of resource on server
      • cookie can encode part of transferred state
    • If your app is RESTful, it’s easy to “SOA”-ify
  • REST with HTTP examples
    • Collection URI, such as http://example.com/customers/257/orders
      • HTTP GET: list the members of the collection, complete with their member URIs for further navigation
      • HTTP PUT: replace the entire collection with another collection
      • HTTP POST: create a new entry in the collection; the ID created is usually included as part of the data returned by this operation
      • HTTP DELETE: delete the entire collection
    • Element URI, such as http://example.com/resources/7HOU57Y
      • HTTP GET: retrieve a representation of the addressed member of the collection in an appropriate MIME type
      • HTTP PUT: update (or create) the addressed member of the collection
      • HTTP POST: treat the addressed member as a collection in its own right and create a new subordinate of it
      • HTTP DELETE: delete the addressed member of the collection
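The table above amounts to dispatching on two things: the HTTP method and whether the URI names the collection or one element. A minimal in-memory sketch with a hypothetical `OrdersResource` class; it covers collection GET/POST/DELETE and element GET/PUT/DELETE, omits collection PUT and element POST (answering 405), and the specific status codes are our choice.

```ruby
class OrdersResource
  def initialize
    @orders = {}    # id => representation
    @next_id = 1
  end

  # Returns [status, body].
  def handle(method, path, body = nil)
    m = path.match(%r{\A/orders(?:/(\d+))?\z})
    return [404, 'no such resource'] unless m

    m[1] ? handle_element(method, m[1].to_i, body) : handle_collection(method, body)
  end

  private

  # Collection URI: /orders
  def handle_collection(method, body)
    case method
    when 'GET'      # list member URIs for further navigation
      [200, @orders.keys.map { |id| "/orders/#{id}" }]
    when 'POST'     # create a new entry; return its URI (and thus its ID)
      id = @next_id
      @next_id += 1
      @orders[id] = body
      [201, "/orders/#{id}"]
    when 'DELETE'   # delete the entire collection
      @orders.clear
      [204, nil]
    else
      [405, 'method not allowed']
    end
  end

  # Element URI: /orders/:id
  def handle_element(method, id, body)
    case method
    when 'GET'
      @orders.key?(id) ? [200, @orders[id]] : [404, 'not found']
    when 'PUT'      # update (or create) the addressed member
      status = @orders.key?(id) ? 200 : 201
      @orders[id] = body
      @next_id = [@next_id, id + 1].max   # keep future POST ids unique
      [status, body]
    when 'DELETE'
      return [404, 'not found'] unless @orders.key?(id)

      @orders.delete(id)
      [204, nil]
    else
      [405, 'method not allowed']
    end
  end
end

store = OrdersResource.new
p store.handle('POST', '/orders', { 'item' => 'apple' })
p store.handle('GET', '/orders')
```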
  • AGILE DEVELOPMENT & WEB 2.0
  • New models of software development
    • Process: Support DADO Evolution, 1 group
    • Waterfall: Static Handoff Model, N groups
    [Figure: Develop, Assess, Deploy, Operate (DADO) as one continuous cycle vs. as a sequence of static handoffs]
  • Why is this here?
    • For many, a new way to develop software
    • Highly productive: undergraduates produce complete working apps, with tests, in weeks
    • Great structural fit for Web 2.0 applications
    • Amazingly good tools: “make it fun” just as important for testing as for development
  • (Short) History of Software Engineering
    • “1/3 of software development projects fail or are abandoned outright because of cost overruns, delays, and reduced functionality”
    • IRS Tax Modernization System:
      • “The IRS must recognize that technology is an enabler, not a driver, of business success, and that it needs a strategic plan with business objectives that drive the use of technology.” House Commission on Restructuring the IRS, 1997 report
    • Denver Airport Baggage Handling System
      • 1.5 year delay, $1M/day during modifications/repairs, ultimately abandoned 10 years later (source: Wikipedia)
    • Software Development Failures: Anatomy of Abandoned Projects, K.Ewusi-Mensah, 2003
  • “Big Design Up Front”
    • Start with elaborate, detailed specification of what customer wants
      • 100s of pages
    • Problem: Customers may change mind
      • change wrecks schedule in unpredictable ways
      • some use cases may have been forgotten or misrepresented
    • But change is inevitable
      • “ If a problem has no solution, it may not be a problem, but a fact; not to be solved, but to be coped with over time” Israeli foreign minister Shimon Peres
  • Agile Development
    • Big Design Up Front
      • Time, resources, and scope “fixed”
      • Changing one affects the others, as well as quality
      • Manage the plan
      • Try to minimize change
    • Agile
      • Time, resources, and quality fixed
      • Changing time or resources affects scope
      • Manage the priorities
      • Change as you learn more
    Agile methods break tasks into small increments with minimal planning and do not directly involve long-term planning. Each iteration involves a team working through a full software development cycle, including planning, requirements analysis, design, coding, unit testing, and acceptance testing, at the end of which a working product is demonstrated to stakeholders. This helps minimize overall risk and lets the project adapt to changes quickly. An iteration may not add enough functionality to warrant a market release, but the goal is to have an available release (with minimal bugs) at the end of each iteration. Multiple iterations may be required to release a product or new features.
  • Test-Driven/Behavior-Driven Development
    • Behavior driven: start from behaviors, and behavior spec == acceptance test
      • Start from user behavior by writing the code you wish you had (results in better API than top-down design)
      • Script the tests you’d single-step manually
      • When done, you get an automatable integration/acceptance test for free
    • Test driven: write tests first
      • Debugging, testing, and isolating bugs all require modular code
      • Writing the test first ensures the code is modular/debuggable
  • User Stories for Acceptance/Integration Testing
    • A story from the user’s perspective that provides business value to a stakeholder and is testable
    • As a [type of stakeholder]
      • I want to [perform some task]
      • so that I can [reach some goal]
      • A complete Web app has 100s or 1000s of stories
      • Long stories (“epics”) broken down into smaller chunks
    • Development proceeds in fixed-period iterations (typically 2 weeks)
      • Each story small enough to implement in 1 iteration
      • Developer estimates difficulty (points) to implement
      • “Deliver” (release) N new points/iteration (velocity)
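Points and velocity turn the backlog into a schedule estimate. A minimal sketch, assuming velocity is the average of the last 3 iterations (the window size, the point history, and the 45-point backlog are hypothetical, not from the slides):

```ruby
# Velocity = average points delivered over recent iterations.
# The 3-iteration window is an assumption, not something the slides specify.
def velocity(points_per_iteration, window: 3)
  recent = points_per_iteration.last(window)
  recent.sum.to_f / recent.size
end

# Iterations needed to deliver the remaining backlog at the current velocity.
def iterations_remaining(backlog_points, velocity)
  (backlog_points / velocity).ceil
end

history = [6, 8, 10, 12]          # hypothetical points delivered per past iteration
v = velocity(history)             # (8 + 10 + 12) / 3
n = iterations_remaining(45, v)   # 45 points left in a hypothetical backlog
puts "velocity #{v}: about #{n} more iterations (#{n * 2} weeks at 2 weeks each)"
```

Rounding up reflects that a partially used iteration still occupies a full fixed-length period.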
  • A Feature Comprises Several User Stories
    • Feature: Subscriber purchases additional tickets
    • As a season subscriber
    • I want to go to the Store page
    • So that I can buy discounted tickets for a show
    • Scenario: Subscriber logs in
    • Given I am logged in as a subscriber
    • When I visit the "Store" page
    • Then I should see the Subscriber message
    • Scenario: Subscriber offered discount ticket price
    • Given I am on the "Store" page
    • And there are upcoming performances of "Chicago"
    • When I select the show "Chicago"
    • Then "Subscriber Discount" should appear in the "Ticket Prices" menu
    (Annotations: 1. Title, 2. Narrative, 3. User stories)
  • Rails Testing Ecosystem
    • Unit testing: RSpec (based on Java Spec)
      • more expressive, and Ruby-specific
      • extensive support for isolation (mocking & stubbing) by exploiting Ruby dynamic language features
    • Integration/acceptance testing: Cucumber
      • can be used for non-Ruby systems
      • bridges user stories and integration tests
    • Cucumber on Rails
      • Web browser interactions: use Webrat to emulate a browser, or Selenium to script a real one (including JavaScript)
      • (Optional) Use RSpec facilities to set up preconditions and check postconditions of tests
  • Given...
    • Regular expressions match scenario text to test code
    • “Steps” implement Given, When, Then
    • Given: set up preconditions either directly or via Webrat/Selenium
    • Given /^I am logged in as a subscriber$/ do
          visit '/customers/login'
          @customer = customers(:tom_the_subscriber)
          fill_in 'customer_login', :with => @customer.login
          fill_in 'customer_password', :with => @customer.pass
          click_button 'Login'
          response.body.should match(/Login successful/)
      end
  • When... Then...
    • When: use Webrat or Selenium to emulate browser or drive a real browser
    • Then: use RSpec (unit test) facilities to check the outcome (should, should_receive, etc.)
    • When /^I visit the "Store" page$/i do
          visit '/store'
      end
    • Then /^I should see the (.*) message$/ do |msg|
          response.should have_selector("div.#{msg}")
          response.body.should match(/Welcome,.*#{Regexp.escape(@customer.first_name)}/)
      end
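How regular expressions route scenario text to step code can be shown with a toy registry. This is a minimal sketch of the idea, not Cucumber's implementation: `STEPS`, `step`, and `run_step` are hypothetical names, and the step text is assumed to arrive with its Given/When/Then keyword already stripped.

```ruby
# Hypothetical registry of [regexp, block] pairs, standing in for Cucumber's step definitions.
STEPS = []

def step(pattern, &body)
  STEPS << [pattern, body]
end

# Match step text against every definition; exactly one must match,
# and its capture groups become the block's arguments.
def run_step(text)
  matches = STEPS.map { |re, body| [re.match(text), body] }.select { |m, _| m }
  raise "undefined step: #{text}" if matches.empty?
  raise "ambiguous step: #{text}" if matches.size > 1
  match, body = matches.first
  body.call(*match.captures)
end

step(/^I am logged in as a subscriber$/) { :logged_in }
step(/^I visit the "Store" page$/i)       { :visited_store }
step(/^I should see the (.*) message$/)   { |msg| "looked for div.#{msg}" }

results = [
  'I am logged in as a subscriber',
  'I visit the "Store" page',
  'I should see the Subscriber message'
].map { |text| run_step(text) }
```

The Store-page pattern includes the quotation marks because the whole step text must match; a pattern without them would never match `When I visit the "Store" page`.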
  • Expectations example
    • describe "transferring a ticket" do
          context "when recipient doesn't exist" do
            before(:each) do
              @t = Ticket.new(...)
              @from = Customer.find(:first)
              @from.tickets << @t
              @from.save!
              @to = create_nonexistent_customer_id()
            end
            it "should not cause an error" do
              lambda { @t.transfer_to_customer(@to) }.should_not raise_error
            end
            it "should not remove from original owner" do
              @t.transfer_to_customer(@to)
              @from.tickets.should include(@t)
            end
          end
      end
    (Callouts: before(:each) sets the preconditions run before each test; the two it blocks are test case 1 and test case 2)
  • Expectations example
    • describe "successful purchase" do
          it "should contact the payment gateway" do
            Store.should_receive(:pay_via_gateway).
              with(@amount, @credit_card, @params).
              exactly(1).times.
              and_return(@success)
            Store.purchase!(...)
          end
      end
    • Expectation modifiers: at_least( n ).times, any_number_of_times
    • Argument modifiers: with(:any_args), with()
    • Return value modifiers: and_return( val )
    • Ruby dynamic language features used to implement this test scaffolding
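The core dynamic-language trick behind expectations like `should_receive` can be sketched in a few lines. This is a minimal illustration, not RSpec's implementation: `define_singleton_method` replaces a method on one object at run time so the test can check arguments, count calls, and return a canned value. `Expectation`, `Store.pay_via_gateway`, and `Store.purchase!` are hypothetical stand-ins, and unlike RSpec the sketch does not restore the original method afterwards.

```ruby
# Hypothetical expectation object: replaces one method on one object and
# checks how it is called (a tiny imitation of should_receive).
class Expectation
  attr_reader :calls

  def initialize(target, method_name, expected_args, return_value)
    @expected_args = expected_args
    @calls = 0
    expectation = self
    # Ruby can (re)define a method on a single object at run time.
    target.define_singleton_method(method_name) do |*args|
      expectation.record(args)
      return_value
    end
  end

  def record(args)
    raise ArgumentError, "unexpected arguments #{args.inspect}" unless args == @expected_args
    @calls += 1
  end

  # Like exactly(n).times: fail unless the method was called n times.
  def verify!(times)
    raise "expected #{times} call(s), got #{@calls}" unless @calls == times
    true
  end
end

# Hypothetical store whose real gateway call must not happen in a test.
class Store
  def self.pay_via_gateway(amount, card, params)
    raise 'real payment gateway contacted'
  end

  def self.purchase!(amount, card, params)
    pay_via_gateway(amount, card, params) == :success ? :purchased : :declined
  end
end

exp = Expectation.new(Store, :pay_via_gateway, [25, 'card-1', {}], :success)
result = Store.purchase!(25, 'card-1', {})
exp.verify!(1)
```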
  • Outside-In Development: Red/Green/Refactor
    • For each step in user story
    • 1. Write the step definition
    • 2. Run & watch it fail
    • For each behavior of underlying objects/models
    • Write unit test (expectation)
    • 3. Watch it fail
    • 4. Implement just enough to pass
    • 5. Refactor if needed
    • 6. Watch user story step pass
    • 7. Refactor step(s) if needed
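The unit-test part of the cycle (write the test, watch it fail, implement just enough) can be sketched in plain Ruby. The `Ticket#subscriber_price` method and its 20% discount are hypothetical, chosen only to echo the ticket-store example; a real project would express the check as an RSpec expectation.

```ruby
# Hypothetical model class; subscriber_price does not exist yet.
class Ticket
  attr_reader :price

  def initialize(price)
    @price = price
  end
end

# The unit test, written first: a $50 ticket should cost $40 for subscribers
# (the 20% discount is a made-up requirement).
def subscriber_price_test
  Ticket.new(50).subscriber_price == 40
rescue NoMethodError
  false   # behavior not implemented yet
end

red = subscriber_price_test   # step 3: watch it fail

# Step 4: reopen the class and implement just enough to pass.
class Ticket
  SUBSCRIBER_DISCOUNT = 0.20

  def subscriber_price
    (price * (1 - SUBSCRIBER_DISCOUNT)).round
  end
end

green = subscriber_price_test
puts "before: #{red ? 'pass' : 'fail'}, after: #{green ? 'pass' : 'fail'}"
```

Reopening `Ticket` to add the method is itself a Ruby dynamic feature; refactoring (step 5) would then proceed with the test as a safety net.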
  • Tracking Progress with PivotalTracker.com
  • Summary: Agile & Behavior-Driven Development
    • Agile, iteration-based process based on user stories
    • Planning, coding, testing proceed as a cycle by 1 person
    • Test-first promotes modularity, debuggability, and a concrete measure of progress
    • Attention to productivity in testing tools as well as dev tools
      • Student projects in Berkeley SaaS class: ~50% of LOC were testing-related
  • DEPLOYMENT
  • Scaling via Replication
    • The “most general” deployment scenario for a 3-tier Web app
      • Many Web servers
      • possibly including static-asset servers
      • L4/L7 load balancers distribute load among them
    • Caches and reverse proxies remember previously-computed content
      • whole page caching
      • page fragment caching, query caching
      • Apache in reverse-proxy mode, or memcached process(es) addressed by app server
    • Integration of caching with app logic varies by framework
    (Diagram: the Internet in front of replicated load balancers (LB), Web servers (WS), caches ($), app servers, and database(s), plus an asset server)
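Caches that “remember previously-computed content” usually follow a cache-aside pattern: check the cache, and only on a miss do the expensive work and store the result with an expiry. A minimal sketch, assuming a hypothetical in-process `TtlCache` standing in for memcached and a fake clock so the example is deterministic:

```ruby
# Hypothetical in-process cache standing in for memcached or a reverse proxy.
class TtlCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}   # key => [value, expires_at]
  end

  # Cache-aside: serve a fresh entry if present, otherwise compute it with the block and store it.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at
    value = yield
    @entries[key] = [value, @clock.call + @ttl]
    value
  end
end

now = 0                        # fake clock (seconds)
db_queries = 0
cache = TtlCache.new(60, clock: -> { now })
render_store_page = lambda do
  cache.fetch('fragment/store') { db_queries += 1; '<div>upcoming shows</div>' }
end

render_store_page.call         # miss: runs the "query"
render_store_page.call         # hit: served from cache
now = 61
render_store_page.call         # entry expired: recomputed
puts "queries run: #{db_queries}"
```

In Rails, `Rails.cache.fetch(key, :expires_in => ...) { ... }` plays a similar role, with memcached as one possible backing store.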
  • “Scale makes availability affordable”
    • Goal: interchangeability (send any user request to any available server)
      • each server handles 1/N load
      • affinity can be used to “soft-pin” users to particular servers
      • requires good support for session state abstraction in app framework
    • lose 1 server => lose 1/N capacity
      • Load Balancers have logic to detect failed servers & remove from rotation until they are resurrected
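Interchangeable servers, soft affinity, and removal of failed servers from rotation fit in a small sketch. This assumes a hypothetical `LoadBalancer` class that hashes the session id (CRC32 modulo the healthy-pool size) to pick a server; real L4/L7 balancers use their own health checks and algorithms.

```ruby
require 'zlib'

# Hypothetical L7 load balancer: routes each session to a healthy server and
# drops failed servers from rotation until they are marked up again.
class LoadBalancer
  def initialize(servers)
    @servers = servers
    @healthy = servers.map { |s| [s, true] }.to_h
  end

  def mark_down(server)
    @healthy[server] = false
  end

  def mark_up(server)
    @healthy[server] = true
  end

  def healthy_servers
    @servers.select { |s| @healthy[s] }
  end

  # "Soft pin": the same session id maps to the same server while the healthy
  # pool is unchanged; if the pool changes, sessions may move.
  def route(session_id)
    pool = healthy_servers
    raise 'no healthy servers' if pool.empty?
    pool[Zlib.crc32(session_id) % pool.size]
  end
end

lb = LoadBalancer.new(%w[ws1 ws2 ws3])
before = Hash.new(0)
3000.times { |i| before[lb.route("session-#{i}")] += 1 }   # roughly 1/3 each
lb.mark_down('ws2')                                         # lose 1 server => lose 1/3 capacity
after = Hash.new(0)
3000.times { |i| after[lb.route("session-#{i}")] += 1 }
```

Because a session can move whenever the pool changes, session state should live in a shared store rather than in one Web server's memory, which is why framework support for session state matters. Consistent hashing would move fewer sessions; plain modulo keeps the sketch short.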
  • Asset Servers
    • For serving static assets (images, sound clips, CSS, etc.)
    • Separate Web server process, configuration optimized for fast static file serving
    • Web 2.0: use Amazon S3 (blob store) or CloudFront (CDN)
      • helps to have good asset-server abstraction in app framework
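An asset-server abstraction means views ask for an asset URL instead of hard-coding where static files live, so switching to S3 or CloudFront becomes a configuration change. A minimal sketch, assuming a hypothetical `AssetHost` helper and example host names:

```ruby
require 'zlib'

# Hypothetical asset-host helper: views call asset_url instead of hard-coding
# where static files live.
class AssetHost
  def initialize(hosts)
    @hosts = hosts
  end

  # Hashing the path keeps each file on one host, which helps browser caching.
  def asset_url(path)
    clean = path.sub(%r{\A/+}, '')
    host = @hosts[Zlib.crc32(clean) % @hosts.size]
    "https://#{host}/#{clean}"
  end
end

assets = AssetHost.new(%w[static1.example.com static2.example.com])
logo = assets.asset_url('/images/logo.png')
```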
  • Deploying a new release
    • Checkout new code on production server(s)
    • Run database schema migrations if any
    • Quiesce old version, soft-restart new version
    • If necessary, temporarily disable access during quasi-atomic switchover
    • Differentiate between asset servers, code servers, database machines
    • Be prepared to roll back if any problems
    • Tools like capistrano help automate the above steps
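The release steps are ordered so that a failure part-way through can be undone. A minimal sketch, assuming each step can be paired with an undo action (the `Step` struct, `deploy` method, and step names are hypothetical, not Capistrano's API): steps run in order, and if one raises, the completed steps are undone in reverse.

```ruby
# Hypothetical deploy step: an action plus the action that undoes it.
Step = Struct.new(:name, :run, :undo)

# Run steps in order; if one raises, undo the completed steps in reverse order.
def deploy(steps, log)
  done = []
  steps.each do |step|
    log << "run #{step.name}"
    step.run.call
    done << step
  end
  :deployed
rescue StandardError => e
  log << "failed: #{e.message}"
  done.reverse_each do |step|
    log << "undo #{step.name}"
    step.undo.call
  end
  :rolled_back
end

log = []
noop = -> {}
steps = [
  Step.new('check out new code', noop, noop),
  Step.new('run schema migrations', noop, noop),
  Step.new('soft-restart new version', -> { raise 'health check failed' }, noop)
]
result = deploy(steps, log)
```

The failed step itself is not undone, since it never completed. In practice, undoing a schema migration depends on having a working down-migration, which is one reason to be prepared before rolling forward.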
  • Deployment scenarios (& approximate pricing)
    • Buy/rack/install/configure it yourself...that’s so Web 1.0
    • Shared hosting ($3/month)
      • turnkey support for popular frameworks, hosted versions of popular building blocks (e.g. MySQL)
      • highly variable performance, multitenant per machine
    • Virtual private host ($10/month)
      • better isolation and security through virtualization
      • substantially more administration
    • “Framework VM” or “curated” environments (Heroku, Google AppEngine, Force.com): pricing varies
      • hosted extensions: memcached, profiling, etc.
      • integration of 3rd-party hosted services, e.g. Amazon S3 backup
    • Cloud Computing
  • Pay-as-you-go Cloud Computing “Instances”
    • Small: $0.085/hr, 32-bit platform, 1 core, 1.7 GB memory, 160 GB disk
    • Large: $0.34/hr, 64-bit platform, 4 cores, 7.5 GB memory, 850 GB disk (2 spindles)
    • XLarge: $0.68/hr, 64-bit platform, 8 cores, 15.0 GB memory, 1690 GB disk (3 spindles)
    • Options: extra memory, extra CPU, extra disk, ...
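Comparing hourly instance prices with the monthly hosting prices takes a little arithmetic. A minimal sketch using the hourly rates listed for the instance types, assuming a 30-day month and a hypothetical part-time usage pattern; `Rational` keeps the dollar amounts exact.

```ruby
# Hourly prices as listed on the slide (historical; current prices differ).
HOURLY = {
  'Small'  => Rational('0.085'),
  'Large'  => Rational('0.34'),
  'XLarge' => Rational('0.68')
}.freeze

HOURS_PER_MONTH = 24 * 30   # assumption: 30-day month, instance never stopped

def monthly_cost(type, hours = HOURS_PER_MONTH)
  (HOURLY.fetch(type) * hours).to_f
end

HOURLY.each_key do |type|
  puts format('%-6s always on: $%7.2f/month', type, monthly_cost(type))
end
# A hypothetical dev box used 8 hours a day, 22 days a month:
puts format('Small, 176 h: $%.2f/month', monthly_cost('Small', 8 * 22))
```

An always-on Small instance costs more than the shared or virtual-private hosting prices quoted above; pay-as-you-go pays off when usage is intermittent or bursty, and administration costs belong in the comparison too.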
  • A Berkeley View of Cloud Computing (2/09)
    • abovetheclouds.cs.berkeley.edu
    • Goal: stimulate discussion on what’s new
      • Clarify terminology
      • Quantify comparisons
      • Identify challenges & opportunities
    • UC Berkeley perspective
      • industry engagement but no axe to grind
      • users of Cloud Computing since late 2007
    • New: pay-as-you-go, utility computing
      • Illusion of infinite resources on demand (minutes)
      • Fine-grained billing: release == don’t pay, no minimum
  • Cloud Economics 101
    • Cloud computing user: static provisioning for peak is wasteful, but necessary to meet the SLA
    (Figure: demand and capacity over time for a “statically provisioned” data center, showing unused resources, vs. a “virtual” data center in the cloud)
  • Cloud Economics 101
    • Cloud computing provider: could save energy
    (Figure: demand and capacity over time for a “statically provisioned” data center, showing unused resources, vs. a real data center in the cloud)
  • Risk of Overprovisioning
    • Underutilization results if “peak” predictions are too optimistic
    (Figure: demand and capacity over time in a static data center, with the gap between them marked as unused resources)
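The waste from static provisioning, and the extra waste from an optimistic peak forecast, can be quantified as utilization: server-time used divided by server-time provisioned. A minimal sketch with a hypothetical daily demand curve; the “elastic” case is idealized, since real cloud capacity follows demand with a lag of minutes.

```ruby
# Fraction of provisioned server-periods actually used.
def utilization(demand, capacity)
  demand.sum.to_f / capacity.sum
end

demand = [100, 200, 500, 400, 200, 100]         # hypothetical servers needed per 4-hour period
static_peak       = [demand.max] * demand.size  # provision for the true peak (500)
static_optimistic = [800] * demand.size         # peak forecast too high
elastic           = demand                      # idealized cloud: capacity follows demand

{ 'static, peak' => static_peak,
  'static, optimistic peak' => static_optimistic,
  'elastic (idealized)' => elastic }.each do |name, capacity|
  puts format('%-24s %5.1f%%', name, 100 * utilization(demand, capacity))
end
```

Here provisioning for the true peak leaves half the paid-for capacity idle, and the optimistic forecast drops utilization to about 31%.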
  • New Scenarios Enabled by “Risk Transfer” to Cloud
    • “Cost associativity” from linear pricing: 1,000 CPUs for 1 hour cost the same as 1 CPU for 1,000 hours (@$0.10/hour)
      • Washington Post converted Hillary Clinton’s travel documents to post on the WWW less than 1 day after their release
      • RAD Lab graduate students demonstrate improved Hadoop (batch job) scheduler—on 1,000 servers
    • Major enabler for SaaS startups
      • Animoto traffic doubled every 12 hours for 3 days when released as Facebook plug-in
      • Scaled from 50 to >3500 servers
      • ...then scaled back down
    • Goal: fix any transient problem by adding/removing nodes
      • Single-node performance becomes much less important
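Cost associativity follows directly from linear pricing: the bill depends only on total CPU-hours, while wall-clock time shrinks as CPUs are added. A minimal sketch at the $0.10/CPU-hour rate quoted above, assuming a hypothetical batch job that parallelizes perfectly:

```ruby
PRICE_PER_CPU_HOUR = Rational('0.10')   # the rate quoted on the slide

# Under linear pricing, cost depends only on total CPU-hours.
def cost(cpus, hours)
  (cpus * hours * PRICE_PER_CPU_HOUR).to_f
end

# Wall-clock time for a job needing `cpu_hours` of work, assuming perfect parallelism.
def wall_clock_hours(cpu_hours, cpus)
  cpu_hours.to_f / cpus
end

work = 1000   # CPU-hours of hypothetical, embarrassingly parallel batch work
[1, 1000].each do |cpus|
  hours = wall_clock_hours(work, cpus)
  puts format('%4d CPUs: %7.1f hours, $%.2f', cpus, hours, cost(cpus, hours))
end
```

Both runs cost $100, but one finishes in an hour instead of about six weeks, which is why results on 1,000 servers become affordable for a one-off job.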
  • Classifying Clouds for Web 2.0
    • Instruction Set VM (Amazon EC2)
    • Managed runtime VM (Microsoft Azure)
    • Curated “IDE-as-a-service” (Heroku)
    • Platform as service (Google AppEngine, Force.com)
    • flexibility/portability vs. built-in functionality
    (Figure: a spectrum from lower-level, less managed platforms to higher-level, more managed ones with more value-added software, placing EC2, Azure, Force.com, Heroku, AppEngine, and Joyent along it)
  • Summary: Deployment
    • “Deployment-as-a-service” increasingly common
      • monthly pay-as-you-go curated environment (Heroku)
      • hourly pay-as-you-go cloud computing (EC2)
      • hybrid: overflow from fixed capacity to elastic capacity
      • Remember administration costs when comparing!
    • A good framework can help at deployment time
      • Separate abstractions for different types of state: session state, asset server, caching, database
      • ORM: natural fit for social computing, and abstracts away from SQL (vs. e.g. Web 1.0 PHP)
      • REST: encourages you to make your app RESTful from the start, so that “SOA”-ifying it is trivial
    • Scaling structured storage: open challenge
  • EDUCATION
  • Software Education in 2010 (or: the case for teaching SaaS)
    • “depth first” CS curricula vs. Web 2.0 breadth
      • DB, Networks, OS, SW Eng/Languages, Security, ...
      • Medium of instruction for SW Eng. courses not tracking languages/tools/techniques actually in use
      • Students pick up bad practices by osmosis just so they can create Web apps
    • New: languages & tools are actually good now
      • Ruby, Python, etc. are tasteful and allow reinforcing important CS concepts (higher-order programming, closures, etc.)
      • order-of-magnitude greater productivity than 1 generation ago, including for testing
  • Team Skills
    • Web 2.0 apps increasingly built by loosely coupled teams doing DADO
    • Technical as well as “social” team skills needed
      • repository management
      • branching, tagging, merging
      • distributing responsibility during collaboration
    • Web 2.0 SaaS == Great fit for ugrad education
      • Apps can be developed/deployed on semester timescale
      • Rapid gratification => projects outlive the course
      • Team skills in context of agile development
  • SaaS Using RoR at Cal: Course Goals
    • What’s different about DADO for SaaS
      • Basic *ilities: Horizontal scaling, load balancing, H/A
      • Consistency, caching, database scaling, CAP
      • Benchmarking, tuning, understanding SLAs
    • How CS “big ideas” make RoR high productivity
      • Higher-order programming, metaprogramming, introspection => ActiveRecord ORM
      • runtime code generation => AJAX support
    • Major Vehicle: DADO an app of your choice, in teams of 2-3; deploy to public cloud
      • zero to prototype in ~6 weeks
      • assume OOP skills, but no DB or web programming
  • Comparison to other SW Eng./programming courses
    • Open-ended project
      • vs. “fill in blanks” programming
    • Focus on SaaS
      • vs. Android, Java desktop apps, etc.
    • Focus on RoR as high-level framework
    • Projects expected to work
      • vs. working pieces but no artifact
      • most projects actually do work, some continue life outside class
    • Focus on how “big ideas” in languages/programming enable high productivity
  • Topic coverage & labs
    • “Hello World” web app in Rails
    • Unit-test-driven design of a specified module
    • User-story-driven design of an app (work in teams of 2 or 3 students)
    • Deploy own app to Amazon EC2
    • Use Cloudstone benchmark app to saturate MySQL database (using EC2)
    • Experiment with different types of caching to observe effect on database saturation
    • Final demo: publicly-deployed app, short talk
  • Web 2.0 SaaS as Course Driver
    • Majority of students: ability to design own app was key to appeal of the course
      • design things they or their peers would use
    • High productivity frameworks => projects work
      • actual gratification from using CS skills, vs. getting N complex pieces of Java code to work but not integrate
    • Fast-paced semester is good fit for agile iteration-based design
    • Tools used are same as in industry
  • Cloud Computing as a Supporting Technology
    • Elasticity is great for courses!
      • Donation from AWS; ~$100/student
      • Watch a database fall over: ~200 servers needed
      • Lab deadlines, final project demos
    • VM image simplifies courseware distribution
      • Prepare image ahead of time
      • Students can be root if they need to install unusual software or libraries
    • students get better hardware
      • cost associativity
      • cloud provider updates HW more frequently
    • VM images compatible with Eucalyptus—enables hybrid cloud computing
  • Moving to cloud computing (before vs. after)
    • Compute servers: before, 4 nodes of R cluster; after, EC2
    • Storage: before, local Thumper; after, S3 and EBS
    • Authentication: before, login per student, MySQL username/tables per student, and ssh key for SVN per student; after, EC2 keypair + Google account
    • Database: before, Berkeley ITS shared MySQL; after, MySQL on EC2
    • Version control: before, local SVN repository; after, Google Code SVN
    • Horizontal scaling: before, ???; after, EC2 + haproxy/nginx
    • Software stack management burden: before, Jon Kuroda; after, create AMI
  • Success stories
  • Success stories, cont.
    • Fall 2009 project: matching undergrads to research opportunities
    • Fall 2009 project: Web 2.0 AJAXy course scheduler with links to professor reviews
    • Spring 2010 projects: apps to stress RAD Lab infrastructure
      • gRADit: vocabulary review as a game
      • RADish: comment filtering taken to a whole new level
  • SaaS Courses at Cal (columns: Lower div., Upper div., Grad.)
    • Understand Web 2.0 app structure ✔
    • Understand high-level abstraction toolkits like RoR ✔ ✔
    • How high-level abstractions are implemented ✔ ✔
    • Scaling/operational challenges of SaaS ✔ ✔
    • Develop & deploy SaaS app ✔ ✔
    • Implement new abstractions, languages, or analysis for SaaS ✔
  • Planning a SaaS course?
    • Pick a highly-productive framework
      • Projects can be deployed, and will actually work
      • Students can use production-quality tools & methods
      • We used Ruby on Rails; Google AppEngine probably also a good choice
    • Avail yourself of *-as-a-service
      • Google Code for Subversion version control
      • PivotalTracker for project tracking
      • EC2 for app deployment (Amazon is very good about donating AWS credits for education)
    • Tie high-productivity mechanisms back to CS “big ideas”
      • Code generation, introspection/reflection, metaprogramming, higher order programming
    • Steal our materials (http://radlab.cs.berkeley.edu)
  • Summary: Education
    • Web 2.0 SaaS is a great motivator for teaching software skills
      • students get to build artifacts they themselves use
      • some projects continue after course is over
      • opportunity to (re-)introduce “big ideas” in software development/architecture
    • Cloud computing is great fit for CS courses
      • elasticity around project deadlines
      • easier administration of courseware
      • students can take work product with them after course (e.g. use of Eucalyptus in RAD Lab)
  • WEB 2.0 RESEARCH
  • What’s New in Web 2.0
    • Very large structured data storage that scales elastically with app
    • Understanding & generating large spikes
    • Operational problems: finding the “needle in the haystack”
    • Renewed focus on client side challenges (JavaScript, client security, browser performance)
    • Cloud Computing enables large scale and elasticity
  • Cloud Computing
    • Cost associativity makes it possible to obtain results on 100’s or 1000’s of servers
      • Console log mining
      • BOOM (declarative cloud programming)
      • SCADS (SIGMOD 2010 demo)
    • Eucalyptus makes hybrid cloud computing reasonably practical
      • Run small experiments locally, then “scale up” to cloud for paper results
    • Why aren’t you using cloud computing yet?
  • Example: Facebook
    • Facebook has 2 datacenters, 1 per coast
      • reads spread across both
      • writes only to W. Coast; periodically (~10 minutes) replicated to E. Coast
      • >2000 MySQL servers, >25TB RAM for memcached
    • Challenge: inconsistency due to stale data
      • I change my status message => friends served by the East Coast datacenter don’t see the change for ~10 min
      • What if an East Coast user changes their own status?
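The stale-data problem, including the case of an East Coast user changing their own status, can be reproduced with a toy two-region model. A minimal sketch, assuming a hypothetical `GeoStore` class, a simulated clock in seconds, and the ~10-minute batch replication described above; performing the batch copy lazily on the next access is a simplification, not how Facebook replicates.

```ruby
# Hypothetical two-datacenter store: all writes go to the West Coast master,
# and the East Coast replica is refreshed only every REPLICATION_INTERVAL seconds.
class GeoStore
  REPLICATION_INTERVAL = 600   # ~10 minutes, as on the slide

  def initialize
    @west = {}
    @east = {}
    @last_sync = 0
  end

  def write(key, value, now)
    replicate_if_due(now)
    @west[key] = value
  end

  def read(key, region, now)
    replicate_if_due(now)
    region == :west ? @west[key] : @east[key]
  end

  private

  # Simplification: the periodic batch copy happens lazily on the next access.
  def replicate_if_due(now)
    return if now - @last_sync < REPLICATION_INTERVAL
    @east = @west.dup
    @last_sync = now
  end
end

store = GeoStore.new
store.write('alice_status', 'at lunch', 0)           # West Coast user updates status at t=0s
east_early = store.read('alice_status', :east, 60)   # East Coast friend, 1 minute later
east_later = store.read('alice_status', :east, 660)  # after the next replication
store.write('bob_status', 'on vacation', 700)        # East Coast user's write still goes West
own_view = store.read('bob_status', :east, 710)      # but his reads come from the East replica
```

One general mitigation (not necessarily Facebook's) is read-your-writes routing: for a while after a user writes, send that user's reads to the West Coast master.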
  • Web at 100 feet: georeplication & CDNs
    • Source: “How Facebook Works”, Technology Review, Jul/Aug 2008
  • SCADS: Scalable, Consistency-Adjustable Data Storage
    • Most popular websites follow the same pattern
      • Outgrow initial prototype (on MySQL) due to scale
      • Build large, complicated ad-hoc systems to deal with scaling limitations as they arise
    • Want Scale Independence as new users join:
      • No changes to application
      • Cost per user & request latency don’t increase
    • Key Innovations
      • Performance-{safe,insightful} query language
      • Declarative performance/consistency tradeoffs
      • Automatic scale up & down using machine learning
      • M. Armbrust et al., SCADS: Scalable Consistency-Adjustable Data Storage for Interactive Applications. Proc. CIDR 2009.
      • M. Armbrust et al., PIQL: A Performance-Insightful Query Language. Proc. SOCC 2010.
  • Console Log Mining (Xu et al., SOSP 2009, ICDM 2009)
    • Console logs ubiquitous, yet rarely used for debugging operational problems
      • Neither human-friendly nor machine-friendly
    • Combine source parsing, type inference, and machine learning to find “needles in haystack”
    • 240-node MapReduce job => ~20M log lines => 1-page operator-friendly decision tree
      • Now in use at Google to parse production logs
    (Figure: pipeline from source code/binary and console logs through parsing, feature creation, machine learning, and visualization)
      • W. Xu et al., Mining Console Logs for Large-Scale System Problem Detection (SOSP 2009) [online-algorithm version: ICDM 2009]
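A much-simplified sketch of the parse-then-analyze pipeline, assuming hypothetical message templates (as if recovered from logging statements in the source) and hypothetical block ids. It groups parsed messages into per-identifier count vectors and flags identifiers whose vector differs from the most common one. The paper applies PCA to such vectors, so this majority-pattern check is a stand-in, not the published method.

```ruby
# Hypothetical message templates, as if recovered from logging statements in source code.
TEMPLATES = {
  receive: /\AReceiving block (?<id>blk_\d+)\z/,
  write:   /\AWrote block (?<id>blk_\d+) to disk\z/,
  ack:     /\AAck for block (?<id>blk_\d+)\z/
}.freeze

# Parse one line into [message_type, id], or nil if no template matches.
def parse_line(line)
  TEMPLATES.each do |type, re|
    m = re.match(line)
    return [type, m[:id]] if m
  end
  nil
end

# Count each message type per identifier (one count vector per block id).
def message_count_vectors(lines)
  vectors = Hash.new { |h, id| h[id] = Hash.new(0) }
  lines.each do |line|
    type, id = parse_line(line)
    vectors[id][type] += 1 if type
  end
  vectors
end

# Flag ids whose vector differs from the most common vector (stand-in for PCA).
def anomalies(vectors)
  as_arrays = vectors.transform_values { |counts| TEMPLATES.keys.map { |t| counts[t] } }
  normal = as_arrays.values.tally.max_by { |_, n| n }.first
  as_arrays.reject { |_, v| v == normal }.keys
end

console_lines = [
  'Receiving block blk_1', 'Wrote block blk_1 to disk', 'Ack for block blk_1',
  'Receiving block blk_2', 'Wrote block blk_2 to disk', 'Ack for block blk_2',
  'Receiving block blk_3',                      # never written or acknowledged
  'Receiving block blk_4', 'Wrote block blk_4 to disk', 'Ack for block blk_4',
  'Heartbeat from node 7'                       # matches no template, ignored
]
flagged = anomalies(message_count_vectors(console_lines))
```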
  • “Fingerprinting” Datacenter Performance Crises for Automated Problem Identification
    • Goal: automate identification of performance crises given similar symptoms (“alarm on key performance indicators”)
      • Today: ad-hoc ID takes minutes to hours
    • Fingerprint uses statistical techniques to capture most relevant metrics associated with a particular crisis
    • Simple distance calculation identifies similar past crises
    (Figure: crisis fingerprints over time, labeled overloaded back-end (twice), DB config error, and app config error)
      • P. Bodik et al., Fingerprinting the datacenter: Automated classification of performance crises, EuroSys 2010
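The fingerprint-and-match idea can be sketched with plain vectors. A minimal illustration under stated assumptions: each fingerprint is a hypothetical vector with one entry per key metric, summarized as -1 (colder than normal), 0 (normal), or +1 (hotter than normal); the past-crisis table, the Euclidean distance, and the 1.0 threshold are illustrative choices, not values from the EuroSys paper.

```ruby
# Hypothetical fingerprints of past crises: one entry per key metric.
PAST_CRISES = {
  'overloaded back-end' => [1, 1, 0, -1],
  'DB config error'     => [0, 1, 1, 0],
  'app config error'    => [1, 0, 0, 1]
}.freeze

def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Return the label of the closest past crisis, or nil if none is within the threshold.
def identify(fingerprint, threshold: 1.0)
  label, d = PAST_CRISES.map { |name, fp| [name, distance(fingerprint, fp)] }
                        .min_by { |_, dist| dist }
  d <= threshold ? label : nil
end

known   = identify([1, 1, 0, 0])       # close to a past crisis
unknown = identify([-1, -1, -1, -1])   # unlike anything seen before: needs a human
```

Returning nil for a distant fingerprint matters operationally: a new kind of crisis should be escalated rather than mislabeled as the nearest old one.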
  • Understanding Web 2.0 workload spikes
    • death of Michael Jackson: 23% of tweets, 15% of Wikipedia traffic...how to understand/synthesize spikes?
      • characterized 5 real spikes
      • large differences in many characteristics: steepness, magnitude, # data hotspots, …
      • spike synthesis tool generates realistic workload with few parameters using statistically sound model
    • Cloud computing offers alternative to overprovisioning?
    • P. Bodik et al., Characterizing, modeling, and generating workload spikes for stateful services, SOCC 2010
      • P. Bodik et al., Statistical machine learning makes automatic control practical for Internet datacenters, HotCloud '09
      • P. Bodik et al., Automatic exploration of datacenter performance regimes, ACDC '09
    (Figure: resource allocation for stateful systems, relating a workload model, the stateful system, policy actions, exploration of performance [ACDC ’09], and a policy simulator [HotCloud ’09])
  • It’s Tough Being a Browser
    • Performance, security and GUI choke point
      • Cookie management, JavaScript, XSS attacks
      • CSS layout, side-effects of DOM manipulation
      • AJAX: document can change anytime after loading
      • RIA frameworks/plugins: Flash, HTML 5, Silverlight
      • Device variation: new twist due to power limits
      • See http://www.google.com/googlebooks/chrome for a great fun-to-read description in comic-book form
    • Bring “desktop-quality” browsing to handhelds?
      • Enabled by 4G networks, better output devices
      • Parsing, Rendering, Scripting are all potential bottlenecks to parallelizing
  • Parallelizing the Browser (Rastislav Bodík et al.)
    • New algorithms for CSS, layout, rendering, up to 80X speedup on popular web sites
    • Constraint-based browser programming language unifies layout, rendering, scripting
    (Figure: 84 ms vs. 2 ms)
  • SO YOU WANT TO TRY WRITING A WEB 2.0 APP
  • Try it yourself, it’s fun
    • Download & install RoR (rubyonrails.org)
      • A good intro book: Simply Rails 2 by Patrick Lenz
    • Signup for a free Heroku.com trial account
    • Install various useful “Ruby gems” (libraries): gem install heroku cucumber-rails rspec-rails
    • Create app skeleton: rails mynewapp; cd mynewapp; git init; git add .; git commit -m "new app"; heroku create mynewapp
    • Start writing user stories and track your progress (PivotalTracker.com)
    • When ready to deploy on Heroku: git push heroku master
    • More resources: http://radlab.cs.berkeley.edu > Courses
  • Wrap Up
    • New ecosystem: changes software process, deployment process, interdependencies of DADO (everything-as-a-service)
    • Web 2.0 tools have reached a new plateau of productivity unheard-of 1 generation ago
    • Gap closing between “web app” user experience vs. desktop app
    • Genuine new and interesting research problems
    • Ideal vehicle for ugrad education?
  • THANK YOU!
  • Demos
    • ARC Scheduler
    • gRADit