My other computer is a datacentre

    1. My other computer is a datacentre. Steve Loughran, Julio Guijarro, November 2008
    2. Our other computer is a datacentre
    3. His other computer is a datacentre
      • Open source projects
      • SVN, web, defect tracking
      • Released artifacts
      code.google.com
    4. His other computer is a datacentre
      • Email
      • Stop-motion Lego videos on YouTube
    5. Their other computer is this datacentre: Yahoo!, 8000 nodes, 32K cores, 16 petabytes
    6. Problem: scale
    7. Economical Scale
      • Business model: advertising plus “professional”
      • Power costs overtake hardware costs after 3 years
      • Commodity hardware (x86, SATA)
      • Little or no RAID
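The claim that power overtakes hardware cost after about three years is simple arithmetic. The Erlang module below is a rough sketch of it. The wattage, overhead multiplier, electricity price and hardware price are caller-supplied assumptions, not figures from the talk.

```erlang
-module(power_cost).
-export([annual_cost/3, crossover_year/4]).

%% Annual electricity cost of one server running 24x7 (8760 hours).
%% Overhead is an assumed facility multiplier (cooling etc.) applied to
%% the server's own draw; none of these inputs come from the slides.
annual_cost(Watts, Overhead, PricePerKWh) ->
    Watts / 1000 * 8760 * PricePerKWh * Overhead.

%% First whole year in which cumulative power spend exceeds the
%% purchase price of the hardware.
crossover_year(HardwareCost, Watts, Overhead, PricePerKWh) ->
    Annual = annual_cost(Watts, Overhead, PricePerKWh),
    true = Annual > 0,   % crash early rather than divide by zero
    trunc(HardwareCost / Annual) + 1.
```

With made-up inputs of a 300 W server, an overhead multiplier of 2.0, 0.10 per kWh and a 1500 purchase price, `power_cost:crossover_year(1500, 300, 2.0, 0.10)` returns 3.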
    8. There are no free electrons
    9. Power
      • HVAC power is 1.5-2X system power, so every watt saved at the server has follow-on benefits
      • Idle servers still consume 80% of peak power, so keep them busy or shut them down
      • Gigabit copper networking costs power
      • SSD front-end machines save on disk costs and can be started and stopped fast
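Why "keep busy or shut down" follows from the 80% idle figure can be sketched with an assumed linear power model: a server draws 80% of peak when idle and rises linearly to peak at full load. The peak wattage and the multiplier are hypothetical inputs. The multiplier stands in for HVAC overhead; whether the slide's 1.5-2X means HVAC alone or total facility power is left to the caller.

```erlang
-module(idle_power).
-export([server_watts/2, fleet_watts/4]).

%% Assumed linear model: idle draw is 80% of peak (the slide's figure);
%% the remaining 20% scales with utilisation from 0.0 to 1.0.
server_watts(PeakWatts, Utilisation) ->
    Idle = 0.8 * PeakWatts,
    Idle + (PeakWatts - Idle) * Utilisation.

%% Wall power for a fleet: Load (measured in fully busy servers) is spread
%% evenly over Active machines; the rest are powered off and draw nothing.
%% Multiplier is a hypothetical HVAC/facility overhead on every server watt.
fleet_watts(PeakWatts, Load, Active, Multiplier)
  when Active > 0, Load =< Active ->
    Active * server_watts(PeakWatts, Load / Active) * Multiplier.
```

With an assumed 250 W peak, spreading two servers' worth of work over 10 machines draws 2100 W at the server (`idle_power:fleet_watts(250, 2.0, 10, 1.0)`), against 500 W for 2 fully busy machines with the other 8 switched off. Any HVAC multiplier scales both figures, so the absolute saving grows with it.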
    10. Hardware
    11. Network Fabric
      • 1 Gbit/s from the server
      • 10 Gbit/s between racks
      • Two 10 Gbit/s links for failover
      • Multiple off-site links
      Bandwidth between racks is a bottleneck
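Inter-rack bandwidth becomes the bottleneck because a rack's servers can offer far more traffic than its uplink carries. The sketch below computes that ratio. The number of servers per rack is an assumption (the slide gives only link speeds), and the usage figures treat the second 10 Gbit/s link as a standby used only on failover.

```erlang
-module(rack_uplink).
-export([oversubscription/3, per_server_gbps/2]).

%% How many times more traffic a rack's servers could offer at line rate
%% than its active uplink can carry (1.0 = no oversubscription).
%% ServersPerRack is an assumed input; the slide only gives link speeds.
oversubscription(ServersPerRack, ServerGbps, UplinkGbps) ->
    ServersPerRack * ServerGbps / UplinkGbps.

%% Each server's share of the uplink if all of them talk off-rack at once.
per_server_gbps(ServersPerRack, UplinkGbps) ->
    UplinkGbps / ServersPerRack.
```

With a hypothetical 40 servers per rack, `rack_uplink:oversubscription(40, 1, 10)` gives 4.0, and each server gets 0.25 Gbit/s when every machine in the rack talks to another rack at the same time.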
    12. Where?
      • Cheap electricity
      • Cool and dry outside air
      • Cheap land, affordable building costs
      • Networking: fast, affordable, >1 supplier
      • Low risk of earthquakes and other disasters
      • Hardware: easy to get machines in fast
      • Politics: tax, government stability, data protection
    13.  
    14. How?
    15.  
    16. High Availability
      • Avoid SPOFs
      • Replicate data.
      • Redeploy applications onto live machines
      • Route traffic to live front ends
      • Decoupled connection to back end: queues, scatter/gather
      • Issue: how do you define “live”?
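One common answer to the "live" question is a heartbeat timeout: a machine counts as live if it has reported in recently. The sketch below shows that definition plus hash-based routing of requests to live front ends. It is one possible approach, not the talk's. The node names, the millisecond clock passed in by the caller and the timeout value are all hypothetical.

```erlang
-module(liveness).
-export([new/0, heartbeat/3, live/3, route/2]).

%% Table of node name -> time of its last heartbeat (ms, caller's clock).
new() -> #{}.

heartbeat(Node, NowMs, Table) ->
    maps:put(Node, NowMs, Table).

%% One possible definition of "live": a heartbeat arrived within TimeoutMs.
live(NowMs, TimeoutMs, Table) ->
    lists:sort([Node || {Node, Last} <- maps:to_list(Table),
                        NowMs - Last =< TimeoutMs]).

%% Pick a live node for a request key; the same key maps to the same node
%% as long as the live set does not change.
route(Key, [_ | _] = Live) ->
    lists:nth(erlang:phash2(Key, length(Live)) + 1, Live).
```

A timeout only approximates liveness: a slow or partitioned machine looks the same as a dead one, which is exactly the issue the slide raises.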
    17. Web Front End
      • Disposable machines
      • Hardware: SSD with laptop CPUs
      • PHP and other agile languages
      • HTML, Ajax, RSS/Atom interconnects
      • Talk to the back end
    18. Work engine
      • Move work to where the data is
      • Queue, scatter/gather or tuple-space
      • Create workers based on demand
      • Spare time: check health, power off
      • Cloud Algorithms: MapReduce
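Scatter/gather can be sketched in a few lines of Erlang: spawn one worker per chunk of work, then collect the replies in order. This sketch keeps everything on one node. In a cluster, `spawn/2` with a node name could start each worker on the machine that already holds its chunk, which is one way to move work to where the data is. The function name and the timeout policy are illustrative assumptions.

```erlang
-module(scatter_gather).
-export([run/3]).

%% Scatter: spawn one worker per chunk. Gather: wait for every reply,
%% in chunk order, giving up if a worker has not answered within TimeoutMs.
run(Fun, Chunks, TimeoutMs) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Fun(Chunk)} end),
                Ref
            end || Chunk <- Chunks],
    [receive
         {Ref, Result} -> Result
     after TimeoutMs ->
         error({worker_timeout, Ref})
     end || Ref <- Refs].
```

For example, `scatter_gather:run(fun lists:sum/1, [[1,2,3], [4,5,6]], 1000)` returns `[6, 15]`. Tagging each reply with a unique reference lets the gather step match answers to chunks even though workers finish in any order.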
    19. MapReduce: Hadoop
    20. The Cotham Bluetooth Dataset
      • Six months' worth of discovered Bluetooth devices
      • MAC Address and timestamp
      • One site: a few megabytes
      • Future: multiple sites?
      {lost,"00:0F:B3:92:05:D3","2008-04-17T22:11:15",1124313075}
      {found,"00:0F:B3:92:05:D3","2008-04-17T22:11:29",1124313089}
      {lost,"00:0F:B3:92:05:D3","2008-04-17T22:24:45",1124313885}
      {found,"00:0F:B3:92:05:D3","2008-04-17T22:25:00",1124313900}
      {found,"00:60:57:70:25:0F","2008-04-17T22:29:00",1124314140}
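The found/lost records above are valid Erlang terms, so a simple presence calculation falls out of them: pair each device's "found" with its next "lost". The sketch assumes exactly the 4-field shape shown, orders events by the integer field, and uses a pairing rule of its own choosing (a "lost" with no earlier "found" is skipped, and a repeated "found" restarts the interval).

```erlang
-module(bt_sessions).
-export([sessions/1]).

%% Turn {found|lost, Mac, IsoTime, Secs} events into presence intervals
%% {Mac, FoundSecs, LostSecs}, ordering events by the integer field first.
sessions(Events) ->
    {Closed, _StillPresent} =
        lists:foldl(fun step/2, {[], #{}}, lists:keysort(4, Events)),
    lists:reverse(Closed).

step({found, Mac, _Iso, Secs}, {Closed, Open}) ->
    %% a repeated "found" simply restarts the open interval
    {Closed, maps:put(Mac, Secs, Open)};
step({lost, Mac, _Iso, Secs}, {Closed, Open}) ->
    case maps:find(Mac, Open) of
        {ok, Found} -> {[{Mac, Found, Secs} | Closed], maps:remove(Mac, Open)};
        error -> {Closed, Open}   % "lost" with no earlier "found": skip it
    end.
```

On the five sample records this yields a single closed interval for 00:0F:B3:92:05:D3, from 1124313089 to 1124313885, i.e. 796 seconds in range. The other two "found" events are still open at the end of the sample.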
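    21. MapReduce to Day of Week
      %% Map: for each "found" event, send {DayOfWeek, Device} to the reducer.
      map_found_event_by_day_of_week({event, found, Device, _, Timestamp}, Reducer) ->
          DayOfWeek = timestamp_to_day(Timestamp),
          Reducer ! {DayOfWeek, Device}.
      %% Reduce: prepend {Key, NumberOfEntries} to the accumulator.
      size(Key, Entries, A) ->
          L = length(Entries),
          [{Key, L} | A].
      %% mr/4 and timestamp_to_day/1 are helpers not shown on the slide.
      mr_day_of_week(Source) ->
          mr(Source,
             fun traffic:map_found_event_by_day_of_week/2,
             fun traffic:size/3,
             []).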
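The `mr/4` and `timestamp_to_day/1` helpers are not shown on the slide. The module below is a hypothetical single-node stand-in, not the talk's implementation, and has none of Hadoop's distribution or fault tolerance. It matches the 4-field records as printed on slide 20 rather than the slide's `{event, found, ...}` pattern, and takes the weekday from the ISO date string. `calendar:day_of_the_week/3` numbers days 1 = Monday to 7 = Sunday, which matches the numbering in the results.

```erlang
-module(traffic_sketch).
-export([mr/4, map_found_by_day/2, count/3, day_of_week/1]).

%% Hypothetical mr/4: Map(Record, Reducer) sends {Key, Value} messages to a
%% collector process; once every record is mapped, values are grouped by key
%% and folded with Reduce(Key, Values, Acc). Map must send from the calling
%% process so that the final mr_done message arrives after all its output.
mr(Records, Map, Reduce, Acc0) ->
    Self = self(),
    Collector = spawn(fun() -> collect(Self, #{}) end),
    lists:foreach(fun(Record) -> Map(Record, Collector) end, Records),
    Collector ! {mr_done, Self},
    receive
        {mr_groups, Collector, Groups} -> maps:fold(Reduce, Acc0, Groups)
    end.

%% Collector: accumulate values per key (newest first) until told to stop.
collect(Owner, Groups) ->
    receive
        {mr_done, Owner} ->
            Owner ! {mr_groups, self(), Groups};
        {Key, Value} ->
            collect(Owner, maps:put(Key, [Value | maps:get(Key, Groups, [])], Groups))
    end.

%% Map: each "found" event becomes {DayOfWeek, Mac}; "lost" events emit nothing.
map_found_by_day({found, Mac, Iso, _Secs}, Reducer) ->
    Reducer ! {day_of_week(Iso), Mac};
map_found_by_day(_Other, _Reducer) ->
    ok.

%% Reduce: count the values under each key (the role of size/3 on the slide).
count(Key, Values, Acc) ->
    [{Key, length(Values)} | Acc].

%% 1 = Monday ... 7 = Sunday, from the "YYYY-MM-DD" prefix of the ISO time.
day_of_week(Iso) ->
    Year  = list_to_integer(lists:sublist(Iso, 1, 4)),
    Month = list_to_integer(lists:sublist(Iso, 6, 2)),
    Day   = list_to_integer(lists:sublist(Iso, 9, 2)),
    calendar:day_of_the_week(Year, Month, Day).
```

On the five sample records, `traffic_sketch:mr(Events, fun traffic_sketch:map_found_by_day/2, fun traffic_sketch:count/3, [])` returns `[{4,3}]`: three "found" events, all on Thursday 2008-04-17. Like the slide's reducer, it counts events rather than distinct devices.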
    22. Results
      traffic:mr_day_of_week(big).
      [{3,3702}, {6,3076}, {2,3747}, {5,3845}, {1,3044}, {4,3850}, {7,2274}]
      • Monday 3044
      • Tuesday 3747
      • Wednesday 3702
      • Thursday 3850
      • Friday 3845
      • Saturday 3076
      • Sunday 2274
    23. Hadoop running MapReduce
    24. Filesystem
      • Commodity disks
      • Scatter data across machines
      • Duplicate data across machines
      • Trickle-feed backup to other sites
      • Long-haul API: HTTP
      • Archive unused files
      • In idle time: check health of data, rebalance
      • Monitor and predict disk failure.
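Scattering and duplicating data across machines needs a rule for which machines hold each block. The sketch below uses rendezvous (highest-random-weight) hashing, swapped in purely for illustration. It is not what the talk describes, and HDFS, for instance, uses its own rack-aware placement policy. Block ids and node names are hypothetical.

```erlang
-module(replica_placement).
-export([place/3]).

%% Rendezvous hashing: every node gets a pseudo-random score for this block
%% and the Replicas highest-scoring nodes hold copies. The same inputs always
%% give the same answer, and adding or removing a node only changes blocks
%% for which that node is (or becomes) one of the top scorers.
place(BlockId, Nodes, Replicas)
  when is_integer(Replicas), Replicas > 0, Replicas =< length(Nodes) ->
    Scored = [{erlang:phash2({BlockId, Node}), Node} || Node <- Nodes],
    Ranked = lists:reverse(lists:sort(Scored)),
    [Node || {_Score, Node} <- lists:sublist(Ranked, Replicas)].
```

Because placement is a pure function of the block id and the node list, any machine can work out where a block's copies live without asking a central directory. Rebalancing after a node change only touches the blocks whose top-scoring set moved.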
    25. What will your datacentre do?

    © All Rights Reserved
