Escalando Aplicaciones Web (Scaling Web Applications)

Presentation at BarCamp Buenos Aires 2009 on scaling web applications using memcached, MapReduce and Amazon Web Services.

  • Slide note: Map (Java) = dict (Python) = associative array (PHP) = Hash (Ruby) = Object (JavaScript)

    1. Escalando aplicaciones web (Scaling web applications). BarCamp Buenos Aires 2009. Santiago Coffey, Popego
    2. Simplest web app
    3. Scaling with job queues (a minimal sketch follows below)
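    The slide itself presumably showed an architecture diagram; as a minimal sketch of the job-queue idea, in the deck's own Python 2 style (all names here are illustrative, not from the talk): a request handler enqueues slow work and returns immediately, while background workers drain the queue.

        import threading
        import Queue

        jobs = Queue.Queue()

        def send_email(address, body):
            print 'sending mail to %s: %s' % (address, body)

        def worker():
            while True:
                address, body = jobs.get()   # blocks until a job arrives
                send_email(address, body)    # slow work, off the request path
                jobs.task_done()

        for _ in range(4):                   # a small pool of background workers
            t = threading.Thread(target=worker)
            t.daemon = True
            t.start()

        # inside a request handler: enqueue and return immediately
        jobs.put(('user@example.com', 'welcome!'))
        jobs.join()                          # demo only: wait for the queue to drain

    In production the queue would live outside the web process, e.g. in a dedicated broker or in one of the services covered later in the deck, such as SQS.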
    4. Scaling with alternative databases
    5. YouTube recipe for rapid growth
        #!/usr/bin/env python
        while True:
            identify_bottlenecks()
            fix_bottlenecks()
            drink()
            sleep()
            notice_new_bottleneck()
    6. What can I do today to make it better?
    7. Web server?
    8. Ability to take the small...
    9. ... to the big
    10. SCALING SUCKS
    11. RDBMS vs. simple key-value stores
        RDBMS: ACID (Atomicity, Consistency, Isolation, Durability), Transactions, SQL (Joins, Group-By)
        Key-value stores: High availability, Replication (distributed), Fault tolerance, Scalability
    12. ACID? What for?
    13. memcached
        Distributed memory caching system
        Works as a large hash table distributed across multiple machines
        Used to speed up applications by caching data and objects in memory, thus saving database or external data source accesses
    14. memcached
        Lacks authentication and security -> put it behind a firewall
        Lacks atomicity
        Lacks durability -> memcachedb (whose BerkeleyDB backend is easily replicated)
        Fast, provides high availability (a minimal caching sketch follows below)
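    A minimal sketch of the caching pattern these two slides describe, using the python-memcached client; get_user, load_user_from_db and the key scheme are hypothetical, not from the talk.

        import memcache

        mc = memcache.Client(['127.0.0.1:11211'])

        def load_user_from_db(user_id):
            return {'id': user_id, 'name': 'user %d' % user_id}   # hypothetical slow query

        def get_user(user_id):
            key = 'user:%d' % user_id
            user = mc.get(key)                  # try the cache first
            if user is None:                    # miss: fall back to the database
                user = load_user_from_db(user_id)
                mc.set(key, user, time=300)     # keep it cached for 5 minutes
            return user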
    15. Other key-value stores...
        Redis, Project Voldemort, CouchDB, Scalaris, HyperTable, Cassandra, HBase, OpenChord / DHT
    16. MapReduce
        Software framework introduced by Google to support distributed computing on large data sets using clusters of computers
    17. Word count example: Mapper
        #!/usr/bin/env python
        import sys

        def mapper(stream=sys.stdin):
            # emit "<word>\t1" for every word in the input
            for line in stream:
                for word in line.split():
                    print '%s\t%d' % (word, 1)

        if __name__ == "__main__":
            mapper()
    18. Word count example: Reducer
        #!/usr/bin/env python
        import sys

        def reducer(stream=sys.stdin):
            # sum the per-word counts emitted by the mapper
            counter = {}
            for line in stream:
                word, count = line.split('\t', 1)
                counter[word] = counter.get(word, 0) + int(count)
            for word, count in sorted(counter.items()):
                print '%s\t%d' % (word, count)

        if __name__ == "__main__":
            reducer()
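    Since both scripts read stdin and write stdout, they can be smoke-tested locally without Hadoop, e.g. cat input.txt | python mapper.py | sort | python reducer.py (input.txt being any text file; the sort simulates Hadoop's shuffle phase, though this particular reducer also works on unsorted input since it aggregates in a dict).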
    19. Framework implementations
        Hadoop
        BashReduce: http://github.com/erikfrey/bashreduce/tree/master
        br [-m host1 [host2...]] [-c column] [-r reduce] [-i input] [-o output]
    20. HBase/BigTable
        HBase: open source implementation of Google's BigTable
        Not an RDBMS
    21. A BigTable is a sparse, distributed, persistent multidimensional sorted map
    22. HBase: map
        At its core it's a map: a collection of keys and values, where each key has a value associated with it
    23. HBase: persistent
        Map data is permanently stored, e.g. as a file on a disk filesystem
    24. HBase: distributed
        Built upon distributed filesystems, so the underlying storage can be spread out among an array of independent machines (as in a RAID system), e.g. HDFS, Amazon S3, GFS
        Replicated across a number of nodes
    25. HBase: sorted
        Key-value pairs are kept in strict order: keys are indexed so that similar items are near each other (avoiding table scans)
    26. HBase: multidimensional
        Actually it's a map of maps: Row (root key) -> Column family (specified upon creation, cannot be modified) -> Qualifier (unlimited, can be modified) -> Timestamp (defaults to latest) -> Value
    27. HBase: sparse
        A row can have any number of columns in each column family, or none at all
        Also: sparseness in key gaps (see the sketch below)
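    Illustrative only: the "map of maps" of slides 21-27 modeled as nested Python dicts, keyed row -> column family -> qualifier -> timestamp -> value. HBase does not store data this way; this just shows the shape of the data model, borrowing the webtable example from the BigTable paper.

        table = {
            'com.example.www': {                    # row (root key)
                'contents': {                       # column family, fixed at creation
                    'html': {                       # qualifier, free-form
                        1247412000: '<html>v2...',  # timestamp -> value
                        1247325600: '<html>v1...',
                    },
                },
                'anchor': {
                    'cnnsi.com': {1247412000: 'CNN'},
                },
            },
            # sparse: other rows may have different qualifiers, or none at all
        }

        def get_cell(table, row, family, qualifier):
            versions = table[row][family][qualifier]
            return versions[max(versions)]          # "defaults to latest"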
    28. Hey! I'm just a developer. Do I need to understand the details of this super filesystem? NO!
    29. Amazon Web Services
        Accessed over HTTP, using REST and SOAP protocols (also: GUI clients)
        Billed on "usage" (what counts as usage depends on the exact service)
    30. Amazon Web Services: Elastic MapReduce
        Lets you process vast amounts of data
        Uses a hosted Hadoop framework running on web-scale infrastructure (EC2 and S3)
        Easy and cost-effective, reliable and fault-tolerant
    31. Amazon Web Services: Elastic MapReduce
        Why elastic? You choose how many compute instances to use (and how powerful they are)
        You launch job flows on demand and they start within minutes
        Compute instances are automatically torn down on completion (see the sketch below)
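    Not shown in the talk, but a hedged sketch of how the word-count scripts from slides 17-18 could be submitted as a streaming job flow with the boto library (assuming a boto version with Elastic MapReduce support; all bucket paths are placeholders):

        from boto.emr.connection import EmrConnection
        from boto.emr.step import StreamingStep

        conn = EmrConnection('<access key>', '<secret key>')
        step = StreamingStep(name='word count',
                             mapper='s3://my-bucket/mapper.py',
                             reducer='s3://my-bucket/reducer.py',
                             input='s3://my-bucket/input/',
                             output='s3://my-bucket/output/')
        jobflow_id = conn.run_jobflow(name='word count job',
                                      log_uri='s3://my-bucket/logs/',
                                      steps=[step])
        # instances are started within minutes and torn down on completion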
    32. Amazon Web Services: EC2
        EC2: Elastic Compute Cloud
        Provides scalable virtual private servers using Xen
        Machine instances launched on demand
        Charges per machine: hourly + data transfer
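    As an illustration of "machine instances launched on demand" (not from the talk), a sketch using the boto library; the AMI id and credentials are placeholders:

        import boto

        conn = boto.connect_ec2('<access key>', '<secret key>')
        reservation = conn.run_instances('ami-12345678',         # placeholder AMI id
                                         min_count=1, max_count=1,
                                         instance_type='m1.small')
        instance = reservation.instances[0]
        print instance.id, instance.state    # billed hourly while it runs

        # ... and torn down just as easily when no longer needed:
        conn.terminate_instances([instance.id])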
    33. When is my system elastic? When the infrastructure can easily grow and shrink
    34. Amazon Web Services: S3
        S3: Simple Storage Service
        Stores arbitrary objects (up to 5 GB each) under unique user-assigned keys
        Organized into buckets (each owned by an AWS account)
    35. Amazon Web Services: S3
        Scalability, high availability and low latency at commodity costs: $0.15 per GB-month + bandwidth charges
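    A minimal sketch of storing and fetching an object under a user-assigned key with boto; the bucket and key names are illustrative:

        from boto.s3.connection import S3Connection
        from boto.s3.key import Key

        conn = S3Connection('<access key>', '<secret key>')
        bucket = conn.create_bucket('my-unique-bucket-name')   # bucket names are global

        k = Key(bucket)
        k.key = 'hello.txt'                         # the user-assigned key
        k.set_contents_from_string('hello, S3')     # objects up to 5 GB (as of 2009)
        print k.get_contents_as_string()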
    36. Amazon Web Services: SQS
        SQS: Simple Queue Service
        Distributed queue messaging service
        Supports programmatic sending of messages via web service applications
        Resolves common producer-consumer problems among distributed hosts
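    A hedged sketch of the producer-consumer pattern over SQS using boto; the queue name and message body are illustrative:

        import boto
        from boto.sqs.message import Message

        conn = boto.connect_sqs('<access key>', '<secret key>')
        q = conn.create_queue('jobs')

        m = Message()                   # producer: enqueue a message
        m.set_body('resize image 42')
        q.write(m)

        for msg in q.get_messages():    # consumer, possibly on another host
            print msg.get_body()
            q.delete_message(msg)       # acknowledge so it is not redelivered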
    37. Other Amazon Web Services...
        Amazon SimpleDB: operates in concert with EC2 and S3 to provide the core functionality of a database, running queries on structured data
        Amazon CloudFront: a content delivery network (CDN) for distributing objects stored in S3 to so-called "edge locations" near the requester
        AWS Management Console: a web-based interface to manage and monitor the Amazon infrastructure suite
    38. ¡Gracias! (Thanks!)
        santiago.coffey@popego.com
        http://twitter.com/scoffey
        http://scoffey.popego.com
