Scaling Web Applications (Escalando Aplicaciones Web)

Presentation at BarCamp Buenos Aires 2009 on Scaling web applications using memcached, MapReduce and Amazon Web Services.

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

  • Map (Java) = dict (Python) = associative array (PHP) = Hash (Ruby) = Object (JavaScript)

Scaling Web Applications: Presentation Transcript

  • Scaling web applications. BarCamp Buenos Aires 2009. Santiago Coffey - Popego
  • Simplest webapp
  • Scaling with job queues
  • Scaling with alternative databases
  • YouTube recipe for rapid growth:

        #!/usr/bin/env python
        while True:
            identify_bottlenecks()
            fix_bottlenecks()
            drink()
            sleep()
            notice_new_bottleneck()
  • What can I do today to make it better?
  • Web server?
  • Ability to take the small...
  • ... to the big
  • SCALING SUCKS
  • RDBMS vs. simple key-value stores. RDBMS: ACID (atomicity, consistency, isolation, durability), transactions, SQL (joins, GROUP BY). Key-value stores: high availability, replication and distribution, fault tolerance, scalability.
  • ACID? What for?
  • memcached: a distributed memory caching system. It works as a large hash table distributed across multiple machines, and is used to speed up applications by caching data and objects in memory, thus saving accesses to the database or an external data source.
  • memcached: lacks authentication and security (put it behind a firewall); lacks atomicity; lacks durability (see memcachedb, whose BerkeleyDB backend is easily replicable); fast, provides high availability.
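
The caching idea on the memcached slides can be sketched as a "cache-aside" lookup: check the cache first and go to the database only on a miss. This is a minimal illustration, not code from the talk. `FakeMemcached`, `SlowDatabase`, `get_user` and the 60-second TTL are hypothetical names and values chosen for the example, and the in-process dictionary only stands in for a real memcached client so that the sketch runs without a server.

```python
import time


class FakeMemcached:
    """In-process stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses, as a cache with TTLs would.
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl else None
        self._data[key] = (value, expires_at)


class SlowDatabase:
    """Hypothetical data source that counts how often it is queried."""

    def __init__(self):
        self.hits = 0

    def load_user(self, user_id):
        self.hits += 1
        return {"id": user_id, "name": "user-%d" % user_id}


def get_user(cache, db, user_id):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = "user:%d" % user_id
    user = cache.get(key)
    if user is None:
        user = db.load_user(user_id)
        cache.set(key, user, ttl=60)  # assumed TTL; tune per application
    return user
```

Calling `get_user` twice with the same id should reach the database only once; the second call is served from memory. Because the cache is not durable, the database stays the source of truth.
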
  • Other key-value stores: Redis, Project Voldemort, CouchDB, Scalaris, HyperTable, Cassandra, HBase, OpenChord / DHT
  • MapReduce: a software framework introduced by Google to support distributed computing on large data sets using clusters of computers.
  • Word count example: Mapper

        #!/usr/bin/env python
        import sys

        def mapper(stream=sys.stdin):
            # Emit one "word<TAB>1" line for every word in the input.
            for line in stream:
                for word in line.split():
                    print('%s\t%d' % (word, 1))

        if __name__ == "__main__":
            mapper()

  • Word count example: Reducer

        #!/usr/bin/env python
        import sys

        def reducer(stream=sys.stdin):
            # Add up the counts emitted by the mapper for each word.
            counter = {}
            for line in stream:
                word, count = line.split('\t', 1)
                counter[word] = counter.get(word, 0) + int(count)
            # Print the totals, ordered by word.
            for word, count in sorted(counter.items(), key=lambda i: i[0]):
                print('%s\t%d' % (word, count))

        if __name__ == "__main__":
            reducer()
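
The word count slides show the map and reduce steps as separate scripts. The sketch below runs the whole flow in one process to make the intermediate "shuffle" step visible. It is an illustration under assumptions, not part of the talk: the function names are made up, and the sort stands in for the grouping that a framework such as Hadoop performs between the two phases.

```python
from itertools import groupby


def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Shuffle phase: sort by key so that equal words become adjacent."""
    return sorted(pairs, key=lambda pair: pair[0])


def reduce_counts(sorted_pairs):
    """Reduce phase: sum the counts of each run of equal words."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    return list(reduce_counts(shuffle(map_words(lines))))


if __name__ == "__main__":
    print(word_count(["to be or not", "to be"]))
    # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Because each phase only sees key-value pairs, the map and reduce functions can run on different machines; only the shuffle needs to bring equal keys together.
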
  • Framework implementations: Hadoop; BashReduce (http://github.com/erikfrey/bashreduce/tree/master), invoked as: br [-m host1 [host2...]] [-c column] [-r reduce] [-i input] [-o output]
  • HBase/BigTable. HBase: open source implementation of Google’s BigTable; not an RDBMS.
  • A BigTable is a sparse, distributed, persistent multidimensional sorted map
  • HBase: map. At its core it’s a map: a collection of keys and values where each key has an associated value.
  • HBase: persistent. Map data is permanently stored, e.g. in a file on a disk filesystem.
  • HBase: distributed. Built upon distributed filesystems (e.g. HDFS, Amazon S3, GFS); the underlying storage can be spread out among an array of independent machines (like in a RAID system) and is replicated across a number of nodes.
  • HBase: sorted. Key-value pairs are kept in strict order; keys are indexed so that similar items are nearer to each other (avoiding table scans).
  • HBase: multidimensional. Actually it’s a map of maps, keyed in turn by: row (root key); column family (specified upon creation, cannot be modified); qualifier (unlimited, can be modified); timestamp (defaults to latest); and finally the value.
  • HBase: sparse. A row can have any number of columns in each column family, or none at all. Also: sparseness in key gaps.
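
The "map of maps" described on the HBase slides can be mimicked with nested dictionaries. This is a toy model to make the data layout concrete, not the HBase API: `TinyTable` and its methods are hypothetical, everything lives in memory (so it is neither persistent nor distributed), and rows are sorted at scan time, whereas the slides describe pairs being kept in order.

```python
class TinyTable:
    """Toy model of the layout: row -> column family -> qualifier -> timestamp -> value."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = frozenset(families)
        self.rows = {}

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.families:
            raise KeyError("unknown column family: %s" % family)
        versions = (self.rows.setdefault(row, {})
                             .setdefault(family, {})
                             .setdefault(qualifier, {}))
        versions[timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        versions = self.rows[row][family][qualifier]
        if timestamp is None:
            timestamp = max(versions)  # default to the latest version
        return versions[timestamp]

    def scan(self, start, stop):
        """Yield row keys in sorted order within [start, stop)."""
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row


table = TinyTable(["info"])
table.put("com.example/a", "info", "title", "old", timestamp=1)
table.put("com.example/a", "info", "title", "new", timestamp=2)
table.put("com.example/b", "info", "lang", "es", timestamp=1)  # sparse: no "title"
table.put("org.sample/x", "info", "title", "other", timestamp=1)
```

Choosing row keys that share a prefix (here a reversed domain name) keeps related rows next to each other, so a range scan over `"com.example/"` touches only those rows.
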
  • Hey! I’m just a developer. Do I need to understand the details of this super filesystem? NO!
  • Amazon Web Services: accessed over HTTP using REST and SOAP protocols; also GUI clients; billed on “usage” (the form of usage depends on the exact service).
  • Amazon Web Services: Elastic MapReduce. Lets you process vast amounts of data; uses a hosted Hadoop framework running on the web-scale infrastructure of EC2 and S3; easy and cost-effective; reliable and fault-tolerant.
  • Amazon Web Services: Elastic MapReduce. Why elastic? You choose how many compute instances to use (and how powerful); you launch job flows on demand and they start within minutes; compute instances are automatically torn down on completion.
  • Amazon Web Services: EC2 (Elastic Compute Cloud). Provides scalable virtual private servers using Xen; machine instances are launched on demand; charges per machine: hourly + data transfer.
  • When is my system elastic? When infrastructure can easily grow and shrink
  • Amazon Web Services: S3 (Simple Storage Service). Stores arbitrary objects (< 5 GB) under unique user-assigned keys; organized into buckets (each owned by an AWS account).
  • Amazon Web Services: S3. Scalability, high availability and low latency at commodity costs: $0.15 per GB-month + bandwidth charges.
  • Amazon Web Services: SQS (Simple Queue Service). A distributed queue messaging service; supports programmatic sending of messages via web service applications; resolves common producer-consumer problems among distributed hosts.
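
The producer-consumer pattern mentioned on the SQS slide can be sketched locally with Python's standard `queue` and `threading` modules. This is not the SQS API: the in-memory queue only stands in for a hosted queue shared by separate hosts, and `run`, the worker count and the squaring "job" are hypothetical placeholders.

```python
import queue
import threading


def producer(jobs, items):
    """Enqueue work without knowing who will process it."""
    for item in items:
        jobs.put(item)


def consumer(jobs, results):
    """Take jobs off the queue until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            break
        results.append(item * item)  # placeholder for real work


def run(items, workers=3):
    jobs = queue.Queue()
    results = []
    threads = [threading.Thread(target=consumer, args=(jobs, results))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    producer(jobs, items)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each one stops
    for thread in threads:
        thread.join()
    return sorted(results)  # workers finish in no particular order
```

The queue decouples the two sides: producers keep accepting work even when consumers are slow, and capacity can be added by starting more consumers.
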
  • Other Amazon Web Services: Amazon SimpleDB operates in concert with EC2 and S3 to provide the core functionality of a database, running queries on structured data; Amazon CloudFront is a content delivery network (CDN) for distributing objects stored in S3 to so-called "edge locations" near the requester; AWS Management Console is a web-based interface to manage and monitor the Amazon infrastructure suite.
  • Thank you! santiago.coffey@popego.com, http://twitter.com/scoffey, http://scoffey.popego.com