#105




Getting Started with NoSQL and Data Scalability
By Eugene Ciurana

CONTENTS INCLUDE:
  • Introduction
  • Scalable Data Architecture
  • Is NoSQL For You?
  • mongoDB
  • GigaSpaces XAP
  • Google App Engine Datastore and more...




INTRODUCTION

DZone Refcard #43 introduces the terminology and techniques of
system high availability and scalability
(http://refcardz.dzone.com/refcardz/scalability). The next
logical step is the scalable handling of the massive data volumes
that result from having these powerful processing capabilities.

This Refcard demystifies NoSQL and data scalability
techniques by introducing some core concepts. It also offers
an overview of current technologies available in this domain
and suggests how to apply them.

What is Data Scalability?
Data scalability is the ability of a system to store, manipulate,
analyze, and otherwise process ever-increasing amounts of
data without reducing overall system availability, performance,
or throughput.

Data scalability is achieved by a combination of more powerful
processing capabilities and larger but efficient storage
mechanisms.

Relational and hierarchical databases scale up by adding more
processors, more storage, caching systems, and so on. They soon
hit either a cost limit or a practical scalability limit because
they are difficult or impossible to scale out. These database
management systems are designed as single units that must
maintain data integrity and enforce schema rules to guarantee
it. This rigidity is what promotes upward, but not outward,
scalability.

Hot Tip: Oracle RAC is a cluster of multiple computers with
access to a common database. It is considered only vertically
scalable because the processing (usually in the form of stored
procedures) may scale out, but the shared storage facilities
don’t scale with the cluster.

Data integrity and schemas are suited for handling
transactional, normalized, uniform data. They handle
unstructured or rapidly evolving data structures with difficulty
or at exponentially larger cost.

Hot Tip: Data replication is not the same as data scalability!

SCALABLE DATA ARCHITECTURES

There are two general kinds of architectures used for building
scalable data systems: data grids and NoSQL systems.
Implementations of either often share characteristics from the
other.

Data Grids
Data grids process workloads defined as independent jobs
that don’t require data sharing among processes. Storage
or network may be shared across all nodes of the grid, but
intermediate results have no bearing on the progress of other
jobs or on other nodes in the grid. A MapReduce cluster is one
example.

Figure 1 - Data Grid
[diagram: Consumer; Master; two Load Balancers, each in front of
four Nodes]

Data grids expose their functionality through a single API
(either a Web service or native to the application programming
language) that abstracts their topology and implementation from
the data processing consumer.
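The data grid model described under Data Grids (independent jobs behind a single API that hides the topology) can be illustrated with a minimal single-machine sketch. The `DataGrid` class, its `run` method, and the helper functions are hypothetical names invented for this illustration, and worker threads merely stand in for grid nodes; a real grid would dispatch jobs over a network.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class DataGrid:
    """Hypothetical facade: the consumer sees one API, not the node topology."""

    def __init__(self, node_count=4):
        # Assumption: worker threads play the role of grid nodes.
        self._node_count = node_count

    def run(self, job, inputs):
        """Run one independent job per input; results keep the input order."""
        with ThreadPoolExecutor(max_workers=self._node_count) as pool:
            return list(pool.map(job, inputs))


def count_words(document):
    """Independent job: it reads only its own input and shares no state."""
    return Counter(document.lower().split())


def merge_counts(partials):
    """The consumer combines the partial results after every job finishes."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


documents = ["NoSQL scales out", "RDBMS scales up", "data grids scale out"]
grid = DataGrid(node_count=2)
totals = merge_counts(grid.run(count_words, documents))
print(totals["scales"], totals["out"])
```

No job depends on another job's intermediate result, so adding nodes only changes how many jobs run at once, not the outcome.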

                                                                                                                  DZone, Inc.   |   www.dzone.com



Areas of Application
  • Financial modeling
  • Data mining
  • Click stream analytics
  • Document clustering
  • Distributed sorting or grepping
  • Simulations
  • Inverted index construction
  • Protein folding

NoSQL
NoSQL describes a horizontally scalable, non-relational
database with built-in replication support. Applications
interact with it through a simple API, and the data is stored in a
“schema-free”, flat addressing repository, usually as large files
or data blocks. The repository is often a custom file system
designed to support NoSQL operations.

Hot Tip: NoSQL is intended as shorthand for “not only
SQL.” Complete architectures almost always mix
traditional and NoSQL databases.

NoSQL setups are best suited for non-OLTP applications that
process massive amounts of structured and unstructured data
at a lower cost and with higher efficiency than RDBMSs and
stored procedures.

Figure 2 - NoSQL Topology
[diagram: Consumer; Nodes; Virtual File System (logical table
management, load balancing, garbage collection; HDFS, mongoFS,
Hypertable); Tablet Server 0 through Tablet Server n;
Distributed File System; FS 0 through FS n]

NoSQL storage is highly replicated (a commit doesn’t occur
until the data is successfully written to at least two separate
storage devices) and the file systems are optimized for
write-only commits. The storage devices are formatted to handle
large blocks (32 MB or more). Caching and buffering are
designed for high I/O throughput. The NoSQL database
is implemented as a data grid for processing (MapReduce,
queries, CRUD, etc.).

Areas of Application
  • Document storage
  • Object databases
  • Graph databases
  • Key/value stores
  • Eventually consistent key/value stores

This list is the implementation counterpart to the data grid
areas of application; the data store feeds the computational
network and, together, they form the NoSQL database.

NoSQL vs RDBMS vs OO Analogies
  NoSQL        RDBMS        OO
  Kind         Table        Class
  Entity       Record       Object
  Attribute    Column       Property

IS NoSQL FOR YOU?

Preparation:
  • Don’t fall prey to the NoSQL fad
  • Don’t be stubborn; neither NoSQL nor traditional
    databases apply to all cases
  • Apply the CAP Theorem to your use cases to
    determine feasibility

Brewer’s (CAP) Theorem
It’s impossible for a distributed computer system to
simultaneously provide all three of these guarantees:
  • Consistency (all nodes see the same data at the same time)
  • Availability (node failures don’t prevent survivors from
    continuing to operate)
  • Partition tolerance (no failure short of a total network
    failure causes the system to fail)

Since only two of these characteristics can be guaranteed for any
given scalable system, use your functional specification and
business SLA (service level agreement) to determine what your
minimum and target goals for CAP are, pick the two that meet
your requirements, and proceed to implement the appropriate
technology.

Hot Tip: Rule of Thumb: NoSQL’s primary goal is to achieve
horizontal scalability. It attains this by reducing
transactional semantics and referential integrity.

Use Figure 3 to identify the best match between your
application’s CAP requirements and the suggested SQL and
NoSQL systems listed.

Figure 3 - CAP Selection Chart (source: Nathan Hurst's Blog)
[diagram: a triangle labeled “Pick Any Two” with vertices
Consistency (C), Availability (A), and Partition tolerance (P);
the legend distinguishes Relational, Key-Value, Column-Oriented,
and Document-Oriented systems]
  • C and A: RDBMSs (Oracle, MySQL), Aster Data, Greenplum,
    Vertica
  • C and P: mongoDB, Redis, BerkeleyDB, Terrastore, Datastore,
    MemcacheDB, Hypertable, Scalaris, HBase
  • A and P: Dynamo, Voldemort, Tokyo Cabinet, KAI, SimpleDB,
    CouchDB, Riak, Cassandra
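The selection step that Figure 3 supports can be expressed as a small lookup. This is only a sketch: the groupings are transcribed from the chart, while the names `CAP_CHART` and `candidates` are hypothetical and exist only for this illustration. It is a starting point for a shortlist, not a substitute for checking each product against your SLA.

```python
# Systems grouped by the pair of CAP guarantees they favor, as read from Figure 3.
CAP_CHART = {
    frozenset("CA"): ["RDBMSs (Oracle, MySQL)", "Aster Data", "Greenplum",
                      "Vertica"],
    frozenset("CP"): ["mongoDB", "Redis", "BerkeleyDB", "Terrastore",
                      "Datastore", "MemcacheDB", "Hypertable", "Scalaris",
                      "HBase"],
    frozenset("AP"): ["Dynamo", "Voldemort", "Tokyo Cabinet", "KAI",
                      "SimpleDB", "CouchDB", "Riak", "Cassandra"],
}


def candidates(required):
    """Return the charted systems whose pair covers every required guarantee.

    `required` is a string of one or two letters from C, A, and P.
    """
    wanted = frozenset(required.upper())
    if not wanted or not wanted <= frozenset("CAP"):
        raise ValueError("use letters from C, A, and P")
    if len(wanted) > 2:
        raise ValueError("pick any two: the chart has no system with all three")
    return sorted(
        system
        for pair, systems in CAP_CHART.items()
        if wanted <= pair
        for system in systems
    )


# Example: an application that needs consistency and partition tolerance.
print(candidates("CP"))
```

Passing a single letter, such as `candidates("P")`, returns every charted system that offers that guarantee regardless of its second one.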





mongoDB

mongoDB is a document-based NoSQL database that bridges
the gap between scalable key-value stores like Datastore
and MemcacheDB, and the querying and robustness
capabilities of an RDBMS. Some of its main features include:

   • Document-oriented storage - data is manipulated as
     JSON-like documents
   • Querying - uses JavaScript and has APIs for submitting
     queries in every major programming language
   • In-place updates - atomicity
   • Indexing - any attribute in a document may be used for
     indexing and query optimization
   • Auto-sharding - enables horizontal scalability
   • Map/reduce - the mongoDB cluster may run smaller
     MapReduce jobs than a Hadoop cluster with significant
     cost and efficiency improvements

mongoDB implements its own file system, GridFS, to optimize I/O
throughput by dividing larger objects into smaller chunks.
Documents are stored in two separate collections: files
containing the object meta-data, and chunks that form a
larger document when combined with database accounting
information. The mongoDB API provides functions for
manipulating files, chunks, and indices directly. The
administration tools enable GridFS maintenance.

Figure 5 - mongoDB Cluster
[diagram: Consumer; mongoDB Server (master) and mongoDB Server
(slave), linked by fail-over; each server runs mongod (database
daemon) and mongos (sharding daemon) on top of its data storage]

A mongoDB cluster consists of a master and a slave. The slave
may become the master in a fail-over scenario, if necessary.
Having a master/slave configuration (also known as Active/
Passive or A/P cluster) helps ensure data integrity since only
the master is allowed to commit changes to the store at any
given time. A commit is successful only if the data is written to
GridFS and replicated in the slave.

Hot Tip: mongoDB also supports a limited master/master
configuration. It’s useful only for inserts, queries,
and deletions by specific object ID. It must not
be used if updates of a single object may occur
concurrently.

Caching
mongoDB has a built-in cache that runs directly in the cluster
without external requirements. Any query is transparently
cached in RAM to expedite data transfer rates and to reduce
disk I/O.

Document Format
mongoDB handles documents in BSON format, a binary-encoded
JSON representation. BSON is designed to be
traversable, lightweight, and efficient. Applications can map
BSON/JSON documents using native representations like
dictionaries, lists, and arrays, leaving the BSON translation
to the native mongoDB driver specific to each programming
language.

BSON Example
 {
     'name' : 'Tom',
     'age' : 42
 }

 Language    Representation
 Python      {
                 'name' : 'Tom',
                 'age' : 42
             }
 Ruby        {
                 "name" => "Tom",
                 "age" => 42
             }
 Java        BasicDBObject d;
             d = new BasicDBObject();
             d.put("name", "Tom");
             d.put("age", 42);
 PHP         array( "name" => "Tom",
                    "age" => 42);

Dynamic languages offer a closer object mapping to BSON/
JSON than compiled languages.

The complete BSON specification is available from:
http://bsonspec.org/

mongoDB Programming
Programming in mongoDB requires an active server running
the mongod and the mongos database daemons (see Figure
5), and a client application that uses one of the language-
specific drivers.

Hot Tip: All the examples in this Refcard are written in Python
for conciseness.

Starting the Server
Log on to the master server and execute:
 [servername:user] ./mongod

The server will display its status messages to the console
unless stdout is redirected elsewhere.

Programming Example
This example allocates a database if one doesn’t already exist,
instantiates a collection on the server, and runs a couple of
queries.

The mongoDB Developer Manual is available from:
http://www.mongodb.org/display/DOCS/Manual

 #!/usr/bin/env jython
 import pymongo

 # Connection is the client class in PyMongo 1.x and 2.x;
 # PyMongo 3 and later use MongoClient instead.
 from pymongo import Connection

 # Connect to the mongod daemon on its default port.
 connection = Connection('servername', 27017)

 # Databases and collections are created lazily on first use.
 db = connection['people_database']

 peopleList = db['people_list']
 # A document is an ordinary dictionary; the driver encodes it as BSON.
 person = {
   'name' : 'Tom',
   'age' : 42 }




 peopleList.insert(person)                                                         • JSON data and program objects storage - many RESTful
                                                                                     web services provide JSON data; they can be stored in
 person = {
   'name' : 'Nancy',                                                                 mongoDB without language serialization overhead
   'age' : 69 }
                                                                                     (especially when compared against XML documents)
 peopleList.insert(person)                                                         • Content management systems - JSON/BSON objects can
 # find first entry:                                                                 represent any kind of document, including those with a
 person = peopleList.find_one()
                                                                                     binary representation
 # find a specific person:
                                                                                mongoDB Drawbacks
 person = peopleList.find_one({ 'name' : 'Joe'})
                                                                                   • No JOIN operations - each document is stand-alone
 if person is None:                                                                • Complex queries - some complex queries and indices are
   print "Joe isn't here!"
 else:                                                                               better suited for SQL
   print person['age']
                                                                                   • No row-level locking - unsuitable for transactional data
 # bulk inserts                                                                      without error-prone application-level assistance
 persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}]

 peopleList.insert(persons)                                                     If any of these is part of the functional requirements, a SQL
                                                                                database would be better suited for the application.
 # queries with multiple results

 for person in peopleList.find():
   print person['name']                                                            GIGASPACES XAP
 for person in peopleList.find({'age' : {'$gte' : 21}}).sort('name'):
   print person['name']                                                         The GigaSpaces eXtreme Application Platform is a data
 # count:                                                                       grid designed to replace traditional application servers. It
 nDrinkingAge = peopleList.find({'age' : {'$gte' : 21}}).count()
                                                                                operates based on an event-processing model where the
                                                                                application dispatches objects to the processing nodes
 # indexing
                                                                                associated with a given data partition. The system may be
 from pymongo import ASCENDING, DESCENDING
                                                                                configured so that data state on the grid may trigger events, or
 peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)])            an application may dispatch specific commands imperatively.
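To make the query documents in the example more concrete, here is a rough plain-Python illustration of what an age filter using mongoDB's `$gte` (greater than or equal) operator selects. It runs against an in-memory list rather than a server. The `matches` helper is a hypothetical name, it covers only equality and `$gte`, and it is not how mongoDB evaluates queries internally.

```python
def matches(document, query):
    """Return True if document satisfies a small subset of the query syntax."""
    for field, condition in query.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {'$gte': 21}
            for operator, operand in condition.items():
                if operator == '$gte':
                    if not value >= operand:
                        return False
                else:
                    raise ValueError('unsupported operator: %s' % operator)
        elif value != condition:
            # Plain form, e.g. {'name': 'Joe'}, means equality
            return False
    return True


people = [
    {'name': 'Tom', 'age': 42},
    {'name': 'Nancy', 'age': 69},
    {'name': 'Sue', 'age': 18},
]

adults = [p for p in people if matches(p, {'age': {'$gte': 21}})]
adult_names = sorted(p['name'] for p in adults)  # comparable to .sort('name')
n_adults = len(adults)                           # comparable to .count()
```

On a real server the filtering, sorting, and counting happen inside the database, so only the matching documents travel over the network.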
                                                                                GigaSpaces XAP also manages all threading and execution
The PyMongo documentation is available at:                                      aspects of the operation, including thread and connection
http://api.mongodb.org/python - guides for other languages                      pools. GigaSpaces XAP implements Spring transactions and
are also available from this web site.                                          auto-recovery. The system detects any failed operations
The code in the previous example performs these operations:                     in a computational node and automatically rolls back the
                                                                                transaction; it then places it back in the space where another
     • Connect to the database server started in the previous
                                                                                node picks it up to complete processing.
       section
     • Attach a database; notice that the database is treated like
      an associative array
     • Get a collection (loosely equivalent to a table in a
       relational database), treated like an associative array
     • Insert one or more entities
     • Query for one or more entities

Although mongoDB treats all these data as BSON internally,
most of the APIs allow the use of dictionary-style objects to
streamline the development process.

Object ID
A successful insertion into the database results in a valid
Object ID. This is the unique identifier in the database for a
given document. When querying the database, a return value
will include this attribute:
 {
     "name" : "Tom",
     "age" : 42,
     "_id" : ObjectId('999999')
 }

Users may override this Object ID with any argument as long as
it’s unique, or allow mongoDB to assign one automatically.

                                                                                              Figure 4 - GigaSpaces XAP Data Grid
                                                                                              [Diagram labels: Application Frameworks (Java, C++, .Net, Groovy, Mule, Spring, JEE, Jetty); XAP Management and Monitoring; XAP Deployment Virtualization; XAP Middleware Virtualization (Virtualized Clustering Layer); RDBMS, Memcache DB, mongoDB]
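The Object ID rules described for mongoDB (an ID is assigned automatically unless the caller supplies a unique one) can be mimicked in plain Python. The `TinyCollection` class below is a hypothetical in-memory stand-in, not part of PyMongo, and it uses `uuid4` hex strings where mongoDB would generate its own ObjectId values.

```python
import uuid


class TinyCollection:
    """Hypothetical in-memory collection keyed by '_id'."""

    def __init__(self):
        self._documents = {}

    def insert(self, document):
        doc = dict(document)  # copy so the caller's dict is not modified
        if '_id' not in doc:
            # No ID supplied: assign one automatically.
            doc['_id'] = uuid.uuid4().hex
        if doc['_id'] in self._documents:
            # A caller-supplied ID must be unique.
            raise KeyError('duplicate _id: %r' % (doc['_id'],))
        self._documents[doc['_id']] = doc
        return doc['_id']

    def find_one(self, _id):
        # Returned documents include the '_id' attribute.
        return self._documents.get(_id)
```

A caller that needs stable, meaningful keys can pass its own `_id`; everything else can rely on the generated value returned by `insert`.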
Common Use Cases
     • Caching - more robust capabilities, plus persistence, when               GigaSpaces provides data persistence, distributed processing,
       compared against a pure caching system                                   and caching by interfacing with SQL and NoSQL data stores.
     • High volume processing - RDBMS may be too expensive                      The GigaSpaces API abstracts all back-end operations (job
       or slow to run in comparison                                             dispatching, data persistence, and caching) and makes




it transparent to the application. GigaSpaces XAP may
implement distributed computing operations like MapReduce                       GOOGLE APP ENGINE DATASTORE
and run them in its nodes, or it may dispatch them for
processing to the underlying subsystem if the functionality                 The Datastore is the main scalability feature of Google App
is available (e.g. mongoDB, Hadoop). The application may                    Engine applications. It’s not a relational database or a façade
implement transactions using the Spring API, the GigaSpaces                 for one. Datastore is a public API for accessing Google’s
XAP transactional facilities, or by implementing a workflow                 Bigtable high-performance distributed database system.
where specific NoSQL stores handle entities or groups of                    Think of it as a sparse array distributed across multiple servers
entities. This must be implemented explicitly in systems like               that also allows a virtually unlimited number of columns and rows.
mongoDB, but may exist in other NoSQL systems like Google                   Applications may even define new columns “on the fly”. The
App Engine’s Datastore.                                                     Datastore scales by adding new servers to a cluster; Google
                                                                            provides this functionality without user participation.
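The "sparse array" picture of the Datastore can be illustrated with nested dictionaries. This is a loose sketch only: `SparseTable` is a hypothetical class, not Bigtable or the Datastore API, and it leaves out the distribution of rows across servers entirely.

```python
class SparseTable:
    """Rows keyed by ID; each row stores only the columns it actually uses."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, **columns):
        # New columns can be introduced on the fly, one row at a time.
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key, column, default=None):
        # Cells that were never written simply do not exist.
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self):
        # The set of known columns is whatever the rows happen to contain.
        names = set()
        for row in self._rows.values():
            names.update(row)
        return sorted(names)
```

Storing only the populated cells is what lets the number of possible columns grow without reserving space for them in every row.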
                                                                            Figure 6 - Datastore Architecture
                                                                            [Diagram labels: Google Applications; Your Applications; Datastore Java (JDO, JPA); API 1 (Other language); Datastore Python; Bigtable Master Server (Logical table management, load balancing, garbage collection); Tablet Server 0, Tablet Server 1 ... Tablet Server n; Google File System (FS 0, FS 1, FS 2 ... FS n)]

GigaSpaces XAP Programming
The GigaSpaces XAP API is very rich and covers many areas
beyond NoSQL and data scalability, such as configuration
management, deployment, web services, and third-party product
integration.

The GigaSpaces XAP documentation is at:
http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide

NoSQL operations may be implemented over these APIs:

   • SQLQuery - allows querying a space using a SQL-like
     syntax and regular expressions; do not confuse it with
     JDBC support.
                                                                            Datastore operations are defined around entities. Entities
   • Persistency - mostly supports RDBMSs but may implement
                                                                            can have one-to-many or many-to-many relationships. The
     other persistency mechanisms through the External Data
                                                                            Datastore assigns unique IDs unless the application specifies
     Source Components API.
                                                                            a unique key. Datastore also disallows some property names
   • memcached - support for key/value pair distributed                     that it uses for housekeeping. The complete Datastore
     dictionaries available to any client in the grid; entities             documentation is available from:
     are automatically made available across all nodes. The                 http://code.google.com/appengine/docs/python/datastore/
     memcached API is implemented on top of the data
     grid, and it’s interchangeable with other memcached                                    Did you notice the parallels between Datastore
     implementations.                                                           Hot         and mongoDB so far? Many NoSQL database
   • Task Execution - allows synchronous and asynchronous job                   Tip         implementations have solved similar problems in
     execution on specific nodes or clusters                                                similar ways.
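The auto-recovery behavior described earlier for GigaSpaces XAP (a failed operation is rolled back and its work item returns to the space, where another node picks it up) can be sketched with a plain Python queue. Nothing here is the GigaSpaces API: `process_all`, the two node functions, the retry limit, and the use of `collections.deque` as the "space" are assumptions made only for illustration.

```python
from collections import deque


def process_all(space, nodes, max_attempts=3):
    """Drain the space, handing each task to the next node in turn."""
    results = {}
    attempts = {}
    turn = 0
    while space:
        task = space.popleft()
        node = nodes[turn % len(nodes)]
        turn += 1
        try:
            # A result is recorded only if the node finishes the task.
            results[task] = node(task)
        except Exception:
            # "Roll back": keep no partial result and return the task
            # to the space so another node can pick it up.
            attempts[task] = attempts.get(task, 0) + 1
            if attempts[task] < max_attempts:
                space.append(task)
    return results


def flaky_node(task):
    """Hypothetical node that always fails."""
    raise RuntimeError('node failure while processing %r' % (task,))


def healthy_node(task):
    """Hypothetical node that doubles its input."""
    return task * 2
```

For example, `process_all(deque([1, 2, 3]), [flaky_node, healthy_node])` completes every task even though one of the two nodes never succeeds, because each failed task is retried on the other node.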

The GigaSpaces XAP API is in a minority of stateful NoSQL                   Transactions and Entity Groups
systems. Most NoSQL systems strive to achieve statelessness                 Datastore supports transactions. These transactions are only
to increase scalability and data consistency.                               effective on entities that belong to the same entity group.
                                                                            Entities in a given group are guaranteed to be stored on the
Common Use Cases                                                            same server.
   • Real-time analytics - dynamic data analysis and reporting
                                                                            Datastore Programming
     based on data entered into a system less than a minute
                                                                            The programming model is based on inheritance of basic
     before effective time of use
                                                                            entities, db.Model and db.Expando. Persistent data is
   • Map/reduce - distributed data processing of large data                 mapped onto an entity specialization of either of these classes.
     sets across a computational grid                                       The API provides persistence and querying instance methods
   • Near-zero downtime - allows for database schema                        for every entity managed by the Datastore.
     changes without homebrew master/slave configurations or
                                                                            Programming Example
     proprietary RDBMS dependencies
                                                                            The Datastore API is simpler than other NoSQL APIs and is
GigaSpaces XAP NoSQL Drawbacks                                              highly optimized to work in the App Engine environment. In
   • Complexity - the server, transactional, and grid model are             this example we insert data into the data store, then run a query:
     more complex than for other NoSQL systems                               from google.appengine.ext import db

   • Application server model - the API and components are                   class Person(db.Model):
                                                                               name = db.StringProperty(required=True)
     geared toward building applications and transactional                     age = db.IntegerProperty(required=True)
     logic
                                                                             person = Person(name = 'Tom', age = 42)
   • Steeper learning curve
                                                                             person.put()
   • Higher TCO - brings a requirement of a specialized,
                                                                             person = Person(name = 'Sue', age = 69)
     well-trained system administration team with higher
     requirements than other NoSQL systems                                   person.put()






                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Common Use Cases
                                                                                                                                       # find everyone older than 20, ordered by name
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Massive scalability - Datastore offers ultimate scalability
                                                                                                                                       query = Person.all() # every entity!                                                                                                                                                                                                                                                                                               by leveraging Google’s own infrastructure for persistent
                                                                                                                                       query.filter('age > ', 20)                                                                                                                                                                                                                                                                                                         storage
                                                                                                                                       query.order('name')                                                                                                                                                                                                                                                                                                              • Google App Engine applications - there is no alternative
                                                                                                                                       peopleList = query.fetch() # up to 1000                                                                                                                                                                                                                                                                                            mechanism for data storage on this platform
                                                                                                                                       for person in peopleList:                                                                                                                                                                                                                                                                                                        • Data rich RESTful Web services - use the Datastore
                                                                                                                                         print person.name                                                                                                                                                                                                                                                                                                                and App Engine infrastructure to offload traditional data
                                                                                                                            This example performs these operations:                                                                                                                                                                                                                                                                                                       centers when stateless, data-intensive web services must
                                                                                                                               • Defines a Person kind and associates it with the Datastore                                                                                                                                                                                                                                                                               be implemented
                                                                                                                               • Persists new items using the put() method                                                                                                                                                                                                                                                                                            Datastore Drawbacks
                                                                                                                               • Defines and executes a query; notice that the query                                                                                                                                                                                                                                                                                    • Vendor lock-in - persistence and queries are tightly
                                                                                                                                 conditions are expressed as strings                                                                                                                                                                                                                                                                                                      coupled with the Datastore and the Datastore API is far
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          from being an industry standard
                                                                                                                            Gql - the Google Query Language                                                                                                                                                                                                                                                                                                             • Availability - Datastore has been known to fail and the
                                                                                                                            Datastore also allows queries in a custom, SQL-like language.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                          EULA doesn’t allow more than 4-nines SLAs
                                                                                                                            The query in the previous example could be expressed as:
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Quotas - Datastore utilization costs per data access and
                                                                                                                                       SELECT * FROM Person WHERE age > 20                                                                                                                                                                                                                                                                                                for processor time
                                                                                                                                           ORDER BY name ASC
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        • Query limits - Result sets are limited to return a maximum
                                                                                                                            Gql is useful for writing more expressive queries than those                                                                                                                                                                                                                                                                                  of 1,000 entities, forcing queries be needlessly complex
                                                                                                                            written in the Python or Java APIs.
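Because a single result set is capped (1,000 entities, per the Query limits drawback), applications typically read larger result sets in batches. The sketch below shows the general pattern in plain Python against an in-memory list. `fetch_batch` is a hypothetical stand-in for a capped query and is not a Datastore call; how each batch is requested from the real Datastore is outside the scope of this sketch.

```python
MAX_RESULTS = 1000  # cap on a single result set, as described above


def fetch_batch(entities, start_after=None, limit=MAX_RESULTS):
    """Hypothetical capped query over (key, value) pairs sorted by key."""
    limit = min(limit, MAX_RESULTS)
    if start_after is None:
        candidates = entities
    else:
        # Resume after the last key seen in the previous batch.
        candidates = [e for e in entities if e[0] > start_after]
    return candidates[:limit]


def fetch_all(entities):
    """Collect every entity by issuing capped queries until one is empty."""
    collected = []
    last_key = None
    while True:
        batch = fetch_batch(entities, start_after=last_key)
        if not batch:
            return collected
        collected.extend(batch)
        last_key = batch[-1][0]
```

The extra loop and the bookkeeping for the last key are the kind of added complexity that the result-set limit pushes into application code.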
                                                                                                                                                                                                                                                                                                                                                                                                                                                                         STAYING CURRENT
                                                                                                                                                                                                                                                        Careful! Python vs. Gql queries have different
                                                                                                                                                                                                                                                        performance and quota characteristics that may                                                                                                                                                                Do you want to know about specific projects and use cases
                                                                                                                                                                                                 Hot                                                    impact your cost or functionality! Refer to the
                                                                                                                                                                                                 Tip                                                                                                                                                                                                                                                                  where NoSQL and data scalability are the hot topics? Join the
                                                                                                                                                                                                                                                        Datastore documentation for a discussion of how                                                                                                                                                               scalability newsletter:
                                                                                                                                                                                                                                                        they differ.                                                                                                                                                                                                  http://eugeneciurana.com/scalablesystems




ABOUT THE AUTHOR
Eugene Ciurana is an open-source evangelist who specializes in the design and implementation of mission-critical, high-availability, large-scale systems. Over the last two years, Eugene designed and built hybrid cloud scalable systems for leading financial, software, insurance, and healthcare companies in the US, Japan, and Europe. As chief liaison between Walmart.com Global and the ISD Technology Council, he led the official adoption of Linux and other open-source technologies at Walmart Stores Information Systems Division in 2006.

Publications
• Developing with Google App Engine
• DZone Refcard #43: Scalability and High Availability
• DZone Refcard #38: SOA Patterns
• The Tesla Testament: A Thriller

RECOMMENDED BOOK
MongoDB, a cross-platform NoSQL database, is the fastest-growing new database in the world. MongoDB provides a rich document-oriented structure with dynamic queries that you’ll recognize from RDBMS offerings such as MySQL. In other words, this is a book about a NoSQL database that does not require the SQL crowd to re-learn how the database world works!

BUY NOW
books.dzone.com/books/mongodb




                                                                                                                                          #82

                                                                                                                                                                                                                                                                                                                                                                                           Browse our collection of 100 Free Cheat Sheets
                                                                                                                                             Get More Refcardz! Visit refcardz.com




                                                                                                                                                                                         CONTENTS INCLUDE:
                                                                                                                                                                                         ■


                                                                                                                                                                                         ■
                                                                                                                                                                                             About Cloud Computing
                                                                                                                                                                                             Usage Scenarios                                                                         Getting Started with

                                                                                                                                                                                                              Aldon                                           Cloud 4Computing
                                                                                                                                                                                         ■
                                                                                                                                                                                             Underlying Concepts
                                                                                                                                                                                         Cost
                                                                                                                                                                                                     by...                                                         #6
                                                                                                                                                                                         ■




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Upcoming Refcardz
                                                                                                                                                                                                                                     ®
                                                                                                                                                                                            t to you
                                                                                                                                                                                         Data Tier Technologies
                                                                                                                                                                                         ■

                                                                                                                                                                                                                                 ply.
                                                                                                                                                                                     brough
                                                                                                                                                                                         Platform Management and more... te. Com
                                                                                                                                                                                         ■
                                                                                                                                                                                                                    bora
                                                                                                                                                                                                               Chan
                                                                                                                                                                                                                     ge. Colla                                                                                                              By Daniel Rubio


                                                                                                                       on:
                                                                                                                                                                                                                                                                                         dz. com




                                                                                                              gratiterns
                                                                                                                                                                                                                                                                        also minimizes the need to make design changes to support
                                                                                                                                                                                                                                                                                              CON
                                                                                                                                                                                                 ABOUT CLOUD COMPUTING                                                  one time events.           TEN TS
                                                                                                                                                                                                                                                                                                          INC



                                                                                                        s Inte nti-PatM. Duvall
                                                                                                                                                                                                                                                                                                                       ■
                                                                                                                                                                                                                                                                                                                           HTML                 LUD E:
                                                                                                                                                                                                                                                                                                                                  Basics
                                                                                                                                                                                                                                                                                        HTML
                                                                                                                                                                                                                                                                                       ref car




                                                                                                                                                                                                                                                                                                                   ■

                                                                                                                                                                                             Web applications have always been deployed on servers                      Automated growth & scalable technologies


                                                                                                 nuou
                                                                                                                                                                                                                                                                                             vs XHT
                                                                                                                                                                                                                                                                                                   ation
                                                                                                                                                                                                                                                                                                               ■
                                                                                                                                                                                                                                                                                                                   Valid                   ML
                                                                                                                                                                                             connected to what is now deemed the ‘cloud’.                               Having the capability to support one time events, cloud

                                                                                                             A
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Apache Ant
                                                                                                                                                                                                                                                                                             Useful



                                                                                             ontiPatterns and By Pa
                                                                                                                                                                                                                                                                                                           ■




#84
                                                                                                                                                                                                                                                                        computing platforms also facilitate the gradual growth curves
                                                                                                                                                                                                                                                                                                    Open
                                                                                                                    ul                                                                                                                                                                      Page         Source
                                                                                                                                                                                                                                                                                                       ■
                                                                                                                                                                                                                                                                                     Vis it




                                                                                                                                                                                             However, the demands and technology used on such servers                                             Stru


                                                                                            C
                                                                                                                                                                                                                                                                        faced by web applications. cture         Tools




                                                                                                                                                                                                                                                                                                                                                                                    Core
                                                                                                                                                                                                                                                                                            Key    ■
                                                                                                                                                                                                                                                                                                            Elem           Structur
                                          E:                                                                                                                                                 has changed substantially in recent years, especially with                                                 al Elem ents
                                INC LUD gration                                                                                                                                              the entrance of service providers like Amazon, Google and                                                         ents and
                                                                                                                                                                                                                                                                        Large scale growth scenarios involving specialized equipment
                         NTS
                                                                                                                                                                                                                                                                                rdz !




                                    ous Inte Change



                                                                                                                                                                                                                                                                                                                                                                                         HTML
                 CO NTE                                                                                                                                                                      Microsoft.                                     es                                                                           more...
                                                                                                                                                                                                                                                                        (e.g. load balancers and clusters) are all but abstracted away by
                          Continu at Every                                                                                                                                                                                           e chang
  m




                  About                       ns                                                                                                                                                                                     to isolat
                                                                                                                                                                                                                                                                        relying on a cloud computing platform’s technology.
                          Software i-patter
                                                                                                                                                                                                                               space




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Hadoop
  rdz .co




                  ■
                                                                                                                                                                                                  n
                                                                                                                                                                                                                                                                             Re fca




                                                                                                                                                                                                                        e Work
                   Build
                                                                                                                                                                                           riptio
                               and Ant                                                                                                                                         These companies have Privat deployed webmanage
                                                                                                                                                                                       Desc                    a
                                                                                                                                                                                                         are in long trol repos
                                                                                                                                                                                                                                      itory
                                                                                                                                                                                                                                                    applications                                                       HTM
                                                                                                                                                                                                                                                                                                                               L BAS
                      ■


                    Patterns Control                                                                                                                                                         lop softw                 n-con                     to
                          ■
                                                                                                                                                                               that adapt and scale to large user bases, making them
                                                                                                                                                                                        Deve
                                                                                                                                                                                                        les to
                                                                                                                                                                                                               a versio                ing and                                                                      ICS
                                                                                                                                                                                                                                                                                  much                                                                                     HTML                         L
                                              approac ed with the cial, but                                    implem
                                                                                                                                                                                                                                  targe
                                                                                                                                                                                                                                                                                                            ally simp r web cod
                                                                                                                                                                                      Creat
                                                                                                rily                                                   nden
                                                                                                                                                              cy                              rties are                       nt                       es                                of the                                             a solid            ing                 has
                                                                                                                                                                                                                into differe
                                                                                     necessa pared to
                                                                                                                                                                                                                                                chang
                                                                    efi                                                                          Depe                                   prope                                             itting
                                                                                                                                                                                                                                                                                                                        ler than ing. Fort foundation
                                                                                                                                                                                                                                                                        utilized by multiple operating systems. This allows resources
                                                                                                                                                                                                                                                                                                  function                                                                job adm been arou
                                               associat to be ben
                                                                                                                                                                    er
                          Ge t




                                                                                                                                                                                                    te builds                                                                    commo
                                                                           are not                                                                       late Verifi                                                            e comm
                                                                                                                                                                                                                                                                                            memory, ality has allocated exclusively to
                                                                                                                                                                                                                                                                                                                                                                                                     nd for
                                                                                           n com                                                  Cloud computing asRun remo              it’s known etoday has changed this. etc.                                                                                                               unately                 expecte irably, that
                                                                                                                                                                                                                        befor
                                                                    They         lts whe
                                                                                                                                                 Temp                                                           Build                                   ually,          (e.g. bandwidth,n elem CPU) to be mov                      they                                                                     some
                                                appear effects.                                                                                                                                 rm a
                                                                                                                                                  The various resources consumed by webperiodically,
                                                                                                                                                                                                       Privat                                    contin
                                                                                                                                                                                                                                   applications (e.g.nt team                     Every                ent instances. ed to CSS
                                                                                                                                                                                                                                                                                                                                        used
                                                                                                                                                                                                                                                                                                                                                to be, HTML                        d. Earl        job              time.
                                                                       ed resu                                                   tion                                                     Perfo                                                             opme                       pag
                                                                                                                                                                                                                                                                        individual operating system s                                                   because         Brow                y HTM has expand              Whi
                                                 adverse unintend                                                       Integra
                                                                                                                                                            d Builds                             sitory
                                                                                                                                                  bandwidth, memory, CPU) areIntegration on from CI server basis
                                                                                                                                                                                                                           Build                   to devel                     common e (HTML                                            .                                   ser                                   ed far le it has don
                                                                                                                                                                                                                                                                                                                                                                        web dev manufacture L had very
                                                                                                                                                    Stage                                  Repo
                                                          e                                                                                                                                                  tallied              a per-unit                                                            or XHT
                                                  produc                                                     tinuous                                         e Build                               rm an                                                                       extensio .) All are                                                                                                                          more              e its
                                                                                                                                                                                                                                                               e.com




                                                                                                                             ard                                                                                                                                                                                ML shar                                                                                         limit
                                                           tern.
                                                                                                                                                                                                                            ack                                                                                                                                                   elopers          rs add                          than
                                                                                                        Con            Refc                       (starting from zero) by Perfomajor cloud computing platforms.
                                                                                                                                                     Privat
                                                                                                                                                                                             all                                                                                         n. HTM essentia                                                                                                  ed man ed layout
                                                                                                                                                                                                                    feedb                                               As a user of Amazon’s EC2 cloud computing platform, you are
                                                                                                                                                                                                                                                                   on                                                      es cert                                     result                                                            anybody
                                                                                            the term” cycle, this
                                                                                                                                                                                                             ated
                                                   the pat
                                                                                                                                                                                                                                       occur                                                                                                                                                came
                                                                             tion                                          h as                                                                      autom                   as they                         based            proc                              lly plai                                                      is
                                                                  Integra al use of                                 ts suc
                                                                                                                                                                      Build                   Send
                                                                                                                                                                                                               ors as
                                                                                                                                                                                                                       soon
                                                                                                                                                                                                                                            ion with
                                                                                                                                                                                                                                                     builds
                                                                                                                                                                                                                                                                                   an operating system in the same wayain on all hosting
                                                                                                                                                                                                                                                                        assigned essor             L files                n text      as elements                      The late a lack of stan up with clev y competi
                                                                                                                                                                                                                                                                                                                                                                                                                                 supp
                                                                                                                                                                                                                                                                                                                                                                                                                                        ort.
                                                                                            and test concep
                                                                                                                                                               ration                                                                                                                                     should
                                                         tinuous vention                                                                                                                                                                                                                                                                                                                                                       ng stan
                                                                                                                                                        Integ                                                                        entat
                                                                                                                                                                                                                                                                                                                                                     in                         st web           dar             er wor
                                                                                  “build include                                                                                                                                                                                                                     not be
                                                                                                                                                                                                                            docum
                                                    Con                                                                                                                                                                                                                                                                                                                                                                  kar            dar
                                                            the con s to the                                                                                                                                                                                                                                                                                                             standar
                                                                                                                                                                                                                     oper
                                                                                                                                                                                                        rate devel                                                                                                            cr
                                                     While                             CI to                                                                                                     Gene
                                                                  efer         ion of



DZone, Inc.
140 Preston Executive Dr.
Suite 100
Cary, NC 27513
888.678.0399
919.678.0300

Refcardz Feedback Welcome: refcardz@dzone.com
Sponsorship Opportunities: sales@dzone.com

ISBN-13: 978-1-934238-75-2
ISBN-10: 1-934238-75-9
$7.95
Version 1.0

DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects and decision makers. DZone offers something for everyone, including news, tutorials, cheatsheets, blogs, feature articles, source code and more. “DZone is a developer’s dream,” says PC Magazine.

Copyright © 2010 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

No sql and data scalability

  • 1.
Getting Started with NoSQL and Data Scalability
By Eugene Ciurana (DZone Refcard #105)

CONTENTS INCLUDE:
• Introduction
• Scalable Data Architecture
• Is NoSQL For You?
• mongoDB
• GigaSpaces XAP
• Google App Engine Datastore and more...

INTRODUCTION

The DZone Refcard #43 is an introduction to system high availability and scalability terminology and techniques (http://refcardz.dzone.com/refcardz/scalability). The next logical step is the scalable handling of massive data volumes resulting from having these powerful processing capabilities.

This Refcard demystifies NoSQL and data scalability techniques by introducing some core concepts. It also offers an overview of current technologies available in this domain and suggests how to apply them.

What is Data Scalability?
Data scalability is the ability of a system to store, manipulate, analyze, and otherwise process ever increasing amounts of data without reducing overall system availability, performance, or throughput.

Data scalability is achieved by a combination of more powerful processing capabilities and larger but efficient storage mechanisms.

Relational and hierarchical databases scale up by adding more processors, more storage, caching systems, and such. Soon they hit either a cost or a practical scalability limit because they are difficult or impossible to scale out. These database management systems are designed as single units that must maintain data integrity, and enforce schema rules to guarantee it. This rigidity is what promotes upward, but not outward, scalability.

Hot Tip: Oracle RAC is a cluster of multiple computers with access to a common database. This is considered only vertically scalable because the processing (usually in the form of stored procedures) may scale out, but the shared storage facilities don’t scale with the cluster.

Data integrity and schemas are suited for handling transactional, normalized, uniform data. They handle unstructured or rapidly evolving data structures with difficulty or exponentially larger costs.

Hot Tip: Data replication is not the same as data scalability!

SCALABLE DATA ARCHITECTURES

There are two general kinds of architectures used for building scalable data systems: data grids and NoSQL systems. Implementations of either often share characteristics from the other.

Data Grids
Data grids process workloads defined as independent jobs that don’t require data sharing among processes. Storage or network may be shared across all nodes of the grid, but intermediate results have no bearing on other jobs’ progress or on other nodes in the grid, such as a MapReduce cluster.

[Figure 1 - Data Grid. Labels: Consumer, Master, two Load Balancers, four Nodes behind each Load Balancer]

Data grids expose their functionality through a single API (either a Web service or native to the application programming language) that abstracts its topology and implementation from its data processing consumer.
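The data grid idea described above (independent jobs behind a single API that hides the topology) can be illustrated with a small Python sketch. This is only a toy under stated assumptions: the `TinyGrid` class, its `run` method, and the `word_count` job are hypothetical names invented for this illustration, and a local thread pool stands in for the grid's nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(chunk):
    """One independent job: count the words in a chunk of text."""
    return len(chunk.split())


class TinyGrid:
    """Hypothetical single entry point that hides how many workers exist."""

    def __init__(self, nodes=4):
        self._nodes = nodes

    def run(self, job, inputs):
        # Each input is processed in isolation; no job reads another job's
        # intermediate result, which is what lets the work spread across nodes.
        with ThreadPoolExecutor(max_workers=self._nodes) as pool:
            return list(pool.map(job, inputs))


chunks = ["to be or not to be", "that is the question", "whether tis nobler"]
grid = TinyGrid(nodes=2)
counts = grid.run(word_count, chunks)  # one result per independent job
total = sum(counts)                    # the consumer combines the results
```

The consumer only sees `run`; whether two or two hundred workers sit behind it is an implementation detail, which mirrors how a grid API abstracts its topology.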
Areas of Application
• Financial modeling
• Data mining
• Click stream analytics
• Document clustering
• Distributed sorting or grepping
• Simulations
• Inverted index construction
• Protein folding

NoSQL
NoSQL describes a horizontally scalable, non-relational database with built-in replication support. Applications interact with it through a simple API, and the data is stored in a “schema-free”, flat addressing repository, usually as large files or data blocks. The repository is often a custom file system designed to support NoSQL operations.

Hot Tip: NoSQL is intended as shorthand for “not only SQL.” Complete architectures almost always mix traditional and NoSQL databases.

NoSQL setups are best suited for non-OLTP applications that process massive amounts of structured and unstructured data at a lower cost and with higher efficiency than RDBMSs and stored procedures.

[Figure 2 - NoSQL Topology. Labels: Consumer; four Nodes; Virtual File System (HDFS, mongoFS, Hypertable); Distributed File System; FS 0, FS 1, FS 2, FS n]

NoSQL storage is highly replicated (a commit doesn’t occur until the data is successfully written to at least two separate storage devices) and the file systems are optimized for write-only commits. The storage devices are formatted to handle large blocks (32 MB or more). Caching and buffering are designed for high I/O throughput. The NoSQL database is implemented as a data grid for processing (mapReduce, queries, CRUD, etc.)

Areas of Application
• Document storage
• Object databases
• Graph databases
• Key/value stores
• Eventually consistent key/value stores

This list is the implementation counterpart to the data grid areas of application; the data store feeds the computational network and, together, they form the NoSQL database.

NoSQL vs RDBMS vs OO Analogies

NoSQL       RDBMS     OO
Kind        Table     Class
Entity      Record    Object
Attribute   Column    Property

IS NoSQL FOR YOU?

Preparation:
• Don’t fall prey to the NoSQL fad
• Don’t be stubborn; neither NoSQL nor traditional databases apply to all cases
• Apply the CAP Theorem to your use cases to determine feasibility

Brewer’s (CAP) Theorem
It’s impossible for a distributed computer system to simultaneously provide all three of these guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (node failures don’t prevent survivors from continuing to operate)
• Partition tolerance (no failures less than total network failures cause the system to fail)

Since only two of these characteristics are guaranteed for any given scalable system, use your functional specification and business SLA (service level agreement) to determine what your minimum and target goals for CAP are, pick the two that meet your requirements, and proceed to implement the appropriate technology.

Hot Tip: Rule of Thumb: NoSQL’s primary goal is to achieve horizontal scalability. It attains this by reducing transactional semantics and referential integrity.

Use Figure 3 to identify the best match between your application’s CAP requirements and the suggested SQL and NoSQL systems listed.

[Figure 3 - CAP Selection Chart (source: Nathan Hurst's Blog). A triangle labeled “Pick Any Two” with corners Consistency (C), Availability (A), and Partition tolerance (P). Legend: Relational, Key-Value, Column-Oriented, Document-Oriented. The C-A edge lists RDBMSs (Oracle, MySQL), Aster Data, Greenplum, Vertica. System names recoverable from the other two edges: Cassandra, mongoDB, Redis, Berkeley DB, Riak, KAI, Terrastore, CouchDB, Tokyo Cabinet, Datastore, MemcacheDB, SimpleDB, Voldemort, Hypertable, Scalaris, Dynamo, Hbase]
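The replication rule stated for NoSQL storage above (a commit doesn’t occur until the data is written to at least two separate storage devices) can be sketched in a few lines of Python. This is an illustrative toy, not how any particular NoSQL product implements it: `Device`, `ReplicatedStore`, and the `min_copies` parameter are hypothetical names, and in-memory dictionaries stand in for storage devices.

```python
class Device:
    """Hypothetical stand-in for one storage device (an in-memory dict)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = {}

    def write(self, key, value):
        if not self.healthy:
            return False  # simulate a failed write
        self.blocks[key] = value
        return True


class ReplicatedStore:
    """Acknowledge a commit only after enough devices hold the data."""

    def __init__(self, devices, min_copies=2):
        self.devices = devices
        self.min_copies = min_copies

    def commit(self, key, value):
        written = [d for d in self.devices if d.write(key, value)]
        if len(written) >= self.min_copies:
            return True
        # Too few copies: discard the partial writes and report failure.
        # (A real store would restore any previous value; this toy does not.)
        for d in written:
            d.blocks.pop(key, None)
        return False


store = ReplicatedStore([Device("fs0"), Device("fs1"), Device("fs2", healthy=False)])
ok = store.commit("person:1", {"name": "Tom", "age": 42})

degraded = ReplicatedStore([Device("fs0"), Device("fs1", healthy=False)])
failed = degraded.commit("person:1", {"name": "Tom", "age": 42})
```

With three devices and one failure the commit still succeeds because two copies exist; with only one healthy device the commit is refused, which is the trade of write latency for durability that the paragraph describes.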
mongoDB

mongoDB is a document-based NoSQL database that bridges the gap between scalable key-value stores like Datastore and Memcache DB, and RDBMS’s querying and robustness capabilities. Some of its main features include:
• Document-oriented storage - data is manipulated as JSON-like documents
• Querying - uses JavaScript and has APIs for submitting queries in every major programming language
• In-place updates - atomicity
• Indexing - any attribute in a document may be used for indexing and query optimization
• Auto-sharding - enables horizontal scalability
• Map/reduce - the mongoDB cluster may run smaller MapReduce jobs than a Hadoop cluster with significant cost and efficiency improvements

mongoDB implements its own file system to optimize I/O throughput by dividing larger objects into smaller chunks. Documents are stored in two separate collections: files containing the object meta-data, and chunks that form a larger document when combined with database accounting information. The mongoDB API provides functions for manipulating files, chunks, and indices directly. The administration tools enable GridFS maintenance.

[Figure 5 - mongoDB Cluster. Labels: Consumer; fail-over; mongoDB Server (master) and mongoDB Server (slave), each with a mongod database daemon, a mongos sharding daemon, and Data Storage]

A mongoDB cluster consists of a master and a slave. The slave may become the master in a fail-over scenario, if necessary. Having a master/slave configuration (also known as Active/Passive or A/P cluster) helps ensure data integrity since only the master is allowed to commit changes to the store at any given time. A commit is successful only if the data is written to GridFS and replicated in the slave.

Hot Tip: mongoDB also supports a limited master/master configuration. It’s useful only for inserts, queries, and deletions by specific object ID. It must not be used if updates of a single object may occur concurrently.

Caching
mongoDB has a built-in cache that runs directly in the cluster without external requirements. Any query is transparently cached in RAM to expedite data transfer rates and to reduce disk I/O.

Document Format
mongoDB handles documents in BSON format, a binary-encoded JSON representation. BSON is designed to be traversable, lightweight and efficient. Applications can map BSON/JSON documents using native representations like dictionaries, lists, and arrays, leaving the BSON translation to the native mongoDB driver specific to each programming language.

BSON Example

    {
      'name' : 'Tom',
      'age' : 42
    }

Language    Representation
Python      { 'name' : 'Tom', 'age' : 42 }
Ruby        { "name" => "Tom", "age" => 42 }
Java        BasicDBObject d; d = new BasicDBObject(); d.put("name", "Tom"); d.put("age", 42);
PHP         array( "name" => "Tom", "age" => 42);

Dynamic languages offer a closer object mapping to BSON/JSON than compiled languages.

The complete BSON specification is available from: http://bsonspec.org/

mongoDB Programming
Programming in mongoDB requires an active server running the mongod and the mongos database daemons (see Figure 5), and a client application that uses one of the language-specific drivers.

Hot Tip: All the examples in this Refcard are written in Python for conciseness.

Starting the Server
Log on to the master server and execute:

    [servername:user] ./mongod

The server will display its status messages to the console unless stdout is redirected elsewhere.

Programming Example
This example allocates a database if one doesn’t already exist, instantiates a collection on the server, and runs a couple of queries.

The mongoDB Developer Manual is available from: http://www.mongodb.org/display/DOCS/Manual

    #!/usr/bin/env jython

    import pymongo
    from pymongo import Connection

    connection = Connection('servername', 27017)

    db = connection['people_database']
    peopleList = db['people_list']

    person = {
        'name' : 'Tom',
        'age' : 42 }
    peopleList.insert(person)

    person = {
        'name' : 'Nancy',
        'age' : 69 }
    peopleList.insert(person)

    # find first entry:
    person = peopleList.find_one()

    # find a specific person:
    person = peopleList.find_one({ 'name' : 'Joe'})
    if person is None:
        print "Joe isn't here!"
    else:
        print person['age']

    # bulk inserts
    persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}]
    peopleList.insert(persons)

    # queries with multiple results
    for person in peopleList.find():
        print person['name']

    # $gte is the "greater than or equal" query operator
    for person in peopleList.find({'age' : {'$gte' : 21}}).sort('name'):
        print person['name']

    # count:
    nDrinkingAge = peopleList.find({'age' : {'$gte' : 21}}).count()

    # indexing
    from pymongo import ASCENDING, DESCENDING
    peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)])

The PyMongo documentation is available at: http://api.mongodb.org/python - guides for other languages are also available from this web site.

The code in the previous example performs these operations:
• Connect to the database server started in the previous section
• Attach a database; notice that the database is treated like an associative array
• Get a collection (loosely equivalent to a table in a relational database), treated like an associative array
• Insert one or more entities
• Query for one or more entities

Although mongoDB treats all these data as BSON internally, most of the APIs allow the use of dictionary-style objects to streamline the development process.

Object ID
A successful insertion into the database results in a valid Object ID. This is the unique identifier in the database for a given document. When querying the database, a return value will include this attribute:

    {
      "name" : "Tom",
      "age" : 42,
      "_id" : ObjectId('999999')
    }

Users may override this Object ID with any argument as long as it’s unique, or allow mongoDB to assign one automatically.

Common Use Cases
• Caching - more robust capabilities, plus persistence, when compared against a pure caching system
• High volume processing - RDBMS may be too expensive or slow to run in comparison
• JSON data and program objects storage - many RESTful web services provide JSON data; they can be stored in mongoDB without language serialization overhead (especially when compared against XML documents)
• Content management systems - JSON/BSON objects can represent any kind of document, including those with a binary representation

mongoDB Drawbacks
• No JOIN operations - each document is stand-alone
• Complex queries - some complex queries and indices are better suited for SQL
• No row-level locking - unsuitable for transactional data without error-prone application-level assistance

If any of these is part of the functional requirements, a SQL database would be better suited for the application.

GIGASPACES XAP

The GigaSpaces eXtreme Application Platform is a data grid designed to replace traditional application servers. It operates based on an event-processing model where the application dispatches objects to the processing nodes associated with a given data partition. The system may be configured so that data state on the grid may trigger events, or an application may dispatch specific commands imperatively.

GigaSpaces XAP also manages all threading and execution aspects of the operation, including thread and connection pools. GigaSpaces XAP implements Spring transactions and auto-recovery. The system detects any failed operations in a computational node and automatically rolls back the transaction; it then places it back in the space where another node picks it up to complete processing.

[Figure 4 - GigaSpaces XAP Data Grid. Labels: Application Frameworks (Java, C++, .Net, Groovy, Mule, Spring, JEE, Jetty); XAP Deployment Virtualization; XAP Management and Monitoring; XAP Middleware Virtualization (Virtualized Clustering Layer); RDBMS, Memcache DB, mongoDB]

GigaSpaces provides data persistence, distributed processing, and caching by interfacing with SQL and NoSQL data stores. The GigaSpaces API abstracts all back-end operations (job dispatching, data persistence, and caching) and makes
it transparent to the application. GigaSpaces XAP may implement distributed computing operations like MapReduce and run them in its nodes, or it may dispatch them for processing to the underlying subsystem if the functionality is available (e.g. mongoDB, Hadoop). The application may implement transactions using the Spring API, the GigaSpaces XAP transactional facilities, or by implementing a workflow where specific NoSQL stores handle entities or groups of entities. This must be implemented explicitly in systems like mongoDB, but may exist in other NoSQL systems like Google App Engine’s Datastore.

GigaSpaces XAP Programming
The GigaSpaces XAP API is very rich and covers many aspects beyond NoSQL and data scalability areas like configuration management, deployment, web services, third-party product integration, etc.

The GigaSpaces XAP documentation is at: http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide

NoSQL operations may be implemented over these APIs:
• SQLQuery - allows querying a space using a SQL-like syntax and regular expressions; do not confuse it with JDBC support.
• Persistency - mostly supports RDBMSs but may implement other persistency mechanisms through the External Data Source Components API.
• memcached - support for key/value pair distributed dictionaries available to any client in the grid; entities are automatically made available across all nodes. The memcached API is implemented on top of the data grid, and it’s interchangeable with other memcached implementations.
• Task Execution - allows synchronous and asynchronous job execution on specific nodes or clusters

The GigaSpaces XAP API is in a minority of stateful NoSQL systems. Most NoSQL systems strive to achieve statelessness to increase scalability and data consistency.

Common Use Cases
• Real-time analytics - dynamic data analysis and reporting based on data entered into a system less than a minute before effective time of use
• Map/reduce - distributed data processing of large data sets across a computational grid
• Near-zero downtime - allows for database schema changes without homebrew master/slave configurations or proprietary RDBMS dependencies

GigaSpaces XAP NoSQL Drawbacks
• Complexity - the server, transactional, and grid model are more complex than for other NoSQL systems
• Application server model - the API and components are geared toward building applications and transactional logic
• Steeper learning curve
• Higher TCO - brings a requirement of a specialized, well-trained system administration team with higher requirements than other NoSQL systems

GOOGLE APP ENGINE DATASTORE

The Datastore is the main scalability feature of Google App Engine applications. It’s not a relational database or a façade for one. Datastore is a public API for accessing Google’s Bigtable high-performance distributed database system. Think of it as a sparse array distributed across multiple servers that also allows an infinite number of columns and rows. Applications may even define new columns “on the fly”. The Datastore scales by adding new servers to a cluster; Google provides this functionality without user participation.

[Figure 6 - Datastore Architecture. Labels: Google Applications, Your Applications; Datastore API: Datastore Java (JDO, JPA), Python, Other language; Bigtable; Master Server (logical table management, load balancing, garbage collection); Tablet Server 0, Tablet Server 1, Tablet Server n; Google File System; FS 0, FS 1, FS 2, FS n]

Datastore operations are defined around entities. Entities can have one-to-many or many-to-many relationships. The Datastore assigns unique IDs unless the application specifies a unique key. Datastore also disallows some property names that it uses for housekeeping. The complete Datastore documentation is available from: http://code.google.com/appengine/docs/python/datastore/

Hot Tip: Did you notice the parallels between Datastore and mongoDB so far? Many NoSQL database implementations have solved similar problems in similar ways.

Transactions and Entity Groups
Datastore supports transactions. These transactions are only effective on entities that belong to the same entity group. Entities in a given group are guaranteed to be stored on the same server.

Datastore Programming
The programming model is based on inheritance of basic entities, db.Model and db.Expando. Persistent data is mapped onto an entity specialization of either of these classes. The API provides persistence and querying instance methods for every entity managed by the Datastore.

Programming Example
The Datastore API is simpler than other NoSQL APIs and is highly optimized to work in the App Engine environment. In this example we insert data into the data store, then run a query:

    from google.appengine.ext import db

    class Person(db.Model):
        name = db.StringProperty(required=True)
        age = db.IntegerProperty(required=True)

    person = Person(name = 'Tom', age = 42)
    person.put()

    person = Person(name = 'Sue', age = 69)
    person.put()
    # find a specific person
    query = Person.all() # every entity!
    query.filter('age > ', 20)
    query.order('name')

    peopleList = query.fetch(1000) # up to 1000
    for person in peopleList:
        print person.name

This example performs these operations:
• Defines a Person kind and associates it with the Datastore
• Persists new items using the put() method
• Defines and executes a query; notice that the query conditions are expressed as strings

Gql - the Google Query Language
Datastore also allows queries in a custom, SQL-like language. The query in the previous example could be expressed as:

    SELECT * FROM Person WHERE age > 20
    ORDER BY name ASC

Gql is useful for writing more expressive queries than those written in the Python or Java APIs.

Hot Tip: Careful! Python vs. Gql queries have different performance and quota characteristics that may impact your cost or functionality! Refer to the Datastore documentation for a discussion of how they differ.

Common Use Cases
• Massive scalability - Datastore offers ultimate scalability by leveraging Google’s own infrastructure for persistent storage
• Google App Engine applications - there is no alternative mechanism for data storage on this platform
• Data rich RESTful Web services - use the Datastore and App Engine infrastructure to offload traditional data centers when stateless, data-intensive web services must be implemented

Datastore Drawbacks
• Vendor lock-in - persistence and queries are tightly coupled with the Datastore and the Datastore API is far from being an industry standard
• Availability - Datastore has been known to fail and the EULA doesn’t allow more than 4-nines SLAs
• Quotas - Datastore utilization costs per data access and for processor time
• Query limits - result sets are limited to return a maximum of 1,000 entities, forcing queries to be needlessly complex

STAYING CURRENT

Do you want to know about specific projects and use cases where NoSQL and data scalability are the hot topics? Join the scalability newsletter: http://eugeneciurana.com/scalablesystems

ABOUT THE AUTHOR

Eugene Ciurana is an open-source evangelist who specializes in the design and implementation of mission-critical, high-availability large scale systems. Over the last two years, Eugene designed and built hybrid cloud scalable systems for leading financial, software, insurance, and healthcare companies in the US, Japan, and Europe. As chief liaison between Walmart.com Global and the ISD Technology Council, he led the official adoption of Linux and other open-source technologies at Walmart Stores Information Systems Division in 2006.

Publications
• Developing with Google App Engine
• DZone Refcard #43: Scalability and High Availability
• DZone Refcard #38: SOA Patterns
• The Tesla Testament: A Thriller

RECOMMENDED BOOK

MongoDB, a cross-platform NoSQL database, is the fastest-growing new database in the world. MongoDB provides a rich document orientated structure with dynamic queries that you’ll recognize from RDBMS offerings such as MySQL. In other words, this is a book about a NoSQL database that does not require the SQL crowd to re-learn how the database world works!

BUY NOW: books.dzone.com/books/mongodb