Cloud computing with Amazon Web Services, Part 5: Dataset processing in the cloud with SimpleDB

Amazon SimpleDB

Amazon SimpleDB (SDB) is a fast, scalable, real-time dataset indexing and querying framework that makes it easy to store and retrieve structured data for your Amazon Web Services-based applications. It is designed to work well with the other Amazon Web Services, such as Elastic Compute Cloud (EC2) and Simple Storage Service (S3), so you can build your entire application stack within the Amazon Web Services environment. You pay for the service based entirely on your usage, and a free tier of service is also available.

Some valuable features provided by SDB:

• Reliability: SDB is designed to store your indexed data redundantly across multiple data centers and to keep it available at all times.
• Speed: SDB is designed to provide quick retrieval of data, especially when your requests come from an EC2 instance within the Amazon Web Services environment.
• Simplicity: The programming model for accessing and using SDB is simple and can be used from a variety of programming languages.
• Security: SDB is designed to provide a high level of security; access to the data is restricted to authorized users.
• Flexibility: SDB lets you store data on the fly without any need for predefined schemas.
• Inexpensive: SDB charges are economical; you are charged only for what you actually use.

The rest of this section explores the concepts that underpin SDB.

Domains
A domain is a container that lets you store your structured data and run queries against it. The data is stored in a domain as items. Conceptually, a domain is similar to a worksheet tab in a spreadsheet, and items are the rows in that worksheet. You can run queries against a domain, but in the current version of SDB you cannot query across domains.

Each domain has the following metadata associated with it:
• Date and time the metadata was last updated
• Number of all items in the domain
• Number of all attribute name-value pairs in the domain
• Number of unique attribute names in the domain
• Total size of all item names in the domain, in bytes
• Total size of all attribute values, in bytes
• Total size of all unique attribute names, in bytes

SDB, like Simple Queue Service (SQS), follows the "eventual consistency" model. SDB maintains multiple copies of each domain for fault tolerance, and every change made to a domain is propagated across all copies. Because this propagation sometimes takes a few seconds, depending on system load and network latency, a consumer of your domain may not see a change immediately. The change will eventually reach every copy, but this delay is an important consideration when designing your SDB-based applications. Amazon CTO Werner Vogels discusses the reasoning behind eventual consistency on his blog.

Items
Items represent individual objects within your domains, and they contain attributes with values. Each item is conceptually similar to a row in a spreadsheet: an attribute is a column and the values are cells. Unlike a spreadsheet cell, however, an attribute is not restricted to a single value; it can hold multiple values. SDB automatically indexes your domains regardless of how the data is structured. SDB also limits the execution time of any single query against your domains: if a query takes longer than 5 seconds, SDB stops it and returns an error.

Domains in SDB are flexible and have no fixed schema. Each item within a domain can contain its own set of up to 256 attributes, and those attributes can be completely different from the attributes of the other items in that domain.

Limitations
The current version of SDB has limitations that you should consider when designing your application. Table 1 shows the limitations as specified by Amazon in its latest documentation.

Table 1. Current limitations
Parameter                                                         Current restriction
Domain size                                                       10 GB and 250,000,000 attribute name-value pairs per domain
Domain name                                                       3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
Domains per Amazon Web Services account                           100
Attribute name-value pairs per item                               256
Attribute name length                                             1024 bytes
Attribute value length                                            1024 bytes
Allowed attribute characters                                      UTF-8 characters that are valid in XML documents; control characters and sequences not valid in XML are not allowed
Attributes per PutAttributes operation                            100
Attributes requested per Select or QueryWithAttributes operation  256
Maximum items in query response                                   256
Maximum query execution time                                      5 seconds
Maximum predicates per query expression                           10
Maximum comparisons per query expression predicate                10
Maximum unique attributes per select expression                   20
Maximum comparisons per select expression                         20
Maximum response size for QueryWithAttributes and Select          1 MB

Pricing
Amazon provides a free tier for SDB, along with pricing for usage above the free-tier limit. Charges are based on:
• Machine utilization: the machine capacity used to complete each SDB request, normalized to the hourly capacity of a 1.7-GHz Xeon processor
• Data transferred in and out of SDB
• Structured data storage

Table 2 shows pricing for machine utilization.

Free tier
There are no charges for the first 25 machine hours, 1 GB of data transfer, and 1 GB of storage that you consume every month, at least until 1 June 2009. This is a significant amount of usage that Amazon provides free for a limited time, and many types of applications can operate comfortably within this free tier.
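To get a feel for how the free tier and the rates in Tables 2 through 4 combine, the sketch below estimates a monthly bill in plain Python. It is an illustration under stated assumptions, not Amazon's billing algorithm: it uses the 2009 rates quoted in this article, assumes 1 TB = 1024 GB, and assumes the free 1 GB of outbound transfer is deducted before the tiered outbound rates apply. The function names are hypothetical.

```python
# Rates copied from Tables 2-4 of this article (2009 pricing; check Amazon for current rates).
FREE_MACHINE_HOURS = 25
MACHINE_HOUR_RATE = 0.14      # dollars per machine hour
FREE_TRANSFER_IN_GB = 1
TRANSFER_IN_RATE = 0.10       # dollars per GB
FREE_TRANSFER_OUT_GB = 1
# (tier size in GB, dollars per GB); ASSUMPTION: 1 TB = 1024 GB
TRANSFER_OUT_TIERS = [
    (10 * 1024, 0.17),
    (40 * 1024, 0.13),
    (100 * 1024, 0.11),
    (float("inf"), 0.10),
]
FREE_STORAGE_GB = 1
STORAGE_RATE = 0.25           # dollars per GB-month


def billable(amount, free):
    """Usage above the free allowance, never negative."""
    return max(0.0, amount - free)


def transfer_out_cost(gb):
    """Tiered outbound cost. ASSUMPTION: the free 1 GB is deducted first."""
    remaining = billable(gb, FREE_TRANSFER_OUT_GB)
    cost = 0.0
    for tier_size, rate in TRANSFER_OUT_TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost


def monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Rough monthly estimate in dollars for one month of SDB usage."""
    return (
        billable(machine_hours, FREE_MACHINE_HOURS) * MACHINE_HOUR_RATE
        + billable(gb_in, FREE_TRANSFER_IN_GB) * TRANSFER_IN_RATE
        + transfer_out_cost(gb_out)
        + billable(gb_stored, FREE_STORAGE_GB) * STORAGE_RATE
    )
```

For instance, monthly_cost(20, 0.5, 0.8, 0.9) returns 0.0 because every figure stays within the free tier, while 35 machine hours, 3 GB in, 11 GB out, and 5 GB stored come to roughly $4.30.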
Table 2. Pricing for machine utilization
Quantity                      Cost
First 25 machine hours        Free
Additional machine hours      $0.14 per machine hour

Table 3 addresses the amount of data transferred to and from SDB. There is no charge for data transferred between SDB and other Amazon Web Services within the same region. Data transferred between SDB and other Amazon Web Services across regions is charged at Internet Data Transfer rates on both sides of the transfer.

Table 3. Pricing for data transfer
Type of transfer              Cost
All data transfer in          First 1 GB free
                              $0.100 per GB for all additional data transfer in
All data transfer out         First 1 GB free
                              $0.170 per GB for the first 10 TB/month
                              $0.130 per GB for the next 40 TB/month
                              $0.110 per GB for the next 100 TB/month
                              $0.100 per GB for data transfer out over 150 TB/month

Table 4 outlines costs for structured data storage.

Table 4. Structured data storage
Amount of storage             Cost
All data storage              First 1 GB free
                              $0.25 per GB/month for all additional data storage

For the latest pricing, check the Amazon SDB page. You can also use the Simple Monthly Calculator provided by Amazon to estimate your monthly costs for SDB and the other Amazon Web Services.

Getting started with SDB
To start exploring SDB, you need to sign up for an Amazon Web Services account (see Resources). See Part 2 of this series for detailed instructions on signing up for Amazon Web Services. Once you have an Amazon Web Services account, you must enable the Amazon SDB service for your account:
1. Log in to your Amazon Web Services account.
2. Navigate to the SDB home page.
3. Click Sign Up For This Web Service on the right side.
4. Provide the requested information and complete the sign-up process.

All communication with any of the Amazon Web Services goes through either the SOAP interface or the query interface. In this article, you use the query interface via a third-party library to communicate with SDB. You will also need your access keys, which you can get from your Web Services Account information page by selecting View Access Key Identifiers.

You are now set up to use Amazon Web Services and have enabled the SDB service for your account.

Interacting with SDB
For this example, you use boto, a third-party open source Python library, to become familiar with SDB by running small snippets of code in a Python shell.

Install boto and set up your environment
Download boto. The latest version at the time of writing was 1.6b. Unzip the archive to the directory of your choice, change into that directory, and run setup.py to install boto into your local Python environment, as shown in Listing 1.

Listing 1. Install boto
$ cd directory_where_you_unzipped_boto
$ python setup.py install

Set up environment variables that point to your Amazon Web Services access keys, which are available from your Web Services Account information page.

Listing 2. Set up environment variables
# Export variables with your AWS access keys
$ export AWS_ACCESS_KEY_ID=Your_AWS_Access_Key_ID
$ export AWS_SECRET_ACCESS_KEY=Your_AWS_Secret_Access_Key
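Because boto picks these two variables up from the environment on its own, a missing or misspelled variable only shows up later as an authentication failure. The helper below is a hypothetical pre-flight check, not part of boto: it confirms that both variables are set and non-empty, and says nothing about whether the keys themselves are valid.

```python
import os

# The two variable names exported in Listing 2.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_keys(environ=None):
    """Return (access_key_id, secret_key), raising if either is missing or empty.

    Hypothetical helper; `environ` defaults to the real process environment
    but can be any mapping, which makes the check easy to exercise.
    """
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing AWS credentials: " + ", ".join(missing))
    return environ["AWS_ACCESS_KEY_ID"], environ["AWS_SECRET_ACCESS_KEY"]
```

Calling load_aws_keys() before starting the exploration below gives a clear error message naming whichever variable is absent.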
Check that everything is set up correctly by starting a Python shell and importing the boto library, as shown in Listing 3.

Listing 3. Check the setup
$ python
Python 2.4.5 (#1, Apr 12 2008, 02:18:19)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto
>>>

Explore SDB with boto
The SDBConnection class provides the main interface for interacting with SDB. You will use boto from the Python console, calling different methods on an SDBConnection object and examining the responses returned by SDB. This will help you get familiar with the API while you explore the concepts behind SDB.

The first step is to create a connection object to SDB using the Amazon Web Services access keys you exported earlier. boto always checks the environment first for these variables; if they are set, boto uses them automatically when it creates the connection.

Listing 4. Create a connection to SDB
>>> import boto
>>> sdb_conn = boto.connect_sdb()
>>>

For the rest of this article, you use the sdb_conn object created above to interact with SDB. You can create a new domain by specifying a name for it.

Listing 5. Create a domain
>>> d1 = sdb_conn.create_domain('devworks-dom-1')
>>>

Retrieve a list of all your domains, as shown in Listing 6. This returns a result set object that is essentially a Python list, so you can iterate over it and access each domain.

Listing 6. List all the domains
>>> all_domains = sdb_conn.get_all_domains()
>>>
>>> len(all_domains)
1
>>>
>>> for d in all_domains:
...     print d.name
...
devworks-dom-1

You can also retrieve a single domain by name.

Listing 7. List a single domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> print my_domain.name
devworks-dom-1

Newly created domains are empty until you add items to them. You create a new item within a domain and then add attributes to it; assigning a Python list stores a multi-valued attribute.

Listing 8. Create a new item
>>> my_domain = sdb_conn.get_domain('devworks-dom-1')
>>>
>>> i1 = my_domain.new_item('test_item_1')
>>>
>>> i1['cars'] = 'BMW'
>>>
>>> i1['fruits'] = ['apple', 'orange', 'mango']
>>>

You retrieve an item from a domain by specifying its name, which must be unique within the domain, much like a primary key in a relational database. Note in Listing 9 that SDB does not preserve the order in which multiple values were stored.

Listing 9. Retrieve an item and its attributes
>>> my_item = my_domain.get_item('test_item_1')
>>>
>>> print my_item
{u'cars': u'BMW', u'fruits': [u'apple', u'mango', u'orange']}
>>>

The item object returned above is a live Item object: it automatically retrieves all attributes for the item from SDB when you access any of them, and any update you make to an attribute value is saved automatically to SDB.
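To make the "live Item" behavior concrete, here is a minimal sketch of a lazily loaded, write-through item. It is not boto's implementation: FakeStore and LiveItem are hypothetical names, and an in-memory dictionary stands in for SDB so the example runs without an AWS account.

```python
class FakeStore(object):
    """Hypothetical stand-in for SDB: maps (domain, item) -> {attribute: value or list}."""

    def __init__(self):
        self.data = {}

    def get_attributes(self, domain, item_name):
        # Return a shallow copy, as a remote call would return fresh data.
        return dict(self.data.get((domain, item_name), {}))

    def put_attribute(self, domain, item_name, name, value):
        self.data.setdefault((domain, item_name), {})[name] = value


class LiveItem(dict):
    """Dict-like item that loads on first access and saves every assignment immediately."""

    def __init__(self, store, domain, name):
        dict.__init__(self)
        self.store = store
        self.domain = domain
        self.name = name
        self._loaded = False

    def _load(self):
        # Lazy load: fetch all attributes the first time any attribute is touched.
        # dict.update does not call the overridden __setitem__, so nothing is written back here.
        if not self._loaded:
            dict.update(self, self.store.get_attributes(self.domain, self.name))
            self._loaded = True

    def __getitem__(self, key):
        self._load()
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        self._load()
        dict.__setitem__(self, key, value)
        # Write-through: persist the new value right away.
        self.store.put_attribute(self.domain, self.name, key, value)
```

The design choice mirrors the text: reads trigger a single fetch of all attributes, and each assignment costs one write, which is convenient in a shell session but means a loop of assignments makes one request per attribute.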
Listing 10. Update attributes
>>> my_item['cars']
u'BMW'
>>>
>>> my_item['cars'] = 'Honda'
>>>
>>> my_item['cars']
'Honda'
>>>

You can also retrieve items and attributes by using the SDBConnection class and specifying the domain and item names.

Listing 11. Retrieve an item using SDBConnection
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>

SDB automatically deletes an item that has no attributes. You can also explicitly delete an item and all of its attributes, as shown in Listing 12.

Listing 12. Delete an item and its attributes
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{u'cars': u'Honda', u'fruits': [u'apple', u'mango', u'orange']}
>>>
>>> sdb_conn.delete_attributes('devworks-dom-1', 'test_item_1')
True
>>> sdb_conn.get_attributes('devworks-dom-1', 'test_item_1')
{}
>>>

When you no longer need a domain, you can delete it, as shown in Listing 13.

Listing 13. Delete a domain
>>> sdb_conn.delete_domain('devworks-dom-1')
True
>>>

Querying SDB domains
To search your structured data, SDB provides a custom query language built from the attribute name-value pairs associated with items. The basic component of a query expression is called a predicate. Each predicate is enclosed in square brackets and contains an attribute name, a comparison operator, and a value to compare against. For example, the predicate ['desc' = 'Hello Devworks'] defines an equality comparison on the attribute desc. Each predicate is evaluated separately and produces a set of item names, and you can combine multiple predicates with set operations such as union and intersection to build complex queries.

When using predicates in your queries, keep in mind that SDB performs all predicate comparisons lexicographically, that is, as string comparisons. You must therefore store your data in attributes using a string representation that sorts correctly; a number stored as '9', for example, compares greater than one stored as '10'. Also remember that SDB automatically aborts any query that takes longer than 5 seconds.

Listing 14. Create some test data
>>> d2 = sdb_conn.create_domain('devworks-dom-2')
>>>
>>> i1 = d2.new_item('car1')
>>>
>>> i1['make'] = 'BMW'
>>> i1['color'] = 'grey'
>>> i1['year'] = '2008'
>>> i1['desc'] = 'Sedan'
>>> i1['model'] = '530i'
>>>
>>> i2 = d2.new_item('car2')
>>>
>>> i2['make'] = 'BMW'
>>> i2['color'] = 'white'
>>> i2['year'] = '2007'
>>> i2['desc'] = 'Sports Utility Vehicle'
>>> i2['model'] = 'X5'
>>>

Listing 15. Query with a single predicate
>>> rs = d2.query("['make' = 'BMW']")
>>> for result in rs:
...     print result.name
...
car1
car2
>>>

Listing 16. Query with multiple predicates
>>> rs = d2.query("['make' = 'BMW'] intersection ['year' = '2007']")
>>> for result in rs:
...     print result.name
...
car2
>>>

The query language supports a variety of comparison operators and lets you perform range queries and queries on multi-valued attributes. To get a good grasp of all the possibilities and best practices for writing queries and tuning them for performance, review the introductory articles on the query language provided by Amazon Web Services.

You can also retrieve the metadata for a domain, which gives you the total number of items in the domain along with the other values listed earlier.

Listing 17. Metadata for a domain
>>> my_domain = sdb_conn.get_domain('devworks-dom-2')
>>>
>>> my_metadata = my_domain.get_metadata()
>>>
>>> print my_metadata.item_count
2
>>> print my_metadata.item_names_size
8
>>> print my_metadata.attr_value_count
10
>>> print my_metadata.attr_names_size
22
>>> print my_metadata.attr_values_size
56
>>> print my_metadata.timestamp
1231798889
>>>

These numbers follow directly from the test data in Listing 14: the item names car1 and car2 total 8 bytes, five attributes on each of two items give 10 name-value pairs, the unique attribute names make, color, year, desc, and model total 22 bytes, and the ten values total 56 bytes.

Conclusion
This article introduced you to Amazon's SDB service. You learned some of the basic concepts behind SDB and explored some of the functions provided by boto, an open source Python library for interacting with SDB.
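The lexicographic comparison rule described in the querying section is easy to trip over with numeric data. The sketch below is plain Python with no SDB connection: it mimics a single greater-than predicate over an in-memory dictionary of items, returning a set of item names as an SDB predicate does, and shows how zero-padding numbers to a fixed width makes string order agree with numeric order. The mileage attribute, the 10-digit width, and the helper names are all hypothetical.

```python
def encode_int(value, width=10):
    """Zero-pad a non-negative integer so that string order matches numeric order."""
    if value < 0:
        raise ValueError("this sketch handles only non-negative integers")
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the chosen width")
    return text.zfill(width)


def greater_than(items, attribute, bound):
    """Mimic one predicate such as ['mileage' > bound] using string comparison.

    Returns the set of matching item names; items without the attribute never match.
    """
    return set(name for name, attrs in items.items()
               if attribute in attrs and attrs[attribute] > bound)


# Hypothetical data: mileage stored as plain decimal strings.
raw_items = {"car1": {"mileage": "9500"}, "car2": {"mileage": "12000"}}

# The same data with every number zero-padded to 10 digits.
padded_items = dict(
    (name, {"mileage": encode_int(int(attrs["mileage"]))})
    for name, attrs in raw_items.items()
)

# '9500' > '10000' as strings because '9' > '1', so car1 wrongly matches.
raw_matches = greater_than(raw_items, "mileage", "10000")

# With padding, '0000009500' < '0000010000', so only car2 matches.
padded_matches = greater_than(padded_items, "mileage", encode_int(10000))
```

Zero-padding handles only non-negative integers; negative numbers or decimals need a further encoding, such as adding a fixed offset before padding, which this sketch deliberately leaves out.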
