Amazon SimpleDB

Presented at PhillyDB
http://www.meetup.com/PhillyDB/events/16478731/

  • Notes
    • Atomic - Either an operation completes successfully, or it fails. No partial writes or updates.
      Consistent - The database provides mechanisms to ensure consistent data. If a transaction fails, the database reverts to the previous consistent state. If columns can refer to other tables, references to non-existent rows are not allowed.
      Isolation - Concurrent operations see only committed data, not data that is in the process of being modified.
      Durability - Once a transaction has completed, its changes will survive hardware failure.
    • Concurrent operations - Most database operations are just a function applied to each row, so rows can be processed concurrently.
    • Conditional update - For a ticket item stored in SimpleDB, if the attribute named “Reserved” is “False”, set it to “True”.
    • Since all data is stored as strings, comparisons use lexicographical ordering.
      Zero pad numbers so that “10” sorts after “2”: as strings “10” < “2”, but “10” > “02”.
      Negative numbers need to be shifted by an offset: -5 becomes 5 if the offset is 10.
      Store dates in ISO 8601 format so they compare correctly as strings.
    • Scale as you go. SimpleDB has geographically aware nodes and integrates well with other Amazon AWS products.
    • I’m going to throw out some red meat to the group tonight.
      “Nightmare Scenario”: Your database is so screwed up, you might as well just use a non-relational database. Relational databases don’t do squat if you never bother to use referential integrity.
      “Race to the Bottom”: Designing database systems is “hard”; it is much easier to throw out features that were relied on, pretty much everything ACID encompasses.
      “Me Too Syndrome”: Can’t swing a dead cat without hitting a new NonRel system. Possibly a sign that it’s just a fad?
      Twitter: Ruby as a fad. Twitter gave up and went to the Java platform with Scala.
    • Schema versioning: We wouldn’t have a conflict between Rel and NonRel if updating a schema weren’t like pulling teeth.
      Clustering: Paying huge bucks for clustering has gone the way of the dodo. Look at Google: commodity hardware and systems designed to share nothing.
      Enterprise OSes vs. Linux: Linux came up from behind and took over the datacenter. Commercial UNIX vendors woke up when the dirt got shoveled over them.
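The speaker note on conditional updates (the ticket “Reserved” example) describes a check-and-set on a single item. Below is a minimal sketch of those semantics. It makes two assumptions. First, a hypothetical in-memory `InMemoryDomain` class stands in for a real SimpleDB domain. Second, each attribute holds a single value, whereas real SimpleDB attributes can hold several. Against the actual service, the same condition is expressed with the `Expected.1.Name` and `Expected.1.Value` parameters of PutAttributes, and a failed check comes back as a `ConditionalCheckFailed` error. The exception class below only borrows that name.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Raised when an expected attribute value does not match.

    The name mirrors the error SimpleDB reports; this class is hypothetical.
    """


class InMemoryDomain:
    """Hypothetical in-memory stand-in for a SimpleDB domain.

    Maps item names to {attribute: value} dicts. Values are strings, as in
    SimpleDB, but each attribute holds a single value for simplicity.
    """

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # makes each check-and-write atomic

    def put(self, item_name, attributes, expected=None):
        """Write attributes, replacing existing values.

        If expected=(name, value) is given, write only when the item's current
        value for `name` equals `value`; otherwise raise ConditionalCheckFailed.
        """
        with self._lock:
            current = self._items.get(item_name, {})
            if expected is not None:
                name, value = expected
                if current.get(name) != value:
                    raise ConditionalCheckFailed(f"{item_name}: {name} is not {value!r}")
            self._items.setdefault(item_name, {}).update(attributes)

    def get(self, item_name):
        with self._lock:
            return dict(self._items.get(item_name, {}))


def reserve_ticket(domain, ticket_id):
    """Flip Reserved from "False" to "True"; return False if that check fails."""
    try:
        domain.put(ticket_id, {"Reserved": "True"}, expected=("Reserved", "False"))
        return True
    except ConditionalCheckFailed:
        return False


domain = InMemoryDomain()
domain.put("ticket-42", {"Reserved": "False"})
first = reserve_ticket(domain, "ticket-42")   # True: the ticket was free
second = reserve_ticket(domain, "ticket-42")  # False: already reserved
```

The check and the write happen under one lock, so two concurrent callers cannot both reserve the same ticket. That per-item guarantee is what the conditional put exists to provide.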
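The speaker note on numeric data (zero padding, offsets for negative numbers, ISO 8601 dates) can be made concrete. Below is a sketch of one possible encoding. `WIDTH` and `OFFSET` are illustrative assumptions and must be chosen to cover the range of your data. Every integer is shifted by the offset and zero-padded to a fixed width, so string order matches numeric order. Dates are written in one fixed ISO 8601 UTC format.

```python
from datetime import datetime, timezone

# Illustrative parameters: pick WIDTH and OFFSET so every value you store
# satisfies 0 <= value + OFFSET < 10**WIDTH.
WIDTH = 10
OFFSET = 10 ** 9


def encode_int(n, width=WIDTH, offset=OFFSET):
    """Shift by the offset and zero-pad, so string order matches numeric order."""
    shifted = n + offset
    if not 0 <= shifted < 10 ** width:
        raise ValueError(f"{n} does not fit in width={width} with offset={offset}")
    return str(shifted).zfill(width)


def decode_int(text, offset=OFFSET):
    """Invert encode_int."""
    return int(text) - offset


def encode_date(dt):
    """Fixed-format ISO 8601 in UTC; expects a timezone-aware datetime."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


values = [10, 2, -5, 0, -100]
encoded = sorted(encode_int(v) for v in values)
# Decoding the sorted strings yields the numbers in numeric order: [-100, -5, 0, 2, 10]
ordered = [decode_int(s) for s in encoded]
```

The note's own numbers work the same way: `encode_int(-5, width=2, offset=10)` gives `"05"`.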
  • Transcript

    • 1. Amazon SimpleDB Sean Collins
    • 2. Sean Collins • www.coreitpro.com • contact@coreitpro.com
    • 3. Tale of Two Cities • Relational • “Non-Relational”
    • 4. Tale of Two Cities • Relational
    • 5. [Slide image: the first page of E. F. Codd, “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, Volume 13, Number 6, June 1970, p. 377]
    • 6. A Relational Model of Data for Large Shared Data Banks • E. F. Codd • IBM Research Laboratory, San Jose, California • CACM, June 1970
    • 7. Data as Relations • “In many commercial, governmental, and scientific data banks ... some of the relations are of quite high degree... Accordingly, we propose that users deal, not with relations which are domain-ordered, but with relationships”
    • 8. Relationships • Customer to Order • Order to Items • And So Forth
    • 9. Relational • Provides an SQL interface to developers • ACID • Atomicity • Consistency • Isolation • Durability
    • 10. Tale of Two Cities • “Non-Relational”
    • 11. CAP Theorem
    • 12. CAP Theorem • Consistency • Availability • Partition tolerance
    • 13. Non-Relational • Less structured • “Schema-less” • Key-value storage • Implement only parts of ACID
    • 14. WHY?
    • 15. WHY? • Speed
    • 16. WHY? • Speed • Flexibility
    • 17. WHY? • Speed • Flexibility • Scale
    • 18. Speed
    • 19. Speed • No JOINS
    • 20. Speed • No JOINS • No special column types
    • 21. Speed • No JOINS • No special column types • Concurrent operations
    • 22. Flexibility
    • 23. Flexibility • No table definition • Store whatever you want • Wherever you want • Adjust on the fly
    • 24. Scalability
    • 25. Scalability • Eventual consistency • Writes propagate across nodes • Propagation time is not constant
    • 26. Amazon SimpleDB • Amazon AWS • “Structured Data” Storage • Notable users include Netflix
    • 27. SimpleDB Data Model • Domain • Item • Name • Attributes
    • 28. SimpleDB Data Model • All data stored as strings
    • 29. SimpleDB Features
        Eventually Consistent Read  | Consistent Read
        Stale reads possible        | No stale reads
        Lowest read latency         | Potentially higher read latency
        Highest read throughput     | Potentially lower read throughput
    • 30. SimpleDB Features • Conditional Transactions • PUT/DELETE • At the Item Level • Based on Item Attributes
    • 31. Using SimpleDB • Operations are issued as HTTP GET requests (REST) • Responses are XML • Supports an SQL-like syntax for fetching items from the domain
    • 32. Using SimpleDB • Supports an SQL-like syntax for fetching items from the domain • SELECT <specification> FROM <domain> WHERE <condition> • Specifications • * (all attributes) • itemName() • count(*) • Specific attributes
    • 33. https://sdb.amazonaws.com/?Action=PutAttributes&Attribute.1.Name=Color&Attribute.1.Value=Blue&Attribute.2.Name=Size&Attribute.2.Value=Med&Attribute.3.Name=Price&Attribute.3.Value=0014.99&Attribute.3.Replace=true&AWSAccessKeyId=[valid access key id]&DomainName=MyDomain&ItemName=Item123&SignatureVersion=2&SignatureMethod=HmacSHA256&Timestamp=2010-01-25T15%3A03%3A05-07%3A00&Version=2009-04-15&Signature=[valid signature]
    • 34.
        <PutAttributesResponse>
          <ResponseMetadata>
            <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
            <BoxUsage>0.0000219907</BoxUsage>
          </ResponseMetadata>
        </PutAttributesResponse>
    • 35. Case Study • ZINC Database • Commercially available compounds • Virtual Screening • Clean “Drug-Like” (#13) • Approx. 3,751,744 compounds
    • 36. Data Model • Item • Name = ZINC_ID • Attributes • Molecular Weight • Charge • SMILES • “Simplified molecular-input line-entry system”
    • 37. Boto • Provides a library for accessing Amazon AWS services • Encapsulates SimpleDB data in Python objects • Dictionaries • Iterators • etc.
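    • 38.
        import boto
        conn = boto.connect_sdb()  # credentials come from the environment or the boto config file
        domain = conn.get_domain("zinc_13")
        for item in domain.select("SELECT * FROM zinc_13"):
            print(item.name)
            print(item.keys())
            print(item.values())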
    • 39. Some Tips • Batch your operations • At most 25 items per batch request • Shard your data across domains • Handling numerical data • Zero padding • Offsets for negative numbers • Dates
    • 40. Advantages • Faster development times • (No) Administration • No Hardware! • Scale-as-you-go • Pay-as-you-go
    • 41. Pricing • 1 GB free storage • $0.25/GB/month thereafter • $0.10/GB transfer in • $0.15/GB transfer out • 25 machine hours free/month • $0.14/hour thereafter
    • 42. Limitations • Fewer Features = More Work for the Developer • Dates • Numerical Data • Data Consistency
    • 43. Limits • The following table describes current limits within Amazon SimpleDB.
        Parameter                                                 | Restriction
        Domain size                                               | 10 GB per domain
        Domain size                                               | 1 billion attributes per domain
        Domain name                                               | 3-255 characters (a-z, A-Z, 0-9, _, -, and .)
        Domains per account                                       | 100
        Attribute name-value pairs per item                       | 256
        Attribute name length                                     | 1024 bytes
        Attribute value length                                    | 1024 bytes
        Item name length                                          | 1024 bytes
        Attribute name, attribute value, and item name            | All UTF-8 characters that are valid in XML documents.
        allowed characters                                        | Control characters and any sequences that are not valid in XML are returned Base64-encoded. For more information, see Working with XML-Restricted Characters.
        Attributes per PutAttributes operation                    | 256
        Attributes requested per Select operation                 | 256
        Items per BatchPutAttributes operation                    | 25
        Maximum items in Select response                          | 2500
        Maximum query execution time                              | 5 seconds
        Maximum number of unique attributes per Select expression | 20
        Maximum number of comparisons per Select expression       | 20
        Maximum response size for Select                          | 1 MB
    • 44. Editorial • NoSQL vs. SQL • Coder vs. Architect • Business Requirements • Time vs. Features • “The Nightmare Scenario” • “Race to the Bottom” • “Me Too Syndrome”
    • 45. Editorial • Relational Databases Need to Catch Up • Meet/Exceed developer expectations • Netflix wouldn’t have fork-lifted ~1 Billion Rows out of Oracle “just for fun”
    • 46. Q&A
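Slide 32 gives the general shape of a Select expression. Below is a sketch of a small helper that assembles one. It assumes the quoting rules described in the SimpleDB developer guide (check there for details). Domain and attribute names go in backticks, with any literal backtick doubled. Values go in single quotes, with any literal single quote doubled. The attribute name `Molecular Weight` comes from the case-study slide; `quote_name`, `quote_value` and `select_expr` are hypothetical helpers.

```python
def quote_name(name):
    """Wrap a domain or attribute name in backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Wrap a value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def select_expr(domain, conditions=(), output="*"):
    """Build "SELECT <output> FROM <domain> [WHERE ...]".

    conditions is a sequence of (attribute, operator, value) triples joined with
    AND; values must already be strings (for numbers, zero-padded ones).
    """
    expr = f"SELECT {output} FROM {quote_name(domain)}"
    clauses = [f"{quote_name(attr)} {op} {quote_value(val)}" for attr, op, val in conditions]
    if clauses:
        expr += " WHERE " + " AND ".join(clauses)
    return expr


# -> SELECT * FROM `zinc_13` WHERE `Molecular Weight` > '0350.00'
query = select_expr("zinc_13", [("Molecular Weight", ">", "0350.00")])
```

The resulting string is the kind of expression passed to `domain.select(...)` on slide 38. The operator is inserted verbatim, so it should come from code, not from user input.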
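Slide 33 shows a PutAttributes request with the access key and signature left as placeholders. The `SignatureVersion=2` and `SignatureMethod=HmacSHA256` parameters refer to AWS Signature Version 2. Below is a minimal sketch of that signing step under the usual Version 2 rules, which should be verified against the SimpleDB developer guide before real use:

- parameters sorted by name;
- RFC 3986 percent-encoding;
- a string to sign made of the HTTP method, lowercase host, path and canonical query string, one per line.

The key values are placeholders, and the helper names are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def aws_quote(value):
    """RFC 3986 encoding: only A-Z, a-z, 0-9, '-', '_', '.', '~' stay unescaped."""
    return quote(str(value), safe="-_.~")


def canonical_query(params):
    """Sort parameters by name and join them as an encoded query string."""
    return "&".join(f"{aws_quote(k)}={aws_quote(v)}" for k, v in sorted(params.items()))


def sign_v2(params, secret_key, method="GET", host="sdb.amazonaws.com", path="/"):
    """Return a copy of params with a Signature Version 2 (HmacSHA256) signature added."""
    string_to_sign = "\n".join([method, host.lower(), path, canonical_query(params)])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signed = dict(params)
    signed["Signature"] = base64.b64encode(digest).decode("ascii")
    return signed


def request_url(signed, host="sdb.amazonaws.com", path="/"):
    return f"https://{host}{path}?{canonical_query(signed)}"


params = {
    "Action": "PutAttributes",
    "DomainName": "MyDomain",
    "ItemName": "Item123",
    "Attribute.1.Name": "Color", "Attribute.1.Value": "Blue",
    "Attribute.2.Name": "Size", "Attribute.2.Value": "Med",
    "Attribute.3.Name": "Price", "Attribute.3.Value": "0014.99",
    "Attribute.3.Replace": "true",
    "AWSAccessKeyId": "EXAMPLE-ACCESS-KEY",  # placeholder
    "SignatureVersion": "2",
    "SignatureMethod": "HmacSHA256",
    "Timestamp": "2010-01-25T15:03:05-07:00",
    "Version": "2009-04-15",
}
url = request_url(sign_v2(params, "EXAMPLE-SECRET-KEY"))  # placeholder secret
```

Encoding the colons in the timestamp this way reproduces the `Timestamp=2010-01-25T15%3A03%3A05-07%3A00` seen in the slide's URL.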
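Slide 34 is the XML that comes back from that request. Below is a minimal sketch of pulling out the request ID and `BoxUsage` with Python's standard `xml.etree.ElementTree`; `response_metadata` is a hypothetical helper name. `BoxUsage` is the machine time charged for the request, reported in hours. Summing it over requests therefore gives a rough view of the "machine hours" on the pricing slide. Real responses may declare an XML namespace, which the sketch tolerates.

```python
import xml.etree.ElementTree as ET


def response_metadata(xml_text):
    """Return (RequestId, BoxUsage) from a SimpleDB-style XML response.

    Tags are compared by local name, so this works whether or not the
    response declares an XML namespace.
    """
    values = {}
    for elem in ET.fromstring(xml_text).iter():
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix
        if local_name in ("RequestId", "BoxUsage"):
            values[local_name] = elem.text
    box_usage = float(values["BoxUsage"]) if "BoxUsage" in values else None
    return values.get("RequestId"), box_usage


sample = """<PutAttributesResponse>
  <ResponseMetadata>
    <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
    <BoxUsage>0.0000219907</BoxUsage>
  </ResponseMetadata>
</PutAttributesResponse>"""
request_id, box_usage = response_metadata(sample)
```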
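Slide 39 suggests batching writes and sharding data across domains. Below is a sketch that combines both. It assumes hash-based sharding on the item name, which is one common choice and not necessarily the one used in the talk. The domain names `zinc_13_0` to `zinc_13_3` are hypothetical. Each resulting batch holds at most 25 items, matching the BatchPutAttributes limit in the limits table, and could be sent as one batch request.

```python
import hashlib
from collections import defaultdict
from itertools import islice

BATCH_LIMIT = 25  # items per BatchPutAttributes request, per the limits table


def domain_for(item_name, domain_names):
    """Choose a shard by hashing the item name.

    hashlib gives the same answer in every process; Python's built-in hash()
    for strings is randomized per process, so it would scatter items.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return domain_names[int(digest, 16) % len(domain_names)]


def plan_batches(items, domain_names, size=BATCH_LIMIT):
    """Split {item_name: attributes} into (domain, batch) pairs of <= size items."""
    by_domain = defaultdict(dict)
    for name, attributes in items.items():
        by_domain[domain_for(name, domain_names)][name] = attributes
    plan = []
    for domain, shard in by_domain.items():
        names = iter(shard)
        while True:
            chunk = list(islice(names, size))
            if not chunk:
                break
            plan.append((domain, {n: shard[n] for n in chunk}))
    return plan


# Hypothetical shard names and items, for illustration only.
domains = [f"zinc_13_{i}" for i in range(4)]
items = {f"ZINC{i:08d}": {"Charge": "0"} for i in range(100)}
plan = plan_batches(items, domains)
```

Spreading items across domains also keeps each domain under the 10 GB per-domain limit. The trade-off is that a Select runs against a single domain, so a query has to be repeated for each shard and the results merged.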
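The pricing slide translates into a simple monthly estimate. Below is a sketch that uses the prices exactly as listed on the slide; these date from the talk and have likely changed. `monthly_cost` is a hypothetical helper and ignores any pricing rules the slide does not mention.

```python
# Prices as listed on the slide; current AWS pricing is likely different.
FREE_STORAGE_GB = 1.0
STORAGE_PER_GB_MONTH = 0.25
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.15
FREE_MACHINE_HOURS = 25.0
PER_MACHINE_HOUR = 0.14


def monthly_cost(storage_gb, machine_hours, gb_in=0.0, gb_out=0.0):
    """Estimated monthly bill in dollars, applying the free tier to storage and hours."""
    storage = max(0.0, storage_gb - FREE_STORAGE_GB) * STORAGE_PER_GB_MONTH
    compute = max(0.0, machine_hours - FREE_MACHINE_HOURS) * PER_MACHINE_HOUR
    transfer = gb_in * TRANSFER_IN_PER_GB + gb_out * TRANSFER_OUT_PER_GB
    return round(storage + compute + transfer, 2)
```

For example, 5 GB stored, 100 machine hours, 2 GB in and 10 GB out comes to 4 × $0.25 + 75 × $0.14 + 2 × $0.10 + 10 × $0.15 = $13.20.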