Zahid Mian
February 27, 2011
Amazon SimpleDB
Need for NoSQL
 Avoid Overhead Associated with Traditional RDBMS
 Scale Horizontally (significant) as well as
Vertically
 High Availability
 Simplify data storage and model (make it efficient
for storing and retrieving data)
 Generally a Hash table
Tradeoffs SimpleDB vs. RDBMS
 Simplicity
 Lack of support for joins, views, constraints, transactions,
stored procedures, etc.
 Schema-less, type-less (all values are stored as text)
 Simplified querying language (Select * …)
 No fine-tuning necessary
 Uses Web Services to access data
 BASE implementation instead of ACID
 Key idea: commits are “eventually” consistent
Tradeoffs SimpleDB vs. RDBMS
 Proprietary Query “language”
 Designed to retrieve Items (not records)
 Basic operations
 Specific operations like CreateDomain, DeleteAttributes,
PutAttributes, etc.
 Storage Structure
 One large Hash table
 Each value is hashed, so it is automatically indexed
 Little or No Infrastructure planning
 Hosted by Amazon
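As an illustration of the simplified query language, here is a minimal sketch that builds a single-predicate Select expression. The helper names (`quote_name`, `quote_value`, `build_select`) are hypothetical and not part of any SDK. The quoting rules assumed here are: values go in single quotes with embedded quotes doubled, and domain or attribute names are backtick-quoted unless they are plain identifiers.

```python
import re

# Names made only of letters, digits, "_" or "$" (not starting with a digit)
# are assumed to be usable without quoting.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def quote_name(name):
    """Backtick-quote a domain or attribute name when it is not a plain identifier."""
    if _SIMPLE_NAME.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Single-quote a value, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_select(domain, attribute, value):
    """Build a Select expression with one equality predicate."""
    return "select * from {} where {} = {}".format(
        quote_name(domain), quote_name(attribute), quote_value(value)
    )
```

For example, `build_select("contacts", "Name", "O'Brien")` yields an expression in which the apostrophe is doubled, so the value cannot terminate the string early.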
Sample SimpleDB SOAP Message
SimpleDB Object Model
 User Account (One Store per account)
 Domain – equivalent to a Table
 Item – equivalent to a Record
 Attribute – equivalent to a Column
 Value – equivalent to a column value
 Multiple values per attribute are allowed
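The object model above can be pictured as nested hash tables. The following is a toy in-memory stand-in, not the real service or an SDK class; the `Domain` class and its method names are assumptions chosen to echo the operation names mentioned earlier, and the email addresses are made up.

```python
class Domain:
    """Toy in-memory model of a domain: item -> attribute -> set of values."""

    def __init__(self, name):
        self.name = name
        # item name -> {attribute name -> set of string values}
        self.items = {}

    def put_attributes(self, item_name, pairs):
        """Add (attribute, value) pairs; an attribute may hold several values."""
        item = self.items.setdefault(item_name, {})
        for attribute, value in pairs:
            # Everything is stored as text, mirroring the type-less model.
            item.setdefault(attribute, set()).add(str(value))

    def get_attributes(self, item_name):
        """Return the item's attributes, each with a sorted list of values."""
        item = self.items.get(item_name, {})
        return {attribute: sorted(values) for attribute, values in item.items()}


contacts = Domain("Contacts")
contacts.put_attributes("1", [
    ("Name", "Adam Smith"),
    ("EmailAddress", "asmith1@example.com"),
    ("EmailAddress", "asmith2@example.com"),
])
```

Note that an item with no value for an attribute simply has no entry for it; there is no null placeholder.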
SimpleDB Model (diagram): User Account (Account/Authentication Info) → Domains → Items → Attributes → Values
Application Design Considerations
 Normalized vs. Non-Normalized Storage
 Data Caching at the Application level
 Normalized Data
Contacts
ContactID | Name | DOB | Gender
1 | Adam Smith | … | M
2 | Sarjo T | … | M
3 | Sarah K | … | F

ContactEmailAddresses
ContactID | EmailAddress
1 | asmith1@...
1 | asmith2@...
2 | sarjot@...
3 | sarah1@...
3 | sarah2@...
Application Design Considerations
Non-Normalized Data in SimpleDB
Contacts (multi-valued attribute: add additional values as needed)
ContactID | Name | DOB | Gender | EmailAddress
1 | Adam Smith | … | M | asmith1@..., asmith2@..., asmith3@...
2 | Sarjo T | … | M | sarjot@...
3 | Sarah K | … | F | sarah1@..., sarah2@...

Contacts (one attribute per value: add additional attributes as needed; null attributes don’t exist)
ContactID | Name | DOB | Gender | EmailAddress1 | EmailAddress2 | EmailAddress3
1 | Adam Smith | … | M | asmith1@... | asmith2@... | asmith3@...
2 | Sarjo T | … | M | sarjot@... | |
3 | Sarah K | … | F | sarah1@... | sarah2@... |
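A minimal sketch of how an application might fold normalized rows into non-normalized items with a multi-valued EmailAddress attribute. The function name `denormalize`, the row shapes, and the sample addresses are illustrative assumptions.

```python
def denormalize(contacts, emails):
    """Fold contact rows and (contact_id, email) rows into one item per contact.

    Assumes every email row refers to a contact that is present in `contacts`.
    """
    items = {}
    for row in contacts:
        # Skip None values: a null attribute is simply not stored.
        items[str(row["ContactID"])] = {
            key: [str(value)]
            for key, value in row.items()
            if key != "ContactID" and value is not None
        }
    for contact_id, email in emails:
        # Each extra email becomes another value of the same attribute.
        items[str(contact_id)].setdefault("EmailAddress", []).append(email)
    return items


items = denormalize(
    [
        {"ContactID": 1, "Name": "Adam Smith", "DOB": None, "Gender": "M"},
        {"ContactID": 2, "Name": "Sarjo T", "DOB": None, "Gender": "M"},
    ],
    [(1, "a1@example.com"), (1, "a2@example.com"), (2, "s@example.com")],
)
```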
Application Design Considerations
 Analytical Processing
 No support for group by or aggregation
 Application must implement appropriate functionality
 Can be costly operation at the data level
 Bulk Operations
 Little support for bulk updates
 At least two trips (one to get the items, the other to send batch
request)
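Because there is no group by or aggregation, the application has to compute totals itself after fetching the items. A minimal sketch, assuming items arrive as dictionaries of text values; the function name `group_sum` and the Region/Sales attributes are hypothetical.

```python
from collections import defaultdict


def group_sum(items, group_attr, value_attr):
    """Sum value_attr per distinct group_attr, entirely on the client side."""
    totals = defaultdict(int)
    for attrs in items:
        # Items lacking either attribute are skipped, since attributes are optional.
        if group_attr in attrs and value_attr in attrs:
            # Values are stored as text, so convert before adding.
            totals[attrs[group_attr]] += int(attrs[value_attr])
    return dict(totals)


sales = [
    {"Region": "East", "Sales": "0100"},
    {"Region": "West", "Sales": "0050"},
    {"Region": "East", "Sales": "0025"},
    {"Region": "North"},
]
```

Every item has to cross the network before it can be counted, which is why this kind of processing can be costly on large data.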
Application Design Considerations
 No Transactional Support
 Application must “mimic” a transaction by guaranteeing
commits
 Support for Consistent Reads (discouraged)
 Constraints
 All constraints (type or data) must be handled by the
Application
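Since the store enforces no constraints, a check like the following would have to run in the application before each write. This is only a sketch: the function name `validate_contact` and the specific rules (required Name, Gender limited to M or F, a crude email check) are hypothetical examples of type and data constraints.

```python
MAX_VALUE_BYTES = 1024  # per-value size limit quoted in these slides


def validate_contact(attrs):
    """Return a list of constraint violations for one contact item."""
    errors = []
    if not attrs.get("Name"):
        errors.append("Name is required")
    # Hypothetical data rule: Gender, when present, must be M or F.
    if "Gender" in attrs and attrs["Gender"] not in ("M", "F"):
        errors.append("Gender must be M or F")
    for email in attrs.get("EmailAddress", []):
        if "@" not in email:
            errors.append("Invalid email: " + email)
        if len(email.encode("utf-8")) > MAX_VALUE_BYTES:
            errors.append("Email exceeds value size limit")
    return errors
```

An empty list means the item passed every check and can be written.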
Application Design Considerations
 Working With Data/Values
 Value Size Limit of 1024 bytes
 Possibly break into chunks of data
 Lexicographical search creates problems
 Negative Numbers Offset
 Need to use an “offset” number to add to numeric values to
handle negative values
 Zero Padding
 Pad all numbers with leading “0”
 Dates
 Convert all dates to ISO 8601 standard before saving
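The points above can be sketched as small encoding helpers, so that text comparison orders values the same way numeric or chronological comparison would. The `OFFSET` and `WIDTH` constants and all function names are assumptions for illustration; a real application would choose them from its own value range.

```python
from datetime import datetime, timezone

OFFSET = 1_000_000  # hypothetical: must be at least the magnitude of the lowest value
WIDTH = 7           # hypothetical: digits needed for the largest shifted value


def encode_int(n):
    """Shift by OFFSET so the value is non-negative, then zero-pad to WIDTH."""
    shifted = n + OFFSET
    if not 0 <= shifted < 10 ** WIDTH:
        raise ValueError("value out of encodable range")
    return str(shifted).zfill(WIDTH)


def decode_int(text):
    """Reverse encode_int."""
    return int(text) - OFFSET


def encode_datetime(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def chunk_text(text, limit=1024):
    """Split text into pieces whose UTF-8 size stays within the limit."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Without the offset and padding, a text comparison would place "10" before "9" and could not order negative numbers at all.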
Hosting Environment
 Challenges to Consider
 Data Privacy
 Legal Requirements
 No Backup Support
 “Lock-in” Factor (can’t migrate from SimpleDB)
 “Open Cash Register” Problem (rogue script/processing can be
costly)
 Difficult to Maintain DB for Application Development
Lifecycle (unit test, dev, test, perf, prod)
Pros of Using SimpleDB
Item | Explanation
Infrastructure | Amazon hosts the environment, so virtually no cost to get started; no need for a local datacenter; “pay as you go” for processing
Simplicity | Extremely efficient storage and retrieval of data
Flexibility | Schema-less; type-less data; easy prototyping
Security | Data is stored with Amazon and accessible through authenticated requests only
High Availability | BASE implementation provides high availability
Fault-Tolerance | Data replicated across multiple nodes; managed by Amazon
Indexing | Hash table storage means all data is “automatically” indexed
Cons of Using SimpleDB
Item | Explanation
Not RDBMS | Not an RDBMS substitute. Lacks features like stored procedures, referential integrity, views, datatypes, text search, schemas, granular security
Lacks “rich” SQL | Rudimentary search operations; cannot group by, aggregate, etc.
SLA | Loosely defined SLA
Joins | Joins can be performed at the application layer, but require multiple operations between client and server
Limits on Data | 10 GB Store; 100 domains; 256 values per item; 1,024 bytes per attribute
Limits on Operations | 1 MB response size; 2,500 items returned per Select; 5 seconds maximum for operation
Limits on Predicates | 20 maximum predicates per Select; cannot reference other attributes of the Item
Hosting | No local implementation makes it difficult to develop applications (release management, performance testing, unit testing, etc.); no backup support; privacy issues
Migration | Limited options for migrating data
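The per-Select limits mean an application usually has to page through results, passing back the continuation token the service returns until none remains. A minimal sketch of that loop follows; `select_page` is a hypothetical callable standing in for whatever client call performs one Select, and `make_fake_select` is an in-memory stand-in used only to exercise the loop.

```python
def select_all(select_page, expression):
    """Collect every page of a Select by following continuation tokens.

    select_page(expression, next_token) must return (items, next_token),
    with next_token set to None on the last page.
    """
    items, token = [], None
    while True:
        page, token = select_page(expression, token)
        items.extend(page)
        if token is None:
            return items


def make_fake_select(rows, page_size):
    """Stand-in for the service: pages through an in-memory list."""
    def select_page(expression, next_token):
        start = int(next_token) if next_token is not None else 0
        end = start + page_size
        token = str(end) if end < len(rows) else None
        return rows[start:end], token
    return select_page
```

Each page is a separate round trip, so large result sets multiply both latency and cost.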
Appropriate Use Cases
Type of Application | Explanation
Managing Data for Online Games | User scores and achievement data; user settings or preferences; user-generated content (comments, feedback, etc.); dynamic game content
Managing Session State | Applications like online games, web sites, and batch processes can manage the state of their process
“Static” Content | Nightly builds from RDBMS (e.g., pre-configured Sales Per Region data)
Simple Collections | Any collections (e.g., URLs, contacts, etc.)
Inappropriate Use Cases
Type of Application | Explanation
Analytical Processing | Applications where data computation is required on large data
Highly Structured Data Requirements | Applications that require constraints and structures
Data Privacy | If data privacy is an issue
Allowing Third-party Extensions | Makes it difficult since there is no schema
Data is core-competency | When data infrastructure is the core-competency; when data storage is what gives you leverage over others
