Amazon SimpleDB
    Sean Collins




 www.coreitpro.com
contact@coreitpro.com
Tale of Two Cities



• Relational
• “Non-Relational”
Tale of Two Cities



• Relational
Relational Model
[Slide image: first page of E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, Volume 13, Number 6, June 1970, p. 377]
• A Relational Model of Data for Large
  Shared Data Banks

• E. F. Codd
  • IBM Research Laboratory, San Jose,
    California
• CACM June 1970
• Data as Relations
 • “In many commercial,
    governmental, and scientific data
    banks ... some of the relations are of
    quite high degree... Accordingly, we
    propose that users deal, not with
    relations which are domain-ordered,
    but with relationships”
Relationships


• Customer To Order
• Order to Items
• And So Forth
Relational
• Provides SQL interface to developers
• ACID
  • Atomicity
  • Consistency
  • Isolation
  • Durability
Tale of Two Cities



• “Non-Relational”
CAP Theorem


• Consistency
• Availability
• Partition-tolerance
Non-Relational


• Less structured
 • “Schema-less”
 • Key-value storage
 • Implement parts of ACID
WHY?


• Speed
• Flexibility
• Scale
Speed


• No JOINS
• No special column types
• Concurrent operations
Flexibility


• No table definition
• Store whatever you want
• Wherever you want
• Adjust on the fly
Scalability


• Eventual consistency
 • Writes propagate across nodes
 • Propagation time is not constant
Amazon SimpleDB


• Amazon AWS
• “Structured Data” Storage
• Notable users include Netflix
SimpleDB Data Model


• Domain
 • Item
   • Name
   • Attributes
SimpleDB Data Model



• All data stored as Strings
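
As a rough picture of the hierarchy above (not SimpleDB's actual storage format), a domain can be thought of as a mapping from item names to attribute dictionaries whose values are all strings. The item names and values below are hypothetical placeholders. Because every value is a string, comparisons are lexicographic, which is why numeric data needs the padding covered under "Some Tips".

```python
# A toy, in-memory picture of Domain -> Item -> Attributes.
# All names and values here are hypothetical placeholders.
domain = {
    "item_a": {"Color": "Blue", "Size": "Med", "Price": "0014.99"},
    "item_b": {"Color": "Red", "Size": "Large", "Price": "0009.50"},
}


def all_strings(d):
    """Return True if every attribute value in every item is a str."""
    return all(
        isinstance(value, str)
        for attrs in d.values()
        for value in attrs.values()
    )


# Strings compare character by character, so unpadded numbers sort wrongly.
unpadded_wrong = "10" < "9"      # True: "1" sorts before "9"
padded_right = "0009" < "0010"   # True: padding restores numeric order

print(all_strings(domain), unpadded_wrong, padded_right)
```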
SimpleDB Features

  Eventually Consistent Read      Consistent Read

  Stale reads possible            No stale reads

  Lowest read latency             Potentially higher read latency

  Highest read throughput         Potentially lower read throughput
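
The trade-off between the two read modes can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how SimpleDB is implemented: the class name, the explicit `sync()` step, and the replica count are all invented for the example.

```python
class ToyReplicatedStore:
    """Toy model: a write lands on one replica and reaches the rest on sync()."""

    def __init__(self, replica_count=3):
        # Each replica maps key -> (version, value).
        self.replicas = [{} for _ in range(replica_count)]
        self.version = 0

    def put(self, key, value):
        # The write is acknowledged after reaching replica 0 only.
        self.version += 1
        self.replicas[0][key] = (self.version, value)

    def sync(self):
        # Propagate the newest version of every key to all replicas.
        newest = {}
        for replica in self.replicas:
            for key, versioned in replica.items():
                if key not in newest or versioned[0] > newest[key][0]:
                    newest[key] = versioned
        for replica in self.replicas:
            replica.update(newest)

    def eventually_consistent_read(self, key, replica_index):
        # Ask a single replica: cheap, but may return stale or missing data.
        versioned = self.replicas[replica_index].get(key)
        return versioned[1] if versioned else None

    def consistent_read(self, key):
        # Ask every replica and keep the newest value: more work per read.
        found = [r[key] for r in self.replicas if key in r]
        return max(found, key=lambda v: v[0])[1] if found else None


store = ToyReplicatedStore()
store.put("Color", "Blue")
store.sync()
store.put("Color", "Red")  # acknowledged, but not yet propagated

stale = store.eventually_consistent_read("Color", replica_index=2)
fresh = store.consistent_read("Color")
store.sync()
after_sync = store.eventually_consistent_read("Color", replica_index=2)
print(stale, fresh, after_sync)
```

The eventually consistent read touches one replica, which is why it is cheaper; the consistent read pays for checking all of them.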
SimpleDB Features


• Conditional Transactions
 • PUT/DELETE
 • At the Item Level
 • Based on Item Attributes
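
A conditional put behaves like a compare-and-set on one item: the write applies only if a named attribute currently holds the expected value. The sketch below mimics that idea on a plain dictionary; the function name, exception, and `Version` attribute are hypothetical, and the real service evaluates the condition server-side.

```python
class ConditionFailed(Exception):
    """Raised when the expected attribute value does not match."""


def conditional_put(item, updates, expected_name, expected_value):
    """Apply updates to item only if item[expected_name] == expected_value.

    Passing expected_value=None means "the attribute must not exist yet".
    """
    current = item.get(expected_name)
    if current != expected_value:
        raise ConditionFailed(
            "expected %s=%r but found %r" % (expected_name, expected_value, current)
        )
    item.update(updates)
    return item


# Two writers both read Version "0001"; only the first write succeeds.
item = {"Price": "0014.99", "Version": "0001"}
conditional_put(item, {"Price": "0012.99", "Version": "0002"}, "Version", "0001")

try:
    conditional_put(item, {"Price": "0010.00", "Version": "0002"}, "Version", "0001")
    second_write_rejected = False
except ConditionFailed:
    second_write_rejected = True

print(item, second_write_rejected)
```

Using a version attribute this way gives optimistic locking at the item level without any multi-item transaction.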
Using SimpleDB

• Operations are issued as HTTP GET
  requests (REST)
• Responses are XML
• Supports an SQL-like syntax for
  fetching items from the domain
Using SimpleDB
• Supports an SQL-like syntax for fetching items from the
   domain

  • SELECT <specification> FROM <domain> WHERE
      <condition>

  • Specifications
     • * (all attributes)
     • itemName()
     • count(*)
     • Specific attributes
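
A small helper can assemble expressions of this shape. This is a sketch, not part of any SDK: the function names are invented, the operator list is only a subset, and the quoting (backticks around names, single quotes around values, both escaped by doubling) reflects my understanding of the SimpleDB quoting rules.

```python
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">=", "like", "not like"}


def quote_value(value):
    """Quote a string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_name(name):
    """Backtick-quote a domain or attribute name, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(domain, attributes=None, conditions=None):
    """Build SELECT <specification> FROM <domain> WHERE <condition>.

    attributes: None for *, or a list of attribute names.
    conditions: list of (attribute, operator, value) tuples joined with AND.
    """
    spec = ", ".join(quote_name(a) for a in attributes) if attributes else "*"
    query = "SELECT %s FROM %s" % (spec, quote_name(domain))
    clauses = []
    for attr, op, val in conditions or []:
        if op not in ALLOWED_OPERATORS:
            raise ValueError("unsupported operator: %r" % op)
        clauses.append("%s %s %s" % (quote_name(attr), op, quote_value(val)))
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


query = build_select("MyDomain", ["Color", "Size"], [("Price", "<", "0020.00")])
print(query)
```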
https://sdb.amazonaws.com/
?Action=PutAttributes
&Attribute.1.Name=Color
&Attribute.1.Value=Blue
&Attribute.2.Name=Size
&Attribute.2.Value=Med
&Attribute.3.Name=Price
&Attribute.3.Value=0014.99
&Attribute.3.Replace=true
&AWSAccessKeyId=[valid access key id]
&DomainName=MyDomain
&ItemName=Item123
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2010-01-25T15%3A03%3A05-07%3A00
&Version=2009-04-15
&Signature=[valid signature]
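
The `Signature` value in a request like this is computed from the other parameters. The sketch below follows the AWS Signature Version 2 scheme as I understand it (sorted, percent-encoded query string signed with HMAC-SHA256); the access key and secret are placeholders, and the helper names are my own rather than anything from an SDK.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def encode(value):
    """Percent-encode per RFC 3986, leaving only unreserved characters."""
    return quote(str(value), safe="-_.~")


def canonical_query(params):
    """Sort parameters by name and join them as name=value pairs."""
    return "&".join(
        "%s=%s" % (encode(name), encode(value))
        for name, value in sorted(params.items())
    )


def sign_v2(params, secret_key, host="sdb.amazonaws.com", path="/", verb="GET"):
    """Return the base64 HMAC-SHA256 signature for the given parameters."""
    string_to_sign = "\n".join([verb, host.lower(), path, canonical_query(params)])
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(digest).decode("ascii")


params = {
    "Action": "PutAttributes",
    "Attribute.1.Name": "Color",
    "Attribute.1.Value": "Blue",
    "AWSAccessKeyId": "EXAMPLE_ACCESS_KEY_ID",  # placeholder, not a real key
    "DomainName": "MyDomain",
    "ItemName": "Item123",
    "SignatureVersion": "2",
    "SignatureMethod": "HmacSHA256",
    "Timestamp": "2010-01-25T15:03:05-07:00",
    "Version": "2009-04-15",
}
signature = sign_v2(params, "EXAMPLE_SECRET_KEY")  # placeholder secret
print(canonical_query(params))
print(signature)
```

The signature is then appended as one more parameter, itself percent-encoded. In practice a library such as Boto handles this step.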
<PutAttributesResponse>
 <ResponseMetadata>
  <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
  <BoxUsage>0.0000219907</BoxUsage>
 </ResponseMetadata>
</PutAttributesResponse>
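
A response like this can be read with Python's standard XML parser. The sketch below uses the response exactly as shown on the slide; the helper names are my own, and the namespace stripping is there on the assumption that real responses may carry an XML namespace.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<PutAttributesResponse>
 <ResponseMetadata>
  <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
  <BoxUsage>0.0000219907</BoxUsage>
 </ResponseMetadata>
</PutAttributesResponse>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}RequestId'."""
    return tag.rsplit("}", 1)[-1]


def response_metadata(xml_text):
    """Return the children of ResponseMetadata as a dict of strings."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "ResponseMetadata":
            return {
                local_name(child.tag): (child.text or "").strip()
                for child in element
            }
    return {}


meta = response_metadata(RESPONSE)
print(meta["RequestId"], float(meta["BoxUsage"]))
```

`BoxUsage` is the machine-hour figure that the pricing slide bills against, so logging it per request gives a running view of cost.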
Case Study

• ZINC Database
 • Commercially available compounds
 • Virtual Screening
 • Clean “Drug Like” (#13)
 • Approx. 3,751,744 compounds
Data Model

• Item

  • Name = ZINC_ID
  • Attributes

    • Molecular Weight
    • Charge

    • SMILES
          • “Simplified molecular input line entry
             specification”
Boto
• Provides a library for accessing
  Amazon AWS services

• Encapsulates SimpleDB data in
  Python objects

  • Dictionaries
  • Iterators
  • etc.
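import boto

# Assumes AWS credentials are available to boto (environment or config file).
sdb = boto.connect_sdb()
domain = sdb.get_domain("zinc_13")

for item in domain.select("SELECT * FROM zinc_13"):
    print(item.name)
    print(list(item.keys()))
    print(list(item.values()))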
Some Tips
• Aggregate your operations
 • <= 25 rows per request
• Shard your data across Domains
• Handling Numerical Data
 • Zero Padding
 • Negative Number Offsets
 • Dates
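
The numeric and date tips come down to making string order match the order you actually want. A minimal sketch for integers and timestamps follows; the offset, width, and function names are arbitrary choices for illustration, and a real application would pick them from the known range of its data.

```python
from datetime import datetime, timedelta, timezone


def encode_number(value, offset=1000000, width=10):
    """Shift by offset so negatives become non-negative, then zero-pad."""
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError("value out of range for this offset and width")
    return str(shifted).zfill(width)


def decode_number(text, offset=1000000):
    """Reverse encode_number."""
    return int(text) - offset


def encode_date(moment):
    """Format a timezone-aware datetime as a UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


values = [120, -5, 3, -300]
encoded = sorted(encode_number(v) for v in values)
decoded = [decode_number(text) for text in encoded]

mountain_time = timezone(timedelta(hours=-7))
stamp = encode_date(datetime(2010, 1, 25, 15, 3, 5, tzinfo=mountain_time))

print(encoded)
print(decoded)
print(stamp)
```

Sorting the encoded strings yields the same order as sorting the original integers, and normalizing dates to UTC in a fixed-width format gives them the same property.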
Advantages

• Faster development times
• (No) Administration
• No Hardware!
• Scale-as-you-go
• Pay-as-you-go
Pricing
• 1GB Free Storage
• $0.25/GB/mo Thereafter
• $0.10/GB Transfer In
• $0.15/GB Transfer Out
• 25 Machine Hours Free/month
• $0.14/hr Thereafter
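
To make the arithmetic concrete, here is a rough monthly estimate using only the rates listed on this slide. The rates are those quoted at the time of the talk and may no longer apply; the function name and the sample usage figures are invented for the example.

```python
from decimal import Decimal

FREE_STORAGE_GB = Decimal("1")
STORAGE_PER_GB = Decimal("0.25")
TRANSFER_IN_PER_GB = Decimal("0.10")
TRANSFER_OUT_PER_GB = Decimal("0.15")
FREE_MACHINE_HOURS = Decimal("25")
MACHINE_HOUR_RATE = Decimal("0.14")


def monthly_cost(storage_gb, transfer_in_gb, transfer_out_gb, machine_hours):
    """Estimate one month's bill in dollars from the slide's rates."""
    zero = Decimal("0")
    storage = max(Decimal(str(storage_gb)) - FREE_STORAGE_GB, zero) * STORAGE_PER_GB
    transfer = (
        Decimal(str(transfer_in_gb)) * TRANSFER_IN_PER_GB
        + Decimal(str(transfer_out_gb)) * TRANSFER_OUT_PER_GB
    )
    compute = (
        max(Decimal(str(machine_hours)) - FREE_MACHINE_HOURS, zero)
        * MACHINE_HOUR_RATE
    )
    return storage + transfer + compute


# Hypothetical month: 5 GB stored, 2 GB in, 4 GB out, 30 machine hours.
estimate = monthly_cost(5, 2, 4, 30)
print(estimate)
```

For that sample month the pieces are 4 GB of billable storage, 6 GB of transfer, and 5 billable machine hours.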
Limitations

• Fewer Features = More Work for the
  Developer
 • Dates
 • Numerical Data
 • Data Consistency
Limits


Following is a table that describes current limits within Amazon SimpleDB.

  Parameter                                   Restriction

  Domain size                                 10 GB per domain
  Domain size                                 1 billion attributes per domain
  Domain name                                 3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
  Domains per account                         100
  Attribute name-value pairs per item         256
  Attribute name length                       1024 bytes
  Attribute value length                      1024 bytes
  Item name length                            1024 bytes
  Attribute name, attribute value, and item   All UTF-8 characters that are valid in XML documents.
  name allowed characters                     Control characters and any sequences that are not valid in
                                              XML are returned Base64-encoded. For more information,
                                              see Working with XML-Restricted Characters.
  Attributes per PutAttributes operation      256
  Attributes requested per Select operation   256
  Items per BatchPutAttributes operation      25
  Maximum items in Select response            2500
  Maximum query execution time                5 seconds
  Maximum number of unique attributes         20
  per Select expression
  Maximum number of comparisons per           20
  Select expression
  Maximum response size for Select            1 MB
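
Two of these limits, 10 GB per domain and 25 items per BatchPutAttributes call, are what drive the earlier advice to shard and to aggregate. A sketch of planning writes under both constraints follows; the domain prefix, shard count, and item names are hypothetical, and hashing the item name is just one simple way to spread items.

```python
import hashlib

BATCH_LIMIT = 25  # items per BatchPutAttributes call, per the limits table


def shard_for(item_name, shard_count, prefix="zinc_13_"):
    """Pick a domain for an item using a stable hash of its name."""
    digest = hashlib.sha256(item_name.encode("utf-8")).hexdigest()
    return "%s%02d" % (prefix, int(digest, 16) % shard_count)


def batches(items, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_writes(item_names, shard_count):
    """Group item names by shard, then split each group into batches."""
    by_shard = {}
    for name in item_names:
        by_shard.setdefault(shard_for(name, shard_count), []).append(name)
    return {domain: list(batches(names)) for domain, names in by_shard.items()}


# Hypothetical item names; real ZINC identifiers would be used in practice.
names = ["ZINC%08d" % i for i in range(100)]
plan = plan_writes(names, shard_count=4)
for domain_name in sorted(plan):
    print(domain_name, [len(batch) for batch in plan[domain_name]])
```

Because the shard is derived from the item name, a reader can recompute which domain holds an item without a lookup table. Queries that span all items must then be issued against every shard.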
Editorial
• NoSQL vs. SQL
 • Coder vs. Architect
• Business Requirements
 • Time vs. Features
• “The Nightmare Scenario”
• “Race to the Bottom”
• “Me Too Syndrome”
Editorial

• Relational Databases Need to Catch Up
 • Meet/Exceed developer
    expectations

 • Netflix wouldn’t have fork-lifted ~1
    Billion Rows out of Oracle “just for
    fun”
Q&A

More Related Content

What's hot

A new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningA new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningAttaporn Ninsuwan
 
SMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISmarcos Eu
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSNDhaya kanthavel
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureHiroshi Ono
 
Management High-level overview of the OMG Data Distribution Service (DDS)
Management High-level overview of the OMG Data Distribution Service (DDS)Management High-level overview of the OMG Data Distribution Service (DDS)
Management High-level overview of the OMG Data Distribution Service (DDS)Gerardo Pardo-Castellote
 
Ontology Mapping for Dynamic Multiagent Environment
Ontology Mapping for Dynamic Multiagent Environment Ontology Mapping for Dynamic Multiagent Environment
Ontology Mapping for Dynamic Multiagent Environment IJORCS
 
Quality of Service in Publish/Subscribe Middleware
Quality of Service in Publish/Subscribe MiddlewareQuality of Service in Publish/Subscribe Middleware
Quality of Service in Publish/Subscribe MiddlewareAngelo Corsaro
 
Intelligent Bias of Network Structures in the Hierarchical BOA
Intelligent Bias of Network Structures in the Hierarchical BOAIntelligent Bias of Network Structures in the Hierarchical BOA
Intelligent Bias of Network Structures in the Hierarchical BOAMartin Pelikan
 
Design and implementation of a personal super Computer
Design and implementation of a personal super ComputerDesign and implementation of a personal super Computer
Design and implementation of a personal super Computerijcsit
 
Assessing no sql databases for telecom applications
Assessing no sql databases for telecom applicationsAssessing no sql databases for telecom applications
Assessing no sql databases for telecom applicationsJoão Gabriel Lima
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)Barry Smith
 

What's hot (15)

A new study of dss based on neural network and data mining
A new study of dss based on neural network and data miningA new study of dss based on neural network and data mining
A new study of dss based on neural network and data mining
 
SMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UISMARCOS CNR Paper Workshop Distributed UI
SMARCOS CNR Paper Workshop Distributed UI
 
16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN16 & 2 marks in i unit for PG PAWSN
16 & 2 marks in i unit for PG PAWSN
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
Management High-level overview of the OMG Data Distribution Service (DDS)
Management High-level overview of the OMG Data Distribution Service (DDS)Management High-level overview of the OMG Data Distribution Service (DDS)
Management High-level overview of the OMG Data Distribution Service (DDS)
 
Ontology Mapping for Dynamic Multiagent Environment
Ontology Mapping for Dynamic Multiagent Environment Ontology Mapping for Dynamic Multiagent Environment
Ontology Mapping for Dynamic Multiagent Environment
 
Quality of Service in Publish/Subscribe Middleware
Quality of Service in Publish/Subscribe MiddlewareQuality of Service in Publish/Subscribe Middleware
Quality of Service in Publish/Subscribe Middleware
 
Intelligent Bias of Network Structures in the Hierarchical BOA
Intelligent Bias of Network Structures in the Hierarchical BOAIntelligent Bias of Network Structures in the Hierarchical BOA
Intelligent Bias of Network Structures in the Hierarchical BOA
 
Design and implementation of a personal super Computer
Design and implementation of a personal super ComputerDesign and implementation of a personal super Computer
Design and implementation of a personal super Computer
 
Gem Intelligence Structure
Gem Intelligence StructureGem Intelligence Structure
Gem Intelligence Structure
 
Assessing no sql databases for telecom applications
Assessing no sql databases for telecom applicationsAssessing no sql databases for telecom applications
Assessing no sql databases for telecom applications
 
06542014
0654201406542014
06542014
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Adidrds
AdidrdsAdidrds
Adidrds
 
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)
Jobst Landgrebe The HL7 Services Aware Interoperability Framework (SAIF)
 

Similar to Amazon SimpleDB

Varsha ppt 1
Varsha ppt 1Varsha ppt 1
Varsha ppt 1pks9779
 
Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...IDES Editor
 
Asif nosql
Asif nosqlAsif nosql
Asif nosqlAsif Ali
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banksSammy Alvarez
 
A history and evaluation of system r
A history and evaluation of system rA history and evaluation of system r
A history and evaluation of system rsugeladi
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment>. &lt;
 
CBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationCBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationGuru Ji
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record LinkagePriya Pandian
 
Advanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxAdvanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxnettletondevon
 
Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)sones GmbH
 
What Is Super Key In Dbms
What Is Super Key In DbmsWhat Is Super Key In Dbms
What Is Super Key In DbmsTheresa Singh
 
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...IDES Editor
 
E.F. Codd (1970). Evolution of Current Generation Database Tech.docx
E.F. Codd (1970).  Evolution of Current Generation Database Tech.docxE.F. Codd (1970).  Evolution of Current Generation Database Tech.docx
E.F. Codd (1970). Evolution of Current Generation Database Tech.docxjacksnathalie
 

Similar to Amazon SimpleDB (20)

Varsha ppt 1
Varsha ppt 1Varsha ppt 1
Varsha ppt 1
 
Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...
 
Asif nosql
Asif nosqlAsif nosql
Asif nosql
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banks
 
RDBMS to NoSQL. An overview.
RDBMS to NoSQL. An overview.RDBMS to NoSQL. An overview.
RDBMS to NoSQL. An overview.
 
A history and evaluation of system r
A history and evaluation of system rA history and evaluation of system r
A history and evaluation of system r
 
oracle intro
oracle introoracle intro
oracle intro
 
Info fabric
Info fabricInfo fabric
Info fabric
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment
 
CBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL PresentationCBSE XII Database Concepts And MySQL Presentation
CBSE XII Database Concepts And MySQL Presentation
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record Linkage
 
Advanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxAdvanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docx
 
Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)
 
What Is Super Key In Dbms
What Is Super Key In DbmsWhat Is Super Key In Dbms
What Is Super Key In Dbms
 
Ddbms1
Ddbms1Ddbms1
Ddbms1
 
Database management systems
Database management systemsDatabase management systems
Database management systems
 
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
 
E.F. Codd (1970). Evolution of Current Generation Database Tech.docx
E.F. Codd (1970).  Evolution of Current Generation Database Tech.docxE.F. Codd (1970).  Evolution of Current Generation Database Tech.docx
E.F. Codd (1970). Evolution of Current Generation Database Tech.docx
 
View of data DBMS
View of data DBMSView of data DBMS
View of data DBMS
 
Computer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review ExampleComputer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review Example
 

Amazon SimpleDB

  • 1. Amazon SimpleDB Sean Collins
  • 3. Tale of Two Cities • Relational • “Non-Relational”
  • 4. Tale of Two Cities • Relational
  • 5. Relational Model • [Image of the first page of the paper: Communications of the ACM, Volume 13 / Number 6 / June, 1970, p. 377]
      Information Retrieval (P. BAXENDALE, Editor)
      A Relational Model of Data for Large Shared Data Banks
      E. F. CODD, IBM Research Laboratory, San Jose, California

      Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

      Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user’s model.

      KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity
      CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

      1. Relational Model and Normal Form

      1.1. INTRODUCTION
      This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

      In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

      The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

      A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations; these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the “connection trap”).

      Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

      1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
      The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

      1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the
  • 6. • A Relational Model of Data for Large Shared Data Banks • E. F. Codd • IBM Research Laboratory, San Jose, California • CACM June 1970
  • 7. • Data as Relations • “In many commercial, governmental, and scientific data banks ... some of the relations are of quite high degree... Accordingly, we propose that users deal, not with relations which are domain-ordered, but with relationships”
  • 8. Relationships • Customer To Order • Order to Items • And So Forth
  • 9. Relational • Provides SQL interface to developers • ACID • Atomicity • Consistency • Isolation • Durability
  • 10. Tale of Two Cities • “Non-Relational”
  • 12. CAP Theorem • Consistency • Availability • Partition-tolerance
  • 13. Non-Relational • Less structured • “Schema-less” • Key-value storage • Implement parts of ACID
  • 14.
  • 15. WHY?
  • 19.
  • 20. Speed
  • 22. Speed • No JOINS • No special column types
  • 23. Speed • No JOINS • No special column types • Concurrent operations
  • 24.
  • 26. Flexibility • No table definition • Store whatever you want • Wherever you want • Adjust on the fly
  • 27.
  • 29. Scalability • Eventual consistency • Writes propagate across nodes • Propagation time is not constant
  • 30. Amazon SimpleDB • Amazon AWS • “Structured Data” Storage • Notable users include Netflix
  • 31. SimpleDB Data Model • Domain • Item • Name • Attributes
  • 32. SimpleDB Data Model • All data stored as Strings
  • 33. SimpleDB Features • Eventually Consistent Read: Stale reads possible, lowest read latency, highest read throughput • Consistent Read: No stale reads, potential higher read latency, potential lower read throughput
  • 34. SimpleDB Features • Conditional Transactions • PUT/DELETE • At the Item Level • Based on Item Attributes
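The conditional PUT on slide 34 can be sketched at the request level. The snippet below only builds the query parameters for a PutAttributes call that flips a ticket's `Reserved` attribute from `False` to `True`; it sends nothing and leaves out authentication. It assumes the `Expected.N.Name` / `Expected.N.Value` parameters of the 2009-04-15 API. The function name, domain name, and item name are hypothetical.

```python
def conditional_put_params(domain, item, name, new_value, expected_value):
    """Build PutAttributes parameters that should only apply if `name`
    currently equals `expected_value` (authentication parameters omitted)."""
    return {
        "Action": "PutAttributes",
        "DomainName": domain,
        "ItemName": item,
        "Attribute.1.Name": name,
        "Attribute.1.Value": new_value,
        "Attribute.1.Replace": "true",       # overwrite rather than add a second value
        "Expected.1.Name": name,             # the condition: which attribute to check
        "Expected.1.Value": expected_value,  # ...and the value it must currently hold
        "Version": "2009-04-15",
    }

# Hypothetical ticket item: reserve it only if nobody else already has.
params = conditional_put_params("Tickets", "Seat-12A", "Reserved", "True", "False")
for key in sorted(params):
    print(key, "=", params[key])
```

When the condition does not hold, the service is expected to reject the write, so two clients racing for the same ticket should not both succeed.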
  • 35. Using SimpleDB • Operations are issued as HTTP GET requests (REST) • Responses are XML • Supports an SQL-like syntax for fetching items from the domain
  • 36. Using SimpleDB • Supports an SQL-like syntax for fetching items from the domain • SELECT <specification> FROM <domain> WHERE <condition> • Specifications • * (all attributes) • itemName() • count(*) • Specific attributes
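The `SELECT <specification> FROM <domain> WHERE <condition>` shape on slide 36 can be assembled with a small helper. This is a sketch under assumptions: it assumes names containing anything other than letters, digits, `_`, or `$` need backticks, and that values are single-quoted with embedded quotes doubled. The helper names and the attribute names `Molecular Weight` and `Charge` are illustrative, taken from the case-study labels rather than from a real schema, and the comparison operator is passed through unchecked.

```python
import re

# Names made only of letters, digits, "_" or "$" (not starting with a digit)
# are assumed to be usable without quoting.
_PLAIN_NAME = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")

def quote_name(name):
    """Backtick-quote a domain or attribute name unless it is a plain identifier."""
    if _PLAIN_NAME.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"

def quote_value(value):
    """Single-quote a value, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_select(domain, attribute, op, value, spec="*"):
    """Assemble a one-condition Select expression."""
    return "SELECT {} FROM {} WHERE {} {} {}".format(
        spec, quote_name(domain), quote_name(attribute), op, quote_value(value))

print(build_select("zinc_13", "Molecular Weight", "<", "0350.00"))
print(build_select("zinc_13", "Charge", "=", "0", spec="count(*)"))
```

The comparison value is a zero-padded string because all data is stored as strings and compared lexicographically.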
  • 37. https://sdb.amazonaws.com/
      ?Action=PutAttributes
      &Attribute.1.Name=Color
      &Attribute.1.Value=Blue
      &Attribute.2.Name=Size
      &Attribute.2.Value=Med
      &Attribute.3.Name=Price
      &Attribute.3.Value=0014.99
      &Attribute.3.Replace=true
      &AWSAccessKeyId=[valid access key id]
      &DomainName=MyDomain
      &ItemName=Item123
      &SignatureVersion=2
      &SignatureMethod=HmacSHA256
      &Timestamp=2010-01-25T15%3A03%3A05-07%3A00
      &Version=2009-04-15
      &Signature=[valid signature]
  • 38. <PutAttributesResponse>
        <ResponseMetadata>
          <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
          <BoxUsage>0.0000219907</BoxUsage>
        </ResponseMetadata>
      </PutAttributesResponse>
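The request on slide 37 leaves `Signature` as a placeholder. The Python 3 sketch below shows how a Signature Version 2 value might be computed. It assumes the usual scheme: parameters sorted by name, RFC 3986 percent-encoding, then HMAC-SHA256 over the verb, host, path, and query string, Base64-encoded. The function name `sign_v2`, the key ID, and the secret are made up for illustration, so check the AWS documentation before relying on the details.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_v2(params, secret_key, host="sdb.amazonaws.com", path="/", verb="GET"):
    """Return a Base64-encoded HMAC-SHA256 signature over the canonical request."""
    def enc(text):
        # RFC 3986: only unreserved characters are left unescaped.
        return quote(text, safe="-_.~")

    # Sort parameters by name, then join them as name=value pairs.
    canonical_query = "&".join(
        enc(name) + "=" + enc(value) for name, value in sorted(params.items()))
    string_to_sign = "\n".join([verb, host.lower(), path, canonical_query])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

params = {
    "Action": "PutAttributes",
    "DomainName": "MyDomain",
    "ItemName": "Item123",
    "Attribute.1.Name": "Color",
    "Attribute.1.Value": "Blue",
    "AWSAccessKeyId": "EXAMPLE-KEY-ID",   # placeholder, not a real key
    "SignatureVersion": "2",
    "SignatureMethod": "HmacSHA256",
    "Timestamp": "2010-01-25T15:03:05-07:00",
    "Version": "2009-04-15",
}
signature = sign_v2(params, secret_key="EXAMPLE-SECRET")
print(signature)
```

The resulting value would itself be percent-encoded and appended to the query string as the `Signature` parameter.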
  • 39. Case Study • ZINC Database • Commercially available compounds • Virtual Screening • Clean “Drug Like” (#13) • Approx. 3,751,744 compounds
  • 40. Data Model • Item • Name = ZINC_ID • Attributes • Molecular Weight • Charge • SMILES • “Simplified molecular input line entry specification”
  • 41. Boto • Provides a library for accessing Amazon AWS services • Encapsulates SimpleDB data in Python objects • Dictionaries • Iterators • etc.
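  • 42.
        import boto

        # Assumes AWS credentials are already configured for boto.
        sdb = boto.connect_sdb()
        domain = sdb.get_domain("zinc_13")

        for item in domain.select("SELECT * FROM zinc_13"):
            print(item.name)
            print(item.keys())
            print(item.values())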
  • 43.
  • 44. Some Tips • Aggregate your operations • <= 25 rows per request • Shard your data across Domains • Handling Numerical Data • Zero Padding • Negative Numbers Offsets • Dates
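The first two tips on slide 44 (aggregate writes into batches of at most 25 items, and shard data across domains) can be sketched without touching the network. The domain names and ZINC-style IDs below are invented for illustration, and hashing the item name with CRC32 is just one possible way to pick a shard, not something the talk prescribes.

```python
import zlib

def chunked(items, size=25):
    """Yield successive lists of at most `size` items (the per-batch limit)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def shard_for(item_name, shards):
    """Pick a domain by hashing the item name; the choice is stable across runs."""
    return shards[zlib.crc32(item_name.encode("utf-8")) % len(shards)]

domains = ["zinc_13_a", "zinc_13_b", "zinc_13_c", "zinc_13_d"]  # hypothetical shards
names = ["ZINC%08d" % n for n in range(60)]                     # made-up item names

# Group item names by the domain they hash to.
by_domain = {}
for name in names:
    by_domain.setdefault(shard_for(name, domains), []).append(name)

for domain, members in sorted(by_domain.items()):
    for batch in chunked(members):
        # Each batch would become one BatchPutAttributes request.
        print(domain, len(batch))
```

Reads need the same `shard_for` function to find an item again, so the shard list should stay fixed once data has been written.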
  • 45. Advantages • Faster development times • (No) Administration • No Hardware! • Scale-as-you-go • Pay-as-you-go
  • 46. Pricing • 1GB Free Storage • $0.25/GB/mo Thereafter • $0.10/GB Transfer In • $0.15/GB Out • 25 Machine Hours Free/month • $0.14/hr Thereafter
  • 47. Limitations • Fewer Features = More Work for the Developer • Dates • Numerical Data • Data Consistency
  • 48. Limits • Limitations: Following is a table that describes current limits within Amazon SimpleDB.
      Parameter: Restriction
      • Domain size: 10 GB per domain
      • Domain size: 1 billion attributes per domain
      • Domain name: 3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
      • Domains per account: 100
      • Attribute name-value pairs per item: 256
      • Attribute name length: 1024 bytes
      • Attribute value length: 1024 bytes
      • Item name length: 1024 bytes
      • Attribute name, attribute value, and item name allowed characters: All UTF-8 characters that are valid in XML documents. Control characters and any sequences that are not valid in XML are returned Base64-encoded. For more information, see Working with XML-Restricted Characters.
      • Attributes per PutAttributes operation: 256
      • Attributes requested per Select operation: 256
      • Items per BatchPutAttributes operation: 25
      • Maximum items in Select response: 2500
      • Maximum query execution time: 5 seconds
      • Maximum number of unique attributes per Select expression: 20
      • Maximum number of comparisons per Select expression: 20
      • Maximum response size for Select: 1MB
  • 49. Editorial • NoSQL vs. SQL • Coder vs. Architect • Business Requirements • Time vs. Features • “The Nightmare Scenario” • “Race to the Bottom” • “Me Too Syndrome”
  • 50. Editorial • Relational Databases Need to Catch Up • Meet/Exceed developer expectations • Netflix wouldn’t have fork-lifted ~1 Billion Rows out of Oracle “just for fun”
  • 51. Q&A

Editor's Notes

  9. Atomicity: Either an operation completes successfully, or it fails. No partial writes or updates.
     Consistency: The database provides mechanisms to ensure consistent data. If a transaction fails, the database reverts to the previous consistent state. If columns can refer to other tables, references to non-existent rows are not allowed.
     Isolation: Concurrent operations operate on written data, not data that is in the process of being modified.
     Durability: Once a transaction has completed, the transaction’s changes will survive hardware failure.
  18. Concurrent operations: Most database operations are just a function that is applied to each row.
  30. Conditional update: for a ticket item stored in SimpleDB, where the attribute named “Reserved” = “False”, set it to “True”.
  40. Since all data is stored as strings, comparisons use lexicographical ordering. Zero pad to ensure “10” is greater than “2”.
      Negative numbers need to be stored with an offset: -5 becomes 5 if the offset is 10.
      Store dates in ISO 8601 format for lexicographical comparison.
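The encoding rules in this note can be sketched in a few lines of Python 3. The helper names, the default width, and the choice to normalize dates to UTC are assumptions made for the example. The sketch handles integers only, so a decimal value such as a price would need an extra scaling step.

```python
from datetime import datetime, timezone

def encode_number(value, offset=0, width=10):
    """Shift by `offset` so the value is non-negative, then zero-pad to `width`."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset too small for this value")
    return str(shifted).zfill(width)

def encode_date(moment):
    """Format a timezone-aware datetime as ISO 8601 in UTC, so that string
    order matches chronological order."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Unpadded strings sort in the wrong order: "10" comes before "2".
print(sorted(["10", "2"]))
# Padded to the same width, string order matches numeric order.
print(sorted([encode_number(10, width=2), encode_number(2, width=2)]))
# With an offset of 10, -5 is stored as 5 (zero-padded).
print(encode_number(-5, offset=10, width=2))
print(encode_date(datetime(2010, 1, 25, 22, 3, 5, tzinfo=timezone.utc)))
```

The offset and width have to be chosen up front and used for every value of an attribute, since changing them later would break the ordering of data already stored.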
  41. Scale as you go. Amazon SimpleDB has geographically aware nodes. Integrates well with other Amazon AWS products.
  45. I’m going to throw out some red meat to the group tonight.
      “Nightmare Scenario”: Your database is so screwed up, you might as well just use a non-relational database. Relational databases don’t do squat if you never bother to use referential integrity.
      “Race to the Bottom”: Designing database systems is “hard”. It is much easier to throw out features that were relied on, which is pretty much everything ACID encompasses.
      “Me Too Syndrome”: Can’t swing a dead cat without hitting a new NonRel system. Possibly indicates that it’s just a fad?
      Twitter: Ruby as a fad. Gave up and went to the Java platform with Scala.
  46. Schema versioning: We wouldn’t have a conflict between Rel and NonRel if it wasn’t like pulling teeth trying to update a schema.
      Clustering: Paying huge bucks for clustering has gone the way of the dodo. Look at Google: commodity hardware and systems designed to share nothing.
      Enterprise OSes vs. Linux: Linux came up from behind and took over the datacenter. Commercial UNIX vendors woke up when the dirt got shoveled over them.