The CIO's Guide to NoSQL
 

Usage Rights: CC Attribution-NonCommercial-NoDerivs License

    Presentation Transcript

    • The CIO's Guide to NoSQL
      Dan McCreary
      July 2011
      Version 5
    • Agenda
      Historical Context
      The Business Case for NoSQL
      Terminology
      How NoSQL is Different
      Key NoSQL Products
      Call to Action: The NoSQL Pilot Project
      The Future of NoSQL
      Copyright Kelly-McCreary & Associates, LLC
    • Background for Dan McCreary
      Bell Labs
      NeXT Computer (Steve Jobs)
      Owner of Custom Object-Oriented Software Consultancy
      Federal data integration (National Information Exchange Model)
      Native XML/XQuery – 2006
      Advocate of NoSQL/XRX systems
    • NoSQL Training Areas
      Track / Course
      You Are Here: The CIO's Guide to NoSQL
      Managers
      Project Manager's Guide to NoSQL
      Transitioning to NoSQL
      Architectural Tradeoff Modeling
      Architects/Project Managers
      XQuery
      MapReduce
      Hadoop
      Functional Programming
      Developer
    • Sample of NoSQL Jargon
      Document orientation
      Schema free
      MapReduce
      Horizontal scaling
      Sharding and auto-sharding
      Brewer's CAP Theorem
      Consistency
      Reliability
      Partition tolerance
      Single-point-of-failure
      Object-Relational mapping
      Key-value stores
      Column stores
      Document-stores
      Memcached
      Indexing
      B-Tree
      Configurable durability
      Documents for archives
      Functional programming
      Document Transformation
      Document Indexing and Search
      Alternate Query Languages
      Aggregates
      OLAP
      XQuery
      MDX
      RDF
      SPARQL
      Architecture Tradeoff Modeling
      ATAM
      Note that within the context of NoSQL many of these terms have different meanings!
    • Selecting a Database…
      "Selecting the right data storage solution is no longer a trivial task."
      Start
      Does it look like a document?
      Yes: Use Microsoft Office
      No: Use the RDBMS
      Stop
    • Pressures on SQL Only Systems
      Scalability
      Large Data Sets
      Reliability
      SQL
      Social Networks
      OLAP/BI/DataWarehouse
      Linked Data
      Document-Data
      Agile
      Schema Free
    • Simplicity is a Virtue
      Many systems derive their strength from dramatically limiting their features
      Simplicity allows database designers to focus on the primary business driver
      Examples:
      Touch screen interfaces
      Key/Value data stores
    • Historical Context
      Mainframe Era
      1 CPU
      COBOL and FORTRAN
      Punchcards and flat files
      $10,000 per CPU hour
      Commodity Processors
      10,000 CPUs
      Functional programming
      MapReduce "farms"
      Pennies per CPU hour
    • Two Approaches to Computation
      Copyright 2010 Dan McCreary & Associates
      1930s and 40s
      John von Neumann: Manage state with a program counter.
      Alonzo Church: Make computations act like math functions.
      Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs?
    • Standard vs. MapReduce Prices
      John's Way
      Alonzo's Way
      http://aws.amazon.com/elasticmapreduce/#pricing
    • MapReduce CPUs Cost Less!
      82% Cost Reduction!
      Cuts cost from 32 to 6 cents per CPU hour!
      Perhaps Alonzo was right!
      Why? (hint: how "shareable" is this process?)
      http://aws.amazon.com/elasticmapreduce/#pricing
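The cost comparison on this slide can be checked with a few lines of arithmetic. The sketch below uses only the two rounded prices quoted on the slide. With those figures the saving comes to a little over 81%, so the 82% headline was presumably computed from unrounded prices (an assumption, since the pricing table itself is not part of this transcript).

```python
def cost_reduction(old_cents: float, new_cents: float) -> float:
    """Return the fractional saving when the hourly price drops from old to new."""
    return (old_cents - new_cents) / old_cents

# Cents per CPU hour, as quoted on the slide (rounded figures).
standard, mapreduce = 32, 6
saving = cost_reduction(standard, mapreduce)
print(f"Saving per CPU hour: {saving:.1%}")
```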
    • Perspectives
      Object Stores
      OLAP
      MDX
      Native XML
      NoSQL for Web 2.0 and BigData
      Graph Stores
      Perspective depends on your context
    • Architectural Tradeoffs
      "I want a fast car with good mileage."
      "I want a scalable database with low cost that runs well on the 1,000 CPUs in our data center."
    • Recent History
      The term NoSQL was repopularized around 2009
      Used for conferences of advocates of non-relational databases
      Became a contagious idea, or "meme"
      First of many "NoSQL meetups" in San Francisco organized by Johan Oskarsson
      Conversion from "No SQL" to "Not Only SQL" in recent years
    • NoSQL on Google Trends
    • NoSQL and Web 2.0 Startups
      Many web 2.0 startups did not use Oracle or MySQL
      They built their own data stores influenced by Amazon’s Dynamo and Google’s BigTable in order to store and process huge amounts of data
      Most of these data stores, built for social community or cloud computing applications, became OpenSource software
    • Google MapReduce
      2004 paper that had a huge impact on functional programming across the entire community
      Copied by many organizations, including Yahoo
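For readers new to the idea, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is an illustration only, not Google's implementation: the function names are made up for this sketch, and a real framework would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)

def word_count(documents):
    # Each map call depends only on its own document, so the calls
    # could run on separate machines without coordination.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["NoSQL is not only SQL", "SQL is not going away"])
print(counts)
```

Because neither phase touches shared state, adding CPUs is mostly a matter of handing out more documents and more keys.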
    • Google Bigtable Paper
      2006 paper that gave focus to scalable databases
      Designed to reliably scale to petabytes of data and thousands of machines
    • Amazon's Dynamo Paper
      Werner Vogels
      CTO - Amazon.com
      October 2, 2007
      Used to power Amazon's S3 service
      One of the most influential papers in the NoSQL movement
      Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
    • NoSQL "Meetups"
      “NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.”
      Computerworld magazine, July 1st, 2009
    • Key Motivators
      Licensing RDBMS on multiple CPUs
      The Three "V"s
      Velocity – lots of data arriving fast
      Volume – web-scale BigData
      Variability – many exceptions
      Desire to escape rigid schema design
      Avoidance of complex Object-Relational Mapping (the "Vietnam" of computer science)
    • Many Processes Today Are Driven By… The constraints of yesterday…
      Copyright 2008 Dan McCreary & Associates
      Challenge:
      Ask ourselves the question…
      Do our current methods of solving problems with tabular data…
      Reflect the storage of the 1950s…
      Or our actual business requirements?
      What structures best solve the actual business problem?
    • No-Shredding!
      My Data
      Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures
      Document stores prevent this shredding
    • Is Shredding Really Necessary?
      Every time you take hierarchical data and put it into a traditional database, you have to put repeating groups in separate tables and use SQL “joins” to reassemble the data
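To make the shredding point concrete, the sketch below stores one hierarchical record both ways. The `order` document and the table names are hypothetical, chosen only for illustration. SQLite from the Python standard library stands in for "a traditional database", and a JSON string stands in for a document store.

```python
import json
import sqlite3

# Hypothetical order document with a repeating group ("items").
order = {"id": 1, "customer": "Ann", "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1},
]}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# Shredding: the one document becomes rows in two tables.
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reassembly: a join plus application code rebuilds the hierarchy.
rows = db.execute(
    "SELECT o.id, o.customer, i.sku, i.qty "
    "FROM orders o JOIN order_items i ON i.order_id = o.id "
    "WHERE o.id = ? ORDER BY i.rowid",
    (1,),
).fetchall()
rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}

# Document style: store and load the hierarchy as a single value.
stored = json.dumps(order)
loaded = json.loads(stored)
```

Both paths return the same record. The difference is how much schema and mapping code sits between the application and its data.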
    • Object Relational Mapping
      Relational Database
      Object Middle Tier
      Web Browser
      T1 – HTML into Objects
      T2 – Objects into SQL Tables
      T3 – Tables into Objects
      T4 – Objects into HTML
    • "The Vietnam of Applications"
      Object-relational mapping has become one of the most complex components of building applications today
      A "Quagmire" where many projects get lost
      Many "heroic efforts" have been made to solve the problem:
      Hibernate
      Ruby on Rails
      But sometimes the way to avoid complexity is to keep your architecture very simple
    • Document Stores Need No Translation
      Document
      Document
      Application Layer
      Database
      Documents in the database
      Documents in the application
      No object middle tier
      No "shredding"
      No reassembly
      Simple!
    • Zero Translation (XML)
      REST-Interfaces
      XForms
      XML database
      Web Browser
      XML lives in the web browser (XForms)
      REST interfaces
      XML in the database (Native XML, XQuery)
      XRX Web Application Architecture
      No translation!
    • "Schema Free"
      Systems that automatically determine how to index data as the data is loaded into the database
      No a priori knowledge of data structure
      No need for up-front logical data modeling
      …but some modeling is still critical
      Adding new data elements or changing data elements is not disruptive
      Searching millions of records still has sub-second response time
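One way to picture "schema free" is a store that builds its index from whatever fields arrive, with no table definition up front. The class below is a toy written for this illustration (the name `SchemaFreeStore` is made up) and handles only flat documents with hashable values. Real products index nested structures and much more.

```python
from collections import defaultdict

class SchemaFreeStore:
    """Toy store that indexes every field of every document as it is loaded."""

    def __init__(self):
        self.documents = []
        # (field, value) -> positions of the documents containing that pair
        self.index = defaultdict(set)

    def load(self, document: dict) -> None:
        position = len(self.documents)
        self.documents.append(document)
        for field, value in document.items():
            self.index[(field, value)].add(position)

    def find(self, field, value):
        # Index lookup instead of a scan over all documents.
        hits = self.index.get((field, value), ())
        return [self.documents[i] for i in sorted(hits)]

store = SchemaFreeStore()
store.load({"title": "Dynamo", "year": 2007})
# A new element ("venue") appears with no schema change or migration.
store.load({"title": "Bigtable", "year": 2006, "venue": "OSDI"})
matches = store.find("venue", "OSDI")
```

Some modeling still matters here: the field names are the only contract between the code that writes documents and the code that queries them.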
    • Monoculture and Mono-architecture
      Image Source: Wikipedia
    • Eric Evans
      “The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.”
      Eric Evans
      Rackspace
    • Evolution of Ideas in OpenSource
      New Products
      New Database Ideas
      Proprietary Software
      Product A
      OpenSource
      Schema-free
      Product B
      Product B
      MapReduce
      Auto-sharding
      Cloud Computing
      How quickly can new ideas be recombined into new database products?
      OpenSource software has proved to be the most efficient way to quickly recombine new ideas into new products
    • Storage Architectural Patterns
      Tables
      Trees
      Stars
      Triples
    • Finding the Right Match
      Schema-Free
      Standards Compliant
      Mature Query Language
      Use CMU's Architecture Tradeoff Analysis Method (ATAM)
    • Brewer's CAP Theorem
      Consistency
      Availability
      Partition Tolerance
      You cannot have all three, so pick two!
    • Avoidance of Unneeded Complexity
      Relational databases provide a variety of features to ALWAYS support strict data consistency
      The rich feature set and the ACID properties implemented by RDBMSs might be more than necessary for particular applications and use cases
    • High Throughput
      Some NoSQL databases provide a significantly higher data throughput than traditional RDBMS
      Hypertable, which follows Google’s Bigtable approach, allows the local search engine Zvents to store one billion data cells per day
      Google is able to process 20 petabytes a day stored in Bigtable via its MapReduce approach
    • Complexity and Cost of Setting up Database Clusters
      NoSQL databases are designed in a way that “PC clusters can be easily and cheaply expanded without the complexity and cost of ’sharding,’ which involves cutting up databases into multiple tables to run on large clusters or grids”.
      Nati Shalom, CTO and founder of GigaSpaces
    • Compromising Reliability for Better Performance
      Shalom argues that there are “different scenarios where applications would be willing to compromise reliability for better performance.”
      Performance over reliability
      Example: HTTP session data
      “needs to be shared between various web servers but since the data is transient in nature (it goes away when the user logs off) there is no need to store it in persistent storage.”
    • "One Size Fits…"
      "One Size Does Not Fit All"
      James Hamilton Nov. 3rd, 2009
      http://perspectives.mvdirona.com/CommentView,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx
    • Different Thinking
      Sequential Processing
      The output of any step can be used in the next step
      State must be carefully managed
      Parallel Processing
      Each iteration of an XQuery FLWOR expression can run as an independent thread (no side effects)
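The contrast between the two styles can be sketched in a few lines. Python is used here rather than XQuery, and the `normalize` function and sample titles are made up for the illustration. The example shows the structure of side-effect-free iteration, not a performance measurement.

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(title: str) -> str:
    """Pure function: the result depends only on the argument."""
    return title.strip().title()

titles = ["  key-value stores", "document stores ", "graph stores"]

# Sequential style: a running total is carried from one step to the next,
# so the steps cannot be reordered or split up freely.
total_length = 0
for t in titles:
    total_length += len(normalize(t))

# Parallel style: no shared state, so iterations can run in any order
# on any worker. Executor.map still returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    normalized = list(pool.map(normalize, titles))
```

Because `normalize` has no side effects, swapping the thread pool for a process pool or a cluster of machines would not change the result.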
    • Cloud Computing
      High scalability
      Especially in the horizontal direction (multiple CPUs)
      Low administration overhead
      Simple web page administration
    • Databases work well in the cloud
      Data-warehousing-specific databases for batch data processing and map/reduce operations
      Simple, scalable and fast key/value stores
      Databases with a richer feature set than key/value stores, filling the gap to traditional RDBMS while offering good performance and scalability (such as document databases)
    • Auto-Sharding
      When one database gets almost full, it tells a "coordinator" system, and data is automatically migrated to other systems
      Before: 90% full
      After: 45% full, 45% full
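The before/after picture can be reproduced with a small simulation. Everything in the sketch below is hypothetical: the `Shard` and `Coordinator` classes, the 90% threshold and the split-in-half policy are assumptions chosen to mirror the slide, not the behavior of any particular product. Real systems typically route by key range or hash instead of searching every shard.

```python
class Shard:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}

    def fill_ratio(self) -> float:
        return len(self.data) / self.capacity

class Coordinator:
    """Hypothetical coordinator: splits any shard that reaches a threshold."""

    def __init__(self, capacity: int = 20, threshold: float = 0.9):
        self.capacity = capacity
        self.threshold = threshold
        self.shards = [Shard(capacity)]

    def put(self, key, value) -> None:
        # Update in place if the key exists, else use the emptiest shard.
        shard = self._find(key) or min(self.shards, key=Shard.fill_ratio)
        shard.data[key] = value
        if shard.fill_ratio() >= self.threshold:
            self._split(shard)

    def get(self, key):
        shard = self._find(key)
        return shard.data[key] if shard else None

    def _find(self, key):
        return next((s for s in self.shards if key in s.data), None)

    def _split(self, full: Shard) -> None:
        # Migrate half of the keys to a newly added shard.
        new = Shard(self.capacity)
        for key in sorted(full.data)[len(full.data) // 2:]:
            new.data[key] = full.data.pop(key)
        self.shards.append(new)

cluster = Coordinator()
for n in range(18):          # 18 of 20 slots used: the shard reaches 90%
    cluster.put(f"user{n}", n)
ratios = [s.fill_ratio() for s in cluster.shards]
```

The application code calls only `put` and `get`. The migration happens behind that interface, which is the point of "auto" in auto-sharding.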
    • Scale Up vs. Scale Out
      Scale Up
      Make a single CPU as fast as possible
      Increase clock speed
      Add RAM
      Make disk I/O go faster
      Scale Out
      Make many CPUs work together
      Learn how to divide your problems into independent threads
    • Functional Programming
      What does it mean to your IT staff?
      What experience do they have in functional programming?
      Can they "unlearn" the habits of the procedural world?
    • The NO-SQL Universe
      Document Stores
      Key-Value Stores
      XML
      Graph Stores
      Object Stores
      Column Stores
    • Key Value Stores
      A table with two columns and a simple interface
      Add a key-value
      For this key, give me the value
      Delete a key
      Blazingly fast and easy to scale
      Key
      Value
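The three-operation interface described on this slide is small enough to sketch in full. The class below is an in-memory illustration with made-up method names. It says nothing about how any particular product handles persistence, replication or networking.

```python
class KeyValueStore:
    """In-memory sketch of the three-operation interface."""

    def __init__(self):
        self._table = {}

    def put(self, key: str, value: bytes) -> None:
        """Add a key-value pair (overwrites an existing key)."""
        self._table[key] = value

    def get(self, key: str):
        """For this key, give me the value (None if the key is absent)."""
        return self._table.get(key)

    def delete(self, key: str) -> None:
        """Delete a key; deleting a missing key is a no-op."""
        self._table.pop(key, None)

store = KeyValueStore()
store.put("session:42", b'{"user": "ann"}')
value = store.get("session:42")
store.delete("session:42")
```

The store never inspects the value, which is why such systems are easy to partition: any key can live on any node.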
    • Types of Key-Value Stores
      Eventually-Consistent Key-Value Stores
      Hierarchical Key-Value Stores
      Key-Value Stores in RAM
      Key-Value Stores on Disk
      Ordered Key-Value Stores
    • Cassandra
      Apache open source project
      Originally developed by Facebook
      Designed for highly distributed, highly reliable systems
      No single point of failure
      Column-family data model
      http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf
    • Voldemort
      A distributed key-value system
      Used at LinkedIn
      10K-20K node operations/CPU
      Auto-sharding
      Graceful server failure handling
    • MongoDB
      Open Source License
      Document/Collection centric
      Sharding built-in, automatic
      Stores data in a JSON-like format (BSON)
      Queries are expressed as JSON-style documents
      Can be 10x faster than MySQL
      Many languages (C++, JavaScript, Java, Perl, Python etc.)
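The idea of a query that is itself a document can be shown without any database at all. The sketch below is a plain-Python, equality-only "query by example" written for this illustration. It is not MongoDB's API or implementation, and MongoDB's actual query language offers many operators beyond equality.

```python
_MISSING = object()  # sentinel so that an absent field never matches

def matches(document: dict, query: dict) -> bool:
    """True if every field in the query equals the same field in the document."""
    return all(
        document.get(field, _MISSING) == wanted
        for field, wanted in query.items()
    )

def find(collection, query):
    """Return the documents in the collection that match the query document."""
    return [doc for doc in collection if matches(doc, query)]

# Hypothetical collection; documents need not share the same fields.
products = [
    {"name": "Cassandra", "model": "column-family", "license": "Apache"},
    {"name": "CouchDB", "model": "document", "license": "Apache"},
    {"name": "MarkLogic", "model": "document"},
]
result = find(products, {"model": "document", "license": "Apache"})
```

The query has the same shape as the data it selects, so no separate query grammar has to be learned or mapped.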
    • Hadoop/HBase
      Open source implementation of the MapReduce algorithm, written in Java
      Initially created by Yahoo
      300 person-years development
      Column-oriented data store
      Java interface
      HBase is designed specifically to work with Hadoop
    • CouchDB
      Apache Document Store
      Written in Erlang
      RESTful JSON API
      Distributed, featuring robust, incremental replication with bi-directional conflict detection and management
    • Memcached
      Free & open source in-memory caching system
      Designed to speed up dynamic web applications by alleviating database load
      RAM resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering
      Simple interface
      Designed for quick deployment, ease of development
      APIs in many languages
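"Alleviating database load" usually means checking the cache before running the query. The sketch below shows that pattern with a plain dictionary standing in for the RAM-resident store. The function names and the fake database call are assumptions made for the illustration, and no Memcached client library is used.

```python
cache = {}                 # stands in for the RAM-resident key-value store
stats = {"db_calls": 0}    # counts how often the "database" is actually hit

def slow_database_lookup(user_id: int) -> dict:
    """Hypothetical expensive query."""
    stats["db_calls"] += 1
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    if key in cache:               # cache hit: no database load
        return cache[key]
    row = slow_database_lookup(user_id)
    cache[key] = row               # populate the cache for later requests
    return row

first = get_user(7)    # miss: goes to the database
second = get_user(7)   # hit: served from RAM
```

A production setup would also need expiry and invalidation when the underlying row changes, which this sketch leaves out.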
    • MarkLogic
      Native XML database designed to be used for petabyte-scale data stores
      ACID compliant
      Heavily used by federal agencies and document publishers, and for "high-variability" data
      Arguably the most successful NoSQL company
    • eXist
      OpenSource native XML database
      Strong support for XQuery and XQuery extensions
      Heavily used by the Text Encoding Initiative (TEI) community and XRX/XForms communities
      Ideal for metadata management
      Integrated Lucene search and structured search
    • Riak
      Community and Commercial licenses
      A "Dynamo-inspired" database
      Written in Erlang
      Query with JSON or Erlang
    • Hypertable
      Open Source
      Closely modeled after Google's Bigtable project
      High performance distributed data storage system
      Designed to support applications requiring maximum performance, scalability, and reliability
      Hypertable Query Language (HQL) is syntactically similar to SQL
    • Selecting a NoSQL Pilot Project
      The "Goldilocks Pilot Project Strategy"
      Not too big, not too small, just the right size
      Duration
      Sponsorship
      Importance
      Skills
      Mentorship
    • The Future of the NoSQL Movement
      Will data sets continue to grow at exponential rates?
      Will new system options become more diverse?
      Will new markets have different demands?
      Will some ideas be "absorbed" into existing RDBMS vendors' products?
      Will the NoSQL community continue to be the place where new database ideas and products are incubated?
      Will the job of doing high-quality architectural tradeoff analysis become easier?
      Growth
      Diversity
    • Using the Wrong Architecture
      Start
      Finish
      Credit: Isaac Homelund – MN Office of the Revisor
    • Using the Right Architecture
      Finish
      Start
      Find ways to remove barriers to empowering the non-programmers on your team.
    • Questions
      Dan McCreary
      President, Kelly-McCreary & Associates
      dan@danmccreary.com