North Bay Ruby Meetup 101911

North Bay Ruby Meetup slides. Resources: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis, http://nosql.mypopescu.com/

Published in: Technology
  • Resources: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis, http://nosql.mypopescu.com/, http://dbmsmusings.blogspot.com/, http://nosqltapes.com/

    1. Devs, Ops, & Data
       A gentle introduction to the world of Data
       Engine Yard
    2. Agenda
       • Context
       • RDBMS
         - Do’s and Don’ts
       • NoSQL
         - Survey
       • General Advice
       • Questions
    3. I work with Data!
       • Data Engineer @ Engine Yard
       • Organizer of the DFW Big Data Group (@dfwbigdata)
       • Previous life
         - Sr Web Developer
         - Data Architect
         - Student: MS in CSCI & MS in Info Mgmt from WashU STL
       • My Team is Hiring!
    4. The Universe: Relational vs. NoSQL/CoSQL vs. “The rest”
    5. Relational World
       • ACID Properties
         - Atomicity
           • Either all of a transaction’s actions are committed or none are
         - Consistency
           • Any transaction the database performs takes it from one consistent state to another
         - Isolation
           • Operations cannot access data modified by a transaction that has not yet completed
         - Durability
           • The DBMS can recover committed transaction updates after any kind of system failure (hardware or software)
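Atomicity from the slide above can be seen in a few lines. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for any RDBMS; the `accounts` table, the names in it, and the `transfer` helper are assumptions made up for the illustration, not something from the talk.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 20)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were, even though the first two `UPDATE` statements had already run inside the transaction.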
    6. Relational World
       • Relational Model
         - How data should be formatted (Normalized)
       • Unified Language for Querying
         - SQL
       • Data
         - Tabular, structured, relatively centralized
       • Scale vertically
         - Go up until no more
         - Sharding goes against the model!
       • Established theory & algorithms
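As a rough illustration of normalized data queried through SQL, the sketch below keeps authors and posts in separate tables and joins them at query time. It uses `sqlite3` from the Python standard library; the schema, the sample rows, and the `post_counts` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: author details live in one place, posts refer to them by id.
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO posts (author_id, title)
        VALUES (1, 'Hello'), (1, 'Again'), (2, 'Hi');
""")


def post_counts(conn):
    """Return (author name, number of posts) pairs, sorted by name."""
    rows = conn.execute("""
        SELECT authors.name, COUNT(posts.id)
        FROM authors
        JOIN posts ON posts.author_id = authors.id
        GROUP BY authors.name
        ORDER BY authors.name
    """)
    return rows.fetchall()
```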
    7. RDBMS Performance
       • Size of your data matters
         - Migrations
         - Schema changes
       • Hardware matters
         - Disk IO
         - RAM
       • Restores are not Magic
       • Debug your queries
         - EXPLAIN
         - Slow query logs
       • Indexes!
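The advice to inspect query plans and add indexes can be tried out locally. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module as a stand-in; MySQL and PostgreSQL have their own `EXPLAIN` output formats. The `users` table, the index name, and the `query_plan` helper are assumptions for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column is the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)])

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_before = query_plan(conn, lookup, args)   # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)    # index search

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan reports a scan of `users` and the second one mentions `idx_users_email`.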
    8. A bit of context
       • CAP
         - Consistency
           • All clients have a consistent view of the data
         - Availability
           • Clients can always read and write data
         - Partition Tolerance
           • System won’t fail if individual nodes can’t communicate
       • Horizontal Scaling
       • Data interaction is DB-specific
       • Data
         - Unstructured
         - Large quantities
    9. A bit of context: Data Model (columns) vs. Consistency (rows)

       Consistency           Key/Value         Document   Column-Oriented                Graph
       Single Master         Membase, Redis*   MongoDB                                   Neo4j
       Multi-Master/Dynamo   Riak              CouchDB    Cassandra, HBase, Hypertable
    10. A bit of context
        [Figure: visual guide to NoSQL systems. Source: http://blog.nahurst.com/visual-guide-to-nosql-systems]
    11. Survey of NoSQL Stores: Redis
        • Disk-backed in-memory database
        • Datatype Server (awesome!)
        • Has notion of transactions
        • Pros:
          - Blazing fast, easy to set up
        • Cons:
          - May not be best for large databases
        • Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).
          - Stock prices. Analytics. Real-time data collection. Real-time communication.
    12. Survey of NoSQL Stores: CouchDB
        • Document Oriented DB (Erlang)
        • Flexible replication (master-master, master-slave)
        • Pros:
          - DB consistency, ease of use
          - MVCC: write operations do not block reads
        • Cons:
          - Needs compacting from time to time
        • Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
          - CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
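The MVCC idea on this slide (writers add new revisions instead of locking the document, so readers are never blocked) can be sketched in plain Python. This is a toy model, not CouchDB's API: `RevisionedStore`, `ConflictError`, and the integer revision numbers are all invented for the illustration.

```python
class ConflictError(Exception):
    """Raised when a writer updates from a stale revision."""


class RevisionedStore:
    """Toy document store in which every write creates a new revision."""

    def __init__(self):
        self._revisions = {}  # doc_id -> list of bodies, oldest first

    def get(self, doc_id):
        """Return (rev, body) for the latest committed revision."""
        history = self._revisions[doc_id]
        return len(history), dict(history[-1])

    def put(self, doc_id, body, rev=None):
        """Store a new revision if `rev` matches the latest one."""
        history = self._revisions.get(doc_id, [])
        current = len(history) or None  # None means "document does not exist"
        if rev != current:
            raise ConflictError(f"expected rev {current}, got {rev}")
        # Copy-on-write: readers holding the old history are unaffected.
        self._revisions[doc_id] = history + [dict(body)]
        return len(history) + 1
```

Old revisions pile up in `_revisions`, which hints at why a store built this way needs compacting from time to time.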
    13. Survey of NoSQL Stores: Riak
        • Dynamo-based key/value store
        • Pros:
          - Fault tolerance
          - Distributed
          - Scalable
        • Cons:
          - Learning curve is a bit steep
          - Multi-site replication in commercial version only
        • Best used: If you want something Cassandra-like (Dynamo-like) but simpler.
          - Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt.
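Dynamo-style stores spread keys over nodes with consistent hashing, which is a large part of how they stay distributed and fault tolerant. The following is a minimal sketch of that technique in plain Python, not the code of any particular store; the `HashRing` class, the node names, and the choice of 64 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def nodes_for(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self._points, _hash(key))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == replicas:
                break
        return found
```

When a node leaves the ring, only the keys it owned move to a neighbor; every other key keeps its primary node, so losing one server does not reshuffle the whole data set.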
    14. Survey of NoSQL Stores: MongoDB
        • Document Oriented DB (binary JSON)
        • Memory Mapped Files, Schema-less
        • Pros:
          - Easy to get started
          - 2 Ruby ORMs (Mongoid, MongoMapper)
        • Cons:
          - Cluster reconfiguration tricky
        • Best used: If you need dynamic queries. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
          - For most things that you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
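To give a feel for schema-less documents and dynamic queries, here is a toy version in plain Python. It is not a database driver API; the `find` helper and the sample documents are hypothetical, and only exact-match conditions are handled.

```python
def find(collection, query):
    """Return documents whose fields match every key/value pair in `query`."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


# Documents in one collection do not have to share the same fields.
people = [
    {"name": "Ann", "city": "Dallas"},
    {"name": "Bob", "city": "Austin", "age": 30},
    {"name": "Cy", "city": "Dallas", "age": 30},
]

in_dallas = find(people, {"city": "Dallas"})
dallas_and_30 = find(people, {"city": "Dallas", "age": 30})
```

The query is just data built at run time, so new fields can be added and queried without a schema migration.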
    15. Survey of NoSQL Stores: Cassandra
        • Column-based DB
          - Facebook
        • Pros:
          - Best of BigTable (column families) and Dynamo
          - Querying by column, range of keys
        • Cons:
          - Bloat and complexity (Java)
        • Best used: When you write more than you read (logging). If every component of the system must be in Java.
          - Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that). Writes are faster than reads, so one natural niche is real-time data analysis.
    16. Survey of NoSQL Stores: Neo4j
        • Graph DB written in Java
        • Pros:
          - Native way to describe relationships
          - Advanced path-finding with multiple algorithms
        • Cons:
          - ?
        • Best used: For graph-style data. Neo4j is quite different from the others in this sense.
          - Social relations, public transport links, road maps, network topologies.
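Path-finding is the kind of query that graph-style data makes natural. The sketch below is a plain breadth-first search over an adjacency dictionary, shown only to illustrate the idea; it does not use Neo4j or its API, and the `friends` graph and `shortest_path` function are made up for the example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the shortest list of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical social graph: who knows whom.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
    "zed": [],
}
```

Expressing the same question over relational tables would take one self-join per hop, which is the mismatch a graph database is meant to remove.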
    17. Survey of NoSQL Stores: Hadoop
        • MapReduce Framework
        • Distributed FS, task tracker, ... (full Ecosystem)
        • Pros:
          - Process extremely large data volumes
        • Cons:
          - Bloat and complexity (Java), steep learning curve
        • Best used: When you write more than you read (logging). If every component of the system must be in Java.
          - Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that). Writes are faster than reads, so one natural niche is real-time data analysis.
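The MapReduce programming model itself is small enough to show in one process. This is a sketch of the three phases (map, shuffle, reduce) using word count as the classic example; it runs on a single machine and does not use Hadoop, and all function names are chosen for the illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run on many machines in parallel and the shuffle moves data across the network; the per-record logic stays about this simple.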
    18. “The rest”
        • Full Text Search
          - Awesome search capabilities
          - Helps reduce the load on your database
        • Caches
          - Very fast! Use in any application that needs low-latency data access
          - Membase
            • Memcached-compatible, but with persistence and clustering
            • All nodes are identical (master-master replication)
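A common way to put a cache in front of a database is the cache-aside pattern: look in the cache first and fall back to the slow source only on a miss. The sketch below keeps entries in a local dictionary with a time-to-live; a real deployment would talk to a cache server instead. `TTLCache`, `get_or_load`, and the injectable `clock` parameter are assumptions made for the example.

```python
import time


class TTLCache:
    """In-process cache-aside helper with per-entry expiry (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling `loader(key)` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)   # the slow path, e.g. a database query
        self._entries[key] = (now + self._ttl, value)
        return value
```

Only misses and expired entries reach the loader, which is how a cache takes read load off the database.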
    19. Keep in mind
        • Data and query models
        • Durability needs
        • Scalability needs
        • Partition needs
          - Data on multiple servers?
        • Consistency
          - Reads!
        • Server performance
        • Analytical workload
    20. Advice
        • Give them a try!
          - Fast to set up
        • Data Models matter
          - Going from RDBMS to NoSQL will require a conversion step, so plan for it
        • No silver bullet
          - Your app will likely use different repositories for specific usages
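The conversion step mentioned above often means turning joined rows into nested documents. Here is a minimal sketch of one such conversion, assuming a hypothetical `users` table and a `posts` table with a `user_id` foreign key; the field names and the choice to embed posts inside each user are illustrative, not a recommendation from the talk.

```python
def rows_to_documents(users, posts):
    """Embed each user's posts inside that user's document."""
    docs = {
        user["id"]: {"_id": user["id"], "name": user["name"], "posts": []}
        for user in users
    }
    for post in posts:
        # The foreign key disappears: the relationship becomes nesting.
        docs[post["user_id"]]["posts"].append({"title": post["title"]})
    return list(docs.values())


# Rows as they might come out of two relational tables.
user_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
post_rows = [
    {"id": 10, "user_id": 1, "title": "Hello"},
    {"id": 11, "user_id": 1, "title": "Again"},
]

documents = rows_to_documents(user_rows, post_rows)
```

Whether to embed or to keep references depends on how the data is read, which is why the data model deserves planning before the migration starts.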
    21. Questions?
