Heterogeneous Persistence

Heterogeneous Persistence - Percona Live - USA 2016

Published in: Data & Analytics

  1. 1. Heterogeneous Persistence A guide for the modern DBA Marcos Albe Jervin Real Ryan Lowe Liz Van Dijk
  2. 2. Introduction Hello everyone
  3. 3. Introduction MySQL everyone?
  4. 4. Introduction Memcached?
  5. 5. Agenda ● Introduction ● Why a single DBMS is not enough ● What makes a DBMS ● Different flavors of DBMS ● Top picks
  6. 6. Why one DBMS is not enough "If you feel things are not efficient in your code, it is likely that you are suffering from a poor choice or design of data structures" ~ Anonymous
  7. 7. Why one DBMS is not enough ● Different data structures ● Different access patterns ● Different consistency and durability requirements. ● Different scaling needs ● Different budgets ● Theoretical fundamentalism
  8. 8. Why one DBMS is not enough A more concrete example OLAP -vs- OLTP
  9. 9. OLAP -vs- OLTP
  10. 10. PROs ● No SPOF ● Workload optimized services ● Easier to scale* CONs ● Additional complexity ● Operational needs (additional staffing) ● Cost ($$$)*
  11. 11. La Carte ● Key Value Stores ○ Memcached ○ MemcacheDB ○ Redis ○ Riak KV ○ Cassandra ○ Amazon's DynamoDB ● Graph ○ Neo4J ○ OrientDB ○ Titan ○ Virtuoso ○ ArangoDB ● Relational ○ MySQL ○ PostgreSQL ● Time Series ○ InfluxDB ○ Graphite ○ OpenTSDB ○ Blueflood ○ Prometheus ● Columnar ○ Vertica ○ Infobright ○ Amazon RedShift ○ Apache HBase ● Document ○ MongoDB ○ Couchbase ● Fulltext ○ Sphinx ○ Lucene/Solr
  12. 12. What makes a DB?
  13. 13. General Criteria ● Specialty ● Cost ● API/Interfaces ● Scalability ● CAP ● ACID ● Secondary Features
  14. 14. What makes a DBMS: General ● Licensing ● Language support ● OS support ● Community & workforce ● Tools ecosystem
  15. 15. What makes a DBMS: Fundamental Features ● Data Architecture ○ Logical data model ○ Physical data model ● Standards adherence (where defined) ● Atomicity ● Consistency ● Isolation ● Durability ● Referential integrity ● Transactions ● Locking ● Crash recovery ● Unicode support
  16. 16. What makes a DBMS: Fundamental Features cont. ● Interface / connectors / protocols ● Sequences / auto-incrementals / atomic counters ● Conditional entry updates ● MapReduce ● Compression ● In-memory ● Availability ● Concurrency handling ● Scalability ● Embeddable ● Backups
  17. 17. What makes a DBMS: querying capabilities ● CRUD ● Union ● Intersect ● JOIN (inner, outer) ● Inner selects ● Merge joins ● Common Table Expressions ● Windowing Functions ● Parallel Query ● Subqueries ● Aggregation ● Derived tables
  18. 18. What makes a DBMS: programmatic capabilities ● Cursors ● Triggers ● Stored procedures ● Functions ● Views ● Materialized views ● Virtual columns ● UDF ● XML/JSON/YAML support
  19. 19. What makes a DBMS: sizing limits ● Database (tables size sum) ● Number of Tables ● Tables individual size ● Variable length column size ● Row width ● Row columns count ● Row count ● Column name ● Blob size ● Char ● Numeric ● Date (min / max)
  20. 20. What makes a DBMS: indexing ● B-Tree ● Full text indexing ● Hash ● Bitmap ● Expression ● Partials ● Reverse ● GiST ● GIS indexing ● Composite keys ● Graph support
  21. 21. What makes a DBMS: high availability ● Replication ● Failover ● Clustering ● CAP choice
  22. 22. What makes a DBMS: scalability ● Partitioning ○ Range ○ Hash ○ Range+hash ○ List ○ Expression ○ Sub-partitioning ● Sharding ○ By key ○ By table
  23. 23. What makes a DBMS: supported data types ● Integer ● Floating point ● Decimal ● String ● Binary ● Date/time ● Boolean ● Set ● Enumeration ● Blob ● Clob ● JSON/XML/YAML (as native types)
  24. 24. What makes a DBMS: security features ● Authentication methods ● Access Control Lists ● Pluggable Authentication Modules support ● Encryption at-rest ● Encryption over the wire ● User proxy
  25. 25. What makes a DBMS: workload ● Data organization model: unstructured, semi-structured, structured ● Data model (schema) stability: Static? Stable? Dynamic? Highly dynamic? ● Writes: append-only; append mostly; updates only; updates mostly ● Reads: full scans; range scans; multi-range scans; point reads ● Reads by age: new only; new mostly; old only; old mostly; whole range ● Reads by complexity: simple, related, deeply-nested relations, ...?
  26. 26. ACID vs BASE ● Atomic ● Consistent ● Isolated ● Durable ● Basic Availability ● Soft-state ● Eventual Consistency
  27. 27. CAP Theorem ● Consistency ● Availability ● Partition tolerance
  28. 28. Relational Databases
  29. 29. Relational Databases
  30. 30. Relational Databases: write anomalies
  31. 31. Relational Databases: write anomalies
  32. 32. Relational Databases: normalization
  33. 33. Relational Databases: normalization
  34. 34. Relational Databases: query language results = new Array(); table = open('mydata'); while (row = table.fetch()) { if (row.x > 100) { results.push(row); } }
  35. 35. Relational Databases: query language SELECT * FROM mydata WHERE x > 100;
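The contrast between the hand-written loop and the one-line SELECT can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `mydata` table and its `x` column come from the slides, while the sample values and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a small `mydata` table (sample values are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (x INTEGER)")
conn.executemany("INSERT INTO mydata (x) VALUES (?)", [(50,), (150,), (100,), (250,)])

# Procedural approach: fetch every row and filter in application code.
procedural = [row for row in conn.execute("SELECT x FROM mydata") if row[0] > 100]

# Declarative approach: state the condition and let the engine do the filtering.
declarative = conn.execute("SELECT x FROM mydata WHERE x > 100").fetchall()

print(sorted(procedural) == sorted(declarative))
```

Both approaches return the same rows; the declarative form leaves the choice of access path (full scan, index lookup) to the database.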
  36. 36. Relational Databases: JOINs SELECT o.order_id AS order_id, CONCAT(c.customer_name, ' (', c.customer_email, ')') AS customer, GROUP_CONCAT(i.item_name) AS items, SUM(item_price) AS total FROM orders AS o JOIN order_items AS oi ON oi.order_id = o.order_id JOIN items AS i ON i.item_id = oi.item_id JOIN customers AS c ON c.customer_id = o.customer_id GROUP BY o.order_id, c.customer_name, c.customer_email;
  37. 37. Relational Databases: good use cases ● Highly-structured data with complex querying needs ● Projects that need very high data durability and guarantees of database-level consistency and integrity ● Simple projects with limited data growth and limited amount of entities ● Projects that must meet PCI DSS, HIPAA or similar security requirements ● Analysis of portions of larger Big Data stores ● Projects where duplicated data volumes would be a problem
  38. 38. Relational Databases: bad use cases ● Unstructured data ● Deep Hierarchies / Nested -> XML ● Deep recursion ● Ever-growing datasets; projects that are basically logging data ● Projects recording time-series ● Reporting on massive datasets
  39. 39. Relational Databases: bad use cases ● Projects supporting extreme concurrency ● Projects supporting massive data intake ● Queues ● Cache storage
  40. 40. PROs ● Very mature ● Abundant workforce ● ACID guarantees ● Referential integrity ● Highly expressive query language ● Ubiquitous CONs ● Rigid schema ● Difficult to scale horizontally ● Expensive writes ● JOIN bombs
  41. 41. Relational Databases: MySQL
  42. 42. Relational Databases: MySQL ● Well known / mature / extensive documentation ● GPLv2 + commercial license for OEMs, ISVs and VARs ● Client libraries for about every programming language ● Many different engines ● SQL/ACID impose scalability limits ● Asynchronous / Semi-synchronous / Virtually synchronous replication ● Can be AP or CP depending on replication model
  43. 43. PROs ● Open source ● Mature and ubiquitous ● ACID ● Choice of AP or CP ● Highly available ● Abundant tooling and expertise ● General purpose; likely good to start anything you want. CONs ● Difficult to shard ● Replication issues ● Not 100% standard compliant ● Storage engine imposed limitations ● General purpose; no silver bullet solutions for scaling!
  44. 44. Relational Databases: PostgreSQL
  45. 45. Relational Databases: PostgreSQL ● Mature / adequate documentation ● PostgreSQL License (similar to BSD/MIT) ● Client libraries for about every programming language ● Highly Standards Compliant ● SQL/ACID impose scalability limits ● Asynchronous / Semi-synchronous ● Virtually synchronous replication via 3rd party ● Can be AP or CP depending on replication model
  46. 46. PROs ● Open source ● Mature and stable ● ACID ● Lots of advanced features ● Vacuum CONs ● Difficult to shard ● Operations feel like an afterthought ● Less forgiving ● Vacuum
  47. 47. K/V Stores
  48. 48. CRUD ● CREATE ● READ ● UPDATE ● DELETE
  49. 49. HASHING ● Computers: 0, 1, 2, …, n - 1 ● Key Value Pair: (k, v) ● (k, v) => hash(k) mod n
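A small sketch of the `hash(k) mod n` placement rule, and of what happens to key placement when n changes. The key names and node counts are made up for illustration, and MD5 is used only as a convenient deterministic hash (Python's built-in `hash()` is salted per process for strings), not as a recommendation.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash derived from MD5 (illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

def node_for(key: str, n: int) -> int:
    """Map a key to one of n nodes numbered 0 .. n-1."""
    return stable_hash(key) % n

# Hypothetical keys; compare placement on 4 nodes with placement on 5 nodes.
keys = [f"user:{i}" for i in range(10_000)]
before = {k: node_for(k, 4) for k in keys}
after = {k: node_for(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys changed node when going from 4 to 5 nodes")
```

With a uniform hash, a key keeps its node only when `h mod 4 == h mod 5`, which holds for about one key in five, so roughly 80% of keys are remapped by adding a single node. For a cache, those remapped keys all become misses at once.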
  50. 50. THUNDERING HERD
  51. 51. CONSISTENT HASHING
  52. 52. CONSISTENT HASHING
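Consistent hashing is commonly implemented as a ring of hash points with each key owned by the next point clockwise. The sketch below is one minimal way to do that, not the scheme used by any particular product; the node names, the number of virtual points per node (`vnodes`) and the MD5-based hash are all assumptions.

```python
import bisect
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash derived from MD5 (illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Owner is the first ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-0", "node-1", "node-2", "node-3"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-4")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.0%} of keys moved, all of them to the new node")
```

Unlike `hash(k) mod n`, adding a node here only moves the keys that the new node takes over; every other key stays where it was.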
  53. 53. K/V Stores - Good Use Cases ● Lots of data ○ Usually easily horizontally scalable ● Object cache in front of RDBMS ○ Memcached, anyone? ● High concurrency ○ Very simple locking model ● Massive small-data intake ● Simple data access patterns ○ CRUD on PK access
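The "object cache in front of RDBMS" use case is often implemented with the cache-aside pattern. The sketch below uses a plain dict as a stand-in for a K/V store such as Memcached and SQLite as a stand-in for the RDBMS; the `users` table, the key format and the helper names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

cache = {}                 # stand-in for a K/V store such as Memcached
stats = {"db_reads": 0}    # counts how often a read had to go to the database

def get_user(user_id):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    row = db.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is not None:
        cache[key] = row
    return row

def rename_user(user_id, name):
    """Write to the database, then invalidate the cached copy."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)
```

Every access is a CRUD operation on a single key, which is the access pattern K/V stores handle well. Invalidating on write rather than updating the cache keeps the sketch simple, at the price of one extra database read after each change.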
  59. 59. K/V Stores - Bad Use Cases ● Durability and consistency* ● Complex data access patterns ● Non-PK access* ● Operations* ○ Complex systems fail in complex ways
  64. 64. SIMPLE FAILURE
  65. 65. COMPLICATED FAILURE
  66. 66. EXAMPLE K/V STORES ● Memcached ● MemcacheDB ● Redis* ● Riak KV ● Cassandra* ● Amazon DynamoDB*
  73. 73. PROs ● Highly scalable ● Simple access patterns CONs ● Operational complexities ● Limited access patterns
  74. 74. Key Value Stores - Questions?
  75. 75. Columnar Databases
  76. 76. Columnar Data Layout ● Row-oriented: 001:10,Smith,Joe,40000; 002:12,Jones,Mary,50000; 003:11,Johnson,Cathy,44000; 004:22,Jones,Bob,55000; ... ● Column-oriented: 10:001,12:002,11:003,22:004; Smith:001,Jones:002,Johnson:003,Jones:004; Joe:001,Mary:002,Cathy:003,Bob:004; 40000:001,50000:002,44000:003,55000:004; ...
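The two layouts can be reproduced in a few lines. This sketch reuses the sample records from the slide; the column names are assumptions, since the slide does not name them.

```python
# Row-oriented: each record is stored together, keyed by its row id.
rows = [
    ("001", 10, "Smith", "Joe", 40000),
    ("002", 12, "Jones", "Mary", 50000),
    ("003", 11, "Johnson", "Cathy", 44000),
    ("004", 22, "Jones", "Bob", 55000),
]
column_names = ("emp_id", "last_name", "first_name", "salary")  # hypothetical names

# Column-oriented: one list per column, each value paired with its row id.
columns = {name: [] for name in column_names}
for row_id, *values in rows:
    for name, value in zip(column_names, values):
        columns[name].append((value, row_id))

# An aggregate over one column only needs to read that column's list.
total_salary = sum(value for value, _ in columns["salary"])
print(columns["last_name"])
print(total_salary)
```

Summing salaries touches one list in the column layout, whereas the row layout would have to read every full record to pick out the same field.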
  77. 77. Columnar Data Layout ● Row-oriented Read Approach [Figure: memory pages 1 to 4 filled with row data (10 Smith Bob 40000; 12 Jones Mary 50000; 11 Johnson Cathy 44000); legend: "What we want to read", "Read Operation"]
  78. 78. Columnar Data Layout ● Column-oriented Read Approach [Figure: memory pages 1 to 4 filled with column data (10 12 11 22; Smith Jones Johnson; Joe Mary Cathy Bob); legend: "What we want to read", "Read Operation"]
  79. 79. Columnar Databases - Considerations ● Buffering and compression can help to reduce the impact of writes, but they should still be avoided when possible ○ Usually, an ETL process should be put in place to prepare data for analysis in a column-based format ● Covering Indexes in row-based stores could provide similar benefits, but only up to a point → index maintenance work can become too expensive ● Column-based stores are self-indexing and more disk-space efficient ● SQL can be used for most column-based stores
  83. 83. Columnar Database - Good use cases ● Suitable for read-mostly or read-intensive, large data repositories ● Good for full table / large range reads ● Good for unstructured problems where “good” indexes are hard to forecast ● Good for re-creatable datasets ● Good for structured data
  88. 88. Columnar Database - Bad use cases ● Not good for “SELECT *” queries or queries fetching most of the columns ● Not good for writes ● Not good for mixed read/write ● Bad for unstructured data
  92. 92. Columnar Database - Examples ● InfoBright (ICE) ● Vertica ● Amazon Redshift ● Apache HBase ○ https://www.percona.com/live/data-performance-conference-2016/sessions/solr-how-index-10-billion-phrases-mysql-and-hbase
  96. 96. Columnar - Questions?
  97. 97. Graph Databases
  98. 98. Graph Databases - Good Use Cases ● Highly Connected Data ○ Network & IT Operations, Recommendations, Fraud Detection, Social Networking, Identity & Access Management, Geo Routing, Insurance Risk Analysis, Counter Terrorism ● Millions or Billions of Records ○ Relational databases can also solve this problem at a smaller scale ● Re-Creatable Data Set ○ Keep as much as possible outside of the critical path ● Structured Data ○ You cannot graph a relationship unless you can define it
  103. 103. Graph Databases - Bad Use Cases ● Unstructured Data ○ You cannot graph a relationship if you cannot define it ● Non-Connected Data ○ Graphiness is important here ● Highly Concurrent RW Workloads ○ Performance breaks down ● Anything in the Critical OLTP Path* ○ I'm not only talking about writes here ● Ever-Growing Data Set
  109. 109. Example Graph Databases ● Neo4j ● OrientDB ● Titan ● Virtuoso ● ArangoDB
  115. 115. THE CODE
  116. 116. PROs ● Solves a very specific (and hard) data problem ● Learning curve not bad for developer usage ● Data analysts’ dream CONs ● Very little operational expertise for hire ● Little community and virtually no tooling for administration and operations ● Big mismatch in paradigm vs RDBMS; hard to switch for DBAs ● Hard/expensive to scale horizontally ● Writes are computationally expensive
  117. 117. Graph Databases - Questions?
  118. 118. Time Series
  119. 119. ID: {timestamp, value} db1-threads: {1460928171, 6}
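The `ID: {timestamp, value}` shape maps naturally onto an append-only sequence per series, with new points added at one end and old points expired from the other. The sketch below is an in-memory illustration only, not how any of the listed time series databases store data; the first point is the one from the slide, the other points and the function names are made up.

```python
from collections import defaultdict, deque

series = defaultdict(deque)   # series id -> deque of (timestamp, value)

def append(series_id, timestamp, value):
    """Append a point; points are assumed to arrive in timestamp order."""
    series[series_id].append((timestamp, value))

def expire(series_id, cutoff):
    """Drop points older than `cutoff` from the old end of the series."""
    points = series[series_id]
    while points and points[0][0] < cutoff:
        points.popleft()

def query(series_id, start, end):
    """Return the points whose timestamp lies in [start, end]."""
    return [(t, v) for t, v in series[series_id] if start <= t <= end]

append("db1-threads", 1460928171, 6)   # the point shown on the slide
append("db1-threads", 1460928181, 7)   # hypothetical later samples
append("db1-threads", 1460928191, 5)
expire("db1-threads", 1460928180)
print(list(series["db1-threads"]))
```

Writes only ever touch the newest end and deletes only the oldest end, which is why this workload suits sequential, append-oriented storage.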
  120. 120. Time Series - Good Use Cases ● Uh … Time Series Data ● Write-mostly (95%+) - Sequential Appends ● Rare updates, rarer still to the distant past ● Deletes occur at the opposite end (the beginning) ● Data does not fit in memory
  126. 126. Time Series - Bad Use Cases ● Uh … Not Time Series Data ● Small data
  127. 127. Example Time Series Databases ● InfluxDB ● Graphite ● OpenTSDB ● Blueflood ● Prometheus
  133. 133. PROs ● Solves a very specific (big) data problem ● Well-defined and finite data access patterns CONs ● Terrible query semantics
  134. 134. Time Series - Questions?
  135. 135. Document Stores
  136. 136. Document Stores: Document Oriented
  137. 137. Document Stores: Document Oriented
  138. 138. Document Stores: Flexible Schema
  139. 139. Document Stores: Flexible Schema
  140. 140. Document Stores: Flexible Schema
  141. 141. Document Stores: Flexible Schema
  142. 142. Document Stores: Flexible Schema
  143. 143. Document Stores: Scalable by Design [Diagram: Shard ×3, Primary ×3, Replica ×6]
  144. 144. Document Stores: Scalable By Design [Diagram: Instance ×3, Shard ×3, Replica ×6]
  145. 145. Document Stores
  146. 146. Document Stores: MongoDB
  147. 147. Document Stores: MongoDB ● Sharding and replication for dummies! ● Pluggable storage engines for distinct workloads ○ Different locking behaviors ● Excellent compression options with PerconaFT, RocksDB, WiredTiger ● On-disk encryption (Enterprise Advanced) ● In-memory storage engine (Beta) ● Connectors for all major programming languages ● Sharding- and replica-aware connectors ● Geospatial functions ● Aggregation framework ● ... and a lot more, except being transactional
  158. 158. Document Stores: MongoDB > Use Cases ● Catalogs ● Analytics/BI (BI Connector on 3.2) ● Time series
  161. 161. Document Stores: Couchbase
  162. 162. Document Stores: Couchbase ● MongoDB - more or less ● Global Secondary Indexes are exciting: they produce localized secondary indexes for low-latency queries (Multi-Dimensional Scaling) ● Drop-in replacement for Memcached
  165. 165. Document Stores: Couchbase > Use Cases ● Internet of Things (direct or indirect receiver/pipeline) ● Mobile data persistence via Couchbase Mobile, e.g. field devices with unstable connections and local/close-proximity ingestion points ● Distributed K/V store
  168. 168. Document Store: Questions?
  169. 169. Fulltext Search
  170. 170. Fulltext Search: Inverted Index
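An inverted index maps each term to the set of documents that contain it, so a search becomes a set intersection instead of a scan over all text. The sketch below is a toy version with whitespace tokenization and made-up documents; real engines such as Lucene add analysis, ranking and compressed postings on top of this idea.

```python
from collections import defaultdict

# Hypothetical documents keyed by document id.
documents = {
    1: "MySQL is a relational database",
    2: "Redis is a key value store",
    3: "Solr and Elasticsearch are built on Lucene",
}

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()

# Build the inverted index: term -> set of ids of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search(query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

print(search("is a"))
print(search("lucene"))
```

Building the index costs work on every write, while lookups stay cheap, which matches the "optimized to take data out" trade-off of full-text engines.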
  171. 171. Fulltext Search: Search in a Box
  172. 172. Fulltext Search: Optimized Out ● Optimized to take data out - little optimizations for getting data in https://flic.kr/p/abeTEw
  173. 173. Fulltext Search: Structured/Non-Structured Data
  174. 174. Fulltext Search
  175. 175. Fulltext Search: Elasticsearch ● Lucene based ● RESTful interface - JSON in, JSON out ● Flexible schema ● Automatic sharding and replication (NDB-like) ● Reasonable defaults ● Extension model ● Written in Java; JVM limitations apply, e.g. GC ● ELK - Elasticsearch+Logstash+Kibana
  185. 185. Fulltext Search: Elasticsearch > Use Cases ● Log analysis - ELK Stack, e.g. Netflix ● Full-text search, e.g. GitHub, Wikipedia, StackExchange, etc. ● https://www.elastic.co/use-cases ○ Sentiment analysis ○ Personalized experience ○ etc.
  186. 186. Fulltext Search: Solr ● Lucene based ● Quite cryptic query interface - Innovator’s Dilemma ● SQL-based query support as of 6.1 ● Structured schema; data types need to be predefined ● Written in Java; JVM limitations apply, e.g. GC ● Near real-time indexing - DIH ● Rich document handling - PDF, doc[x] ● SolrCloud support for sharding and replication
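To give a feel for the query interface mentioned above, here is a small sketch that assembles a request URL for Solr's `/select` handler from the common `q`, `fq`, `fl`, `rows`, and `wt` parameters. The host, collection name (`articles`), and field names are assumptions made up for the example, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base_url, collection, query, filters=(), fields=(), rows=10):
    """Build a URL for Solr's /select request handler."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each fq narrows the result set without affecting relevancy scoring.
    params += [("fq", f) for f in filters]
    if fields:
        # fl limits which stored fields come back for each document.
        params.append(("fl", ",".join(fields)))
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and fields, for illustration only.
url = solr_select_url(
    "http://localhost:8983",
    "articles",
    query='title:"full text" AND body:search',
    filters=["year:[2015 TO *]"],
    fields=["id", "title", "score"],
)
parsed = parse_qs(urlsplit(url).query)
print(url)
print(parsed)
```

The field-prefixed terms, boolean operators, and range syntax all live inside plain query-string parameters, which is a large part of why the interface can feel cryptic next to a JSON body.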
  194. 194. Fulltext Search: Solr > Use Cases ● Search and Relevancy ○ https://www.percona.com/live/data-performance-conference-2016/sessions/solr-how-index-10-billion-phrases-mysql-and-hbase ● Recommendation Engine ● Spatial Search
  197. 197. Fulltext Search: Sphinx Search ● Structured data ● MySQL protocol - SphinxQL ● Durable indexes via binary logs ● Real-time indexes via MySQL queries ● Distributed index for scaling ● No native support for replication; done externally, e.g. via rsync ● Very good documentation ● Fastest full indexing/reindexing [?]
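Because SphinxQL is spoken over the MySQL protocol, both searching and feeding a real-time index look like ordinary SQL statements. The sketch below only builds the statements; the index name (`articles_rt`) and its fields are hypothetical, and the `%s` placeholders assume a Python MySQL driver with the DB-API "format" paramstyle connected to the Sphinx daemon.

```python
def sphinxql_search(index, text, limit=10):
    """Return (statement, params) for a full-text search via MATCH()."""
    # The index name is interpolated directly, so it must be a trusted identifier.
    return f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}", (text,)


def sphinxql_rt_insert(index, doc_id, fields):
    """Return (statement, params) that adds one document to a real-time index."""
    columns = ", ".join(["id", *fields])
    placeholders = ", ".join(["%s"] * (len(fields) + 1))
    return (
        f"INSERT INTO {index} ({columns}) VALUES ({placeholders})",
        (doc_id, *fields.values()),
    )


# Hypothetical real-time index and fields, for illustration only.
search_stmt, search_params = sphinxql_search(
    "articles_rt", "heterogeneous persistence", limit=5
)
insert_stmt, insert_params = sphinxql_rt_insert(
    "articles_rt", 1, {"title": "Hello", "content": "World"}
)
print(search_stmt, search_params)
print(insert_stmt, insert_params)
```

Each pair could then be handed to a cursor's `execute(statement, params)` call, the same way an application would talk to MySQL itself.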
  205. 205. Fulltext Search: Sphinx Search > Use Cases ● Real-time full-text search plus basic geo functions ● The above with dependency, or to simplify access with SphinxQL or even the Sphinx storage engine for MySQL
  207. 207. Search - Questions?
  208. 208. Docker Is Your Friend
  209. 209. Docker Is Your Friend Relational ● https://github.com/docker-library/mysql ● https://github.com/docker-library/postgres Key Value ● https://github.com/docker-library/memcached ● https://github.com/docker-library/redis ● https://github.com/docker-library/cassandra ● https://github.com/hectcastro/docker-riak (https://docs.docker.com/engine/examples/running_riak_service/)
  210. 210. Docker Is Your Friend Graph ● https://github.com/neo4j/docker-neo4j ● https://github.com/orientechnologies/orientdb-docker ● https://github.com/arangodb/arangodb-docker ● https://github.com/tenforce/docker-virtuoso (non-official) ● https://hub.docker.com/r/itzg/titandb/~/dockerfile/ (non-official) ● https://github.com/phani1kumar/docker-titan (non-official) Full Text ● https://github.com/docker-solr/docker-solr/ ● https://github.com/stefobark/sphinxdocker
  211. 211. Docker Is Your Friend Time series ● https://github.com/tutumcloud/influxdb (non-official) ● https://hub.docker.com/r/sitespeedio/graphite/ (non-official) ● https://github.com/rackerlabs/blueflood/tree/master/demo/docker ● https://hub.docker.com/r/petergrace/opentsdb-docker/ (non-official) ● https://hub.docker.com/r/opower/opentsdb/ (non-official) ○ Both OpenTSDB images via http://opentsdb.net/docs/build/html/resources.html ● https://prometheus.io/docs/introduction/install/#using-docker ● https://github.com/prometheus/prometheus/blob/master/Dockerfile
  212. 212. Docker Is Your Friend Document ● https://github.com/docker-library/mongo/ ● https://hub.docker.com/r/couchbase/server/~/dockerfile/ Columnar ● http://www.infobright.org/index.php/download/download-pentaho-ice-integrated-virtual-machine/ ● https://github.com/meatcar/docker-infobright/blob/master/Dockerfile ● https://github.com/vertica/docker-vertica
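As a rough sketch of how the official images listed above can be tried side by side, the snippet below assembles `docker run` command lines without executing them. The container names and passwords are placeholders made up for the example; `MYSQL_ROOT_PASSWORD` and `POSTGRES_PASSWORD` are the environment variables the official `mysql` and `postgres` images read at first start.

```python
import shlex


def docker_run_command(image, name, env=None, ports=None):
    """Return the argv list for starting a detached container."""
    argv = ["docker", "run", "-d", "--name", name]
    for key, value in (env or {}).items():
        argv += ["-e", f"{key}={value}"]
    for host_port, container_port in (ports or {}).items():
        argv += ["-p", f"{host_port}:{container_port}"]
    return argv + [image]


# Placeholder names and passwords, for illustration only.
commands = [
    docker_run_command("mysql", "demo-mysql",
                       env={"MYSQL_ROOT_PASSWORD": "change-me"}, ports={3306: 3306}),
    docker_run_command("postgres", "demo-postgres",
                       env={"POSTGRES_PASSWORD": "change-me"}, ports={5432: 5432}),
    docker_run_command("redis", "demo-redis", ports={6379: 6379}),
    docker_run_command("memcached", "demo-memcached", ports={11211: 11211}),
]
for argv in commands:
    print(shlex.join(argv))
```

Each `argv` list could be passed to `subprocess.run` on a machine with Docker installed; printing the commands first makes it easy to review them before anything is started.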
