CouchDB at Its Core
Global Data Storage and Rich Incremental Indexing at Cloudant
Adam Kocoloski
StampedeCon 2013
What is Cloudant?
• Founded by “big data” scientists
• Particle physicists @ MIT analyzing
petabytes of collider data
• Frustrated by inadequate tools,
founders became experts in
scaling CouchDB (“BigCouch”)
• Started Cloudant in 2008 as a managed data layer
• Premise: Apps should grow into their data layer, not out of it
• Built: Scalable, global, fault-tolerant data layer managed service
• Funded by Avalon, Devonshire (Fidelity), IQT, Rackspace, Samsung Ventures, Toba
Capital, Y Combinator
Cloudant Overview
• Operational JSON document store
• Web service
• Advanced APIs
• Replication & Sync
• Full-text Search
• Geospatial
• Incremental MapReduce
• Scalable, Highly Available Performance
• Cross-data center data distribution & failover
• Geo load balancing
• Multi-tenant and single-tenant clusters
• Monitoring, admin & dev dashboards
• Managed 24x7 by experts
Cloudant: 34 locations on 5 hosting providers
Anatomy of the Cloudant Data Network
[Diagram labels: US-EAST “Node” containing a single-tenant cluster and a multi-tenant cluster; HTTP POST, GET,… {JSON doc}; Edge Database Cluster; Mobile Devices; Filtered Replication & Sync; Secondary Data Centers AP-JP and EU-NL (for DR & distributed access)]
How CouchDB Fits In
• Visualization, Lucene Search, Chainable MapReduce, Geospatial Indexing, Management, Monitoring
• IOQ: Prioritizing IO types; prevents “noisy neighbors” in multi-tenancy
• Horizontal Clustering Framework (Fabric, Mem3, Rexi): Clustering API, Sharding, Intra-cluster messaging
• Apache CouchDB (Docs: JSON, Attachments; Developer APIs): GET/PUT docs, Views, Replication…
• Geo-Load Balancing: Connects users to closest copy of data
• Dashboards: Monitoring, Admin, Development
Why CouchDB?
• Durable append-only storage engine
• Sequence tree enabling incremental processing of updates
• Data structures supporting eventual consistency
• Sophisticated replication & synchronization
The right primitives for a global data network
Append-only Storage
• Rewrite path to root in each index on
document update
• Large sequential writes, smaller random reads
• Wasted space must be periodically vacuumed
• Disk is cheap
• SSD-friendly access pattern
• We build what we run ➜ we make things
that are easy to run
• (We automated the heck out of the compactor)
This used to be controversial, now everyone does it
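The "rewrite path to root" idea above can be sketched with a toy copy-on-write tree kept in an append-only list. This is only an illustration under stated assumptions: `AppendOnlyTree`, its binary-tree layout, and the in-memory list standing in for the database file are all hypothetical, and CouchDB's real B-tree format is not reproduced here.

```python
from typing import List, Optional, Tuple

# Each node is an immutable tuple: (key, value, left_offset, right_offset).
Node = Tuple[str, str, Optional[int], Optional[int]]


class AppendOnlyTree:
    """Toy copy-on-write binary search tree stored in an append-only list."""

    def __init__(self) -> None:
        self.log: List[Node] = []        # stands in for the database file
        self.root: Optional[int] = None  # offset of the current root node

    def _append(self, node: Node) -> int:
        # Nodes are only ever appended; existing entries are never modified.
        self.log.append(node)
        return len(self.log) - 1

    def _insert(self, offset: Optional[int], key: str, value: str) -> int:
        if offset is None:
            return self._append((key, value, None, None))
        k, v, left, right = self.log[offset]
        if key == k:
            return self._append((key, value, left, right))
        if key < k:
            return self._append((k, v, self._insert(left, key, value), right))
        return self._append((k, v, left, self._insert(right, key, value)))

    def put(self, key: str, value: str) -> None:
        # Every node on the path from the changed key to the root is rewritten.
        self.root = self._insert(self.root, key, value)

    def get(self, key: str) -> Optional[str]:
        offset = self.root
        while offset is not None:
            k, v, left, right = self.log[offset]
            if key == k:
                return v
            offset = left if key < k else right
        return None

    def compact(self) -> "AppendOnlyTree":
        """Copy only the live nodes into a fresh log, dropping superseded ones."""
        fresh = AppendOnlyTree()

        def copy(offset: Optional[int]) -> Optional[int]:
            if offset is None:
                return None
            k, v, left, right = self.log[offset]
            return fresh._append((k, v, copy(left), copy(right)))

        fresh.root = copy(self.root)
        return fresh
```

Each `put` leaves the old path behind as garbage, which is the wasted space that compaction reclaims by copying only reachable nodes.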
Sequence Index
Sequence index entries (sequence ➜ document ID): 1 ➜ foo, 2 ➜ bar, 3 ➜ baz, 4 ➜ bif
GET /db/_changes
{"seq":1, "id": "foo", "rev":"1-..."}
{"seq":2, "id": "bar", "rev":"1-..."}
{"seq":3, "id": "baz", "rev":"1-..."}
{"seq":4, "id": "bif", "rev":"1-..."}
After bar is updated, it moves to the end of the index (sequence ➜ document ID): 1 ➜ foo, 3 ➜ baz, 4 ➜ bif, 5 ➜ bar
GET /db/_changes
{"seq":1, "id": "foo", "rev":"1-..."}
{"seq":3, "id": "baz", "rev":"1-..."}
{"seq":4, "id": "bif", "rev":"1-..."}
{"seq":5, "id": "bar", "rev":"2-..."}
OR
GET /db/_changes?since=4
{"seq":5, "id": "bar", "rev":"2-..."}
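The behaviour in the two examples above can be mimicked with a small in-memory model. This is a sketch, not CouchDB's implementation: the `SequenceIndex` class and its method names are hypothetical, and the hash part of each revision is left elided ("...") exactly as on the slides.

```python
from typing import Dict, Iterator, Tuple


class SequenceIndex:
    """Toy index that keeps exactly one entry per document, ordered by update."""

    def __init__(self) -> None:
        self.update_seq = 0
        self.by_seq: Dict[int, Tuple[str, int]] = {}  # seq -> (doc id, rev generation)
        self.by_id: Dict[str, int] = {}               # doc id -> current seq

    def update(self, doc_id: str) -> int:
        """Record a write: drop the document's old entry and add a new one at the end."""
        self.update_seq += 1
        generation = 1
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            generation = self.by_seq.pop(old_seq)[1] + 1
        self.by_seq[self.update_seq] = (doc_id, generation)
        self.by_id[doc_id] = self.update_seq
        return self.update_seq

    def changes(self, since: int = 0) -> Iterator[dict]:
        """Yield one row per document whose latest update has seq > since."""
        for seq in sorted(self.by_seq):
            if seq > since:
                doc_id, generation = self.by_seq[seq]
                # The revision hash is elided, as in the slides.
                yield {"seq": seq, "id": doc_id, "rev": f"{generation}-..."}


index = SequenceIndex()
for name in ["foo", "bar", "baz", "bif"]:
    index.update(name)
index.update("bar")  # bar moves from seq 2 to seq 5
```

Calling `list(index.changes(since=4))` on this model returns only the row for bar, which is what lets a consumer pick up where it left off.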
• Index each document in order of most recent update
• Allows incremental, resumable processing in the background
• Originally, MapReduce views
• First-class API endpoint ➜ DIY integrations (cf. ElasticSearch)
• Lucene-based text search
• Geospatial indexes and querying
• First class internal service ➜ add additional consumers as need arises
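A consumer of the sequence index might look roughly like the sketch below. It assumes a hypothetical `fetch_changes(since)` callable that returns rows with `seq` greater than `since`; the `IncrementalIndexer` name and the trivial "index" it builds are illustrative only.

```python
from typing import Any, Callable, Dict, Iterable, List

Change = Dict[str, Any]


class IncrementalIndexer:
    """Toy background consumer that remembers how far it has read."""

    def __init__(self, fetch_changes: Callable[[int], Iterable[Change]]) -> None:
        self.fetch_changes = fetch_changes  # hypothetical: rows with seq > since
        self.checkpoint = 0                 # last sequence number fully processed
        self.index: Dict[str, int] = {}     # doc id -> seq at which it was indexed

    def run_once(self) -> int:
        """Process only the changes after the checkpoint; return how many were seen."""
        processed = 0
        for row in self.fetch_changes(self.checkpoint):
            self.index[row["id"]] = row["seq"]
            self.checkpoint = row["seq"]    # advancing the checkpoint makes the run resumable
            processed += 1
        return processed


# A stand-in feed holding the four documents from the example slides.
feed: List[Change] = [
    {"seq": 1, "id": "foo"},
    {"seq": 2, "id": "bar"},
    {"seq": 3, "id": "baz"},
    {"seq": 4, "id": "bif"},
]


def changes_since(since: int) -> List[Change]:
    return [row for row in feed if row["seq"] > since]


indexer = IncrementalIndexer(changes_since)
first_pass = indexer.run_once()

# bar is updated: its old entry disappears and a new one appears at seq 5.
feed[:] = [row for row in feed if row["id"] != "bar"] + [{"seq": 5, "id": "bar"}]
second_pass = indexer.run_once()
```

The second pass touches a single row, so the cost of keeping a view, search index, or geospatial index current scales with the number of new updates rather than the size of the database.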
Eventual Consistency
• CAP theorem (Brewer)
• Often oversimplified
• I’ll offer my own oversimplification: “You must choose P”
• When faced with a network partition, you optimize for consistency
or availability
• Cloudant is an operational data store (ODS)
• Availability is paramount
• Strong consistency across geographies introduces unacceptable latency*
✱ Unless you’re Google and you install atomic clocks in your data centers
Eventual Consistency: Hash Histories
• Multiple concurrent versions of data will happen
• Default strategy cannot be to discard user data
• Hash histories track versions of a document
• Baked into every document
• Think git
• Document versions derived from contents + edit history
• Same series of edits, applied in same order, yield same
version ID
• History comparison detects divergences and how the
versions fit into the “family tree”
Example revision tree, one generation per line:
1-5a4...
2-ab6...
3-085... 3-f57...
4-7ba... 4-8bf...
5-d4e...
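The "contents + edit history" rule can be illustrated with a few lines of Python. The hashing scheme below (SHA-256 over a JSON encoding of the parent revision and the body) is an assumption chosen for the sketch and is not CouchDB's actual revision algorithm; `next_rev` and `shared_prefix` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def next_rev(parent_rev: Optional[str], body: Dict[str, Any]) -> str:
    """Derive a revision ID of the form N-hash from the parent revision and the new body."""
    generation = 1 if parent_rev is None else int(parent_rev.split("-", 1)[0]) + 1
    # Assumed encoding: any deterministic serialization would serve for this sketch.
    payload = json.dumps({"parent": parent_rev, "body": body}, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"


def shared_prefix(history_a: List[str], history_b: List[str]) -> int:
    """Count the leading revisions two histories have in common."""
    shared = 0
    for rev_a, rev_b in zip(history_a, history_b):
        if rev_a != rev_b:
            break
        shared += 1
    return shared


first = next_rev(None, {"name": "foo", "count": 1})
same_edit_here = next_rev(first, {"name": "foo", "count": 2})
same_edit_there = next_rev(first, {"name": "foo", "count": 2})
different_edit = next_rev(first, {"name": "foo", "count": 3})
```

Two replicas that apply the same edit to the same parent arrive at the same ID, while a different edit produces a sibling revision; comparing the two histories shows they share one ancestor before diverging.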
Replication & Synchronization
Replication & Sync
Revisions of document foo held by two databases, shown as three successive states:
State 1 (before sync)
/db1/foo: 1-5a4..., 2-ab6..., 3-085..., 4-7ba..., 4-8bf..., 5-d4e...
/db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
State 2 (/db1/foo now also holds 3-f57...)
/db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
/db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
State 3 (/db2/foo now also holds 4-7ba..., 4-8bf..., 5-d4e...; both copies match)
/db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
/db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
• Not your RDBMS’ notion of replication
• Transfers updates from any source DB to any target DB
• Builds on earlier primitives
• Leverages sequence index to determine what’s changed
• Leverages hash histories to determine what’s missing on the target
• Critical “anti-entropy” element in clusters
• DBs are divided into partitions, copies of each partition are stored on
multiple distinct nodes
• Partition copies replicate with each other to ensure that documents are
durably stored and that consistency is achieved ... eventually
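The way replication builds on the two earlier primitives can be sketched as follows. This is a simplified model under stated assumptions: `ToyDB` and `replicate` are hypothetical names, revisions are tracked as a flat set rather than a tree, and the real CouchDB replication protocol is not reproduced.

```python
from typing import Dict, Iterable, List, Set, Tuple


class ToyDB:
    """Toy database: known revisions per document plus a sequence index."""

    def __init__(self) -> None:
        self.update_seq = 0
        self.revs: Dict[str, Set[str]] = {}
        self.changes: List[Tuple[int, str]] = []  # (seq, doc id), one entry per doc

    def add_revs(self, doc_id: str, revs: Iterable[str]) -> Set[str]:
        """Store any revisions not already known and bump the document's sequence."""
        known = self.revs.setdefault(doc_id, set())
        new = set(revs) - known
        if new:
            known |= new
            self.update_seq += 1
            self.changes = [(s, d) for s, d in self.changes if d != doc_id]
            self.changes.append((self.update_seq, doc_id))
        return new


def replicate(source: ToyDB, target: ToyDB, since: int = 0) -> int:
    """Copy revisions the target lacks; return the source sequence to resume from."""
    checkpoint = since
    for seq, doc_id in list(source.changes):
        if seq <= since:
            continue  # the sequence index says this document has not changed
        # Comparing revision histories shows what the target is missing.
        missing = source.revs[doc_id] - target.revs.get(doc_id, set())
        target.add_revs(doc_id, missing)
        checkpoint = seq
    return checkpoint


db1, db2 = ToyDB(), ToyDB()
db1.add_revs("foo", {"1-5a4...", "2-ab6...", "3-085...", "4-7ba...", "4-8bf...", "5-d4e..."})
db2.add_revs("foo", {"1-5a4...", "2-ab6...", "3-085...", "3-f57..."})

checkpoint_2_to_1 = replicate(db2, db1)  # db1 gains 3-f57...
checkpoint_1_to_2 = replicate(db1, db2)  # db2 gains 4-7ba..., 4-8bf..., 5-d4e...
```

Running the replication in both directions leaves both copies with the same set of revisions, and passing the returned checkpoint as `since` on the next run skips everything already transferred.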
Why CouchDB Recap
• Durable append-only storage engine
• Sequence tree enabling incremental processing of updates
• Data structures supporting eventual consistency
• Sophisticated replication & synchronization
What’s Next?
• BigCouch ➜ CouchDB
• Cloudant will continue development under ASF umbrella
• Fewer code forks ➜ better velocity
• New CouchDB web UI “Fauxton”
• Better developer tooling for server-side code
• Plugins for Cloudant-specific functionality
• Cloudant is betting on data “at the edge”
Thank You
adam@cloudant.com
@kocolosk

End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 



CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant - StampedeCon 2013

  • 1. CouchDB at Its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
      Adam Kocoloski, StampedeCon 2013
  • 2. What is Cloudant?
      • Founded by “big data” scientists
      • Particle physicists @ MIT analyzing petabytes of collider data
      • Frustrated by inadequate tools, founders became experts in scaling CouchDB (“BigCouch”)
      • Started Cloudant in 2008 as a managed data layer
      • Premise: Apps should grow into their data layer, not out of it
      • Built: Scalable, global, fault-tolerant data layer managed service
      • Funded by Avalon, Devonshire (Fidelity), IQT, Rackspace, Samsung Ventures, Toba Capital, Y Combinator
  • 3. Cloudant Overview
      • Operational JSON document store
      • Web service
      • Advanced APIs
      • Replication & Sync
      • Full-text Search
      • Geospatial
      • Incremental MapReduce
      • Scalable, Highly Available Performance
      • Cross-data center data distribution & failover
      • Geo load balancing
      • Multi-tenant and single-tenant clusters
      • Monitoring, admin & dev dashboards
      • Managed 24x7 by experts
  • 4. Cloudant: 34 locations on 5 hosting providers
  • 5. Anatomy of the Cloudant Data Network
      [Diagram. Labels: US-EAST “Node” (single-tenant cluster, multi-tenant cluster); HTTP POST, GET,… {JSON doc}; Edge Database Cluster; Mobile Devices; Filtered Replication & Sync; Secondary Data Centers (for DR & distributed access): AP-JP, EU-NL]
  • 6. How CouchDB Fits In
      [Diagram of the stack. Labels and captions:]
      • Developer APIs: Visualization, Lucene Search, Chainable MapReduce, Geospatial Indexing
      • Management, Monitoring: Dashboards for monitoring, admin, development
      • Geo-Load Balancing: Connects users to closest copy of data
      • IOQ: Prioritizing IO types; prevents “noisy neighbors” in multi-tenancy
      • Horizontal Clustering Framework (Fabric, Mem3, Rexi): Clustering API, sharding, intra-cluster messaging
      • Apache CouchDB (Docs: JSON, Attachments): GET/PUT docs, Views, Replication…
  • 7. Why CouchDB?
      • Durable append-only storage engine
      • Sequence tree enabling incremental processing of updates
      • Data structures supporting eventual consistency
      • Sophisticated replication & synchronization
      The right primitives for a global data network
  • 9. Append-only Storage
      • Rewrite path to root in each index on document update
      • Large sequential writes, smaller random reads
      • Wasted space must be periodically vacuumed
      • Disk is cheap
      • SSD-friendly access pattern
      • We build what we run ➜ we make things that are easy to run
      • (We automated the heck out of the compactor)
      This used to be controversial, now everyone does it
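
The append-and-compact cycle on this slide can be illustrated with a small sketch. This is not CouchDB's file format: a Python list stands in for the database file, a dict stands in for the B-tree index, and the class and method names are hypothetical. It only shows that updates never overwrite earlier data and that compaction copies the live versions into a fresh file.

```python
import json


class AppendOnlyStore:
    """Toy append-only store (hypothetical; not CouchDB's on-disk format)."""

    def __init__(self):
        self.log = []    # stands in for the append-only database file
        self.index = {}  # doc id -> offset of the latest version (stands in for the B-tree)

    def put(self, doc_id, doc):
        # Every write appends a new record; older versions stay in the log.
        self.log.append(json.dumps({"id": doc_id, "doc": doc}))
        self.index[doc_id] = len(self.log) - 1

    def get(self, doc_id):
        return json.loads(self.log[self.index[doc_id]])["doc"]

    def compact(self):
        # Copy only the live versions into a fresh log, then swap it in.
        new_log, new_index = [], {}
        for doc_id, offset in sorted(self.index.items(), key=lambda kv: kv[1]):
            new_index[doc_id] = len(new_log)
            new_log.append(self.log[offset])
        self.log, self.index = new_log, new_index


store = AppendOnlyStore()
store.put("foo", {"v": 1})
store.put("bar", {"v": 1})
store.put("foo", {"v": 2})      # appended; the first version of foo is now wasted space
size_before = len(store.log)
store.compact()
size_after = len(store.log)
```

In this sketch the log holds three records before compaction and two afterwards, which is the "wasted space must be periodically vacuumed" point in miniature.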
  • 11. Sequence Index
      Sequence tree: 1 foo, 2 bar, 3 baz, 4 bif
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":2, "id": "bar", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
  • 12. Sequence Index
      Sequence tree after bar is updated: 1 foo, 3 baz, 4 bif, 5 bar (the update moves bar from seq 2 to seq 5)
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
      {"seq":5, "id": "bar", "rev":"2-..."}
      OR
      GET /db/_changes?since=4
      {"seq":5, "id": "bar", "rev":"2-..."}
  • 13. Sequence Index
      • Index each document in order of most recent update
      • Allows incremental, resumable processing in the background
      • Originally, MapReduce views
      • First class API endpoint ➜ DIY integrations (cf. ElasticSearch)
      • Lucene-based text search
      • Geospatial indexes and querying
      • First class internal service ➜ add additional consumers as need arises
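
The behaviour shown on the Sequence Index slides can be sketched in a few lines. This is an in-memory illustration only, assuming two dicts in place of CouchDB's by-sequence and by-id trees; the class name, method names, and the revision strings "1-a" and "2-b" are hypothetical placeholders.

```python
class SequenceIndex:
    """Toy sequence index: each document appears once, at its latest update seq."""

    def __init__(self):
        self.update_seq = 0
        self.by_seq = {}  # seq -> (doc_id, rev)
        self.by_id = {}   # doc_id -> seq of the latest update

    def record_update(self, doc_id, rev):
        # Drop the document's old position so it is listed only once.
        old_seq = self.by_id.get(doc_id)
        if old_seq is not None:
            del self.by_seq[old_seq]
        self.update_seq += 1
        self.by_seq[self.update_seq] = (doc_id, rev)
        self.by_id[doc_id] = self.update_seq
        return self.update_seq

    def changes(self, since=0):
        # Everything updated after `since`, in update order.
        return [
            {"seq": seq, "id": doc_id, "rev": rev}
            for seq, (doc_id, rev) in sorted(self.by_seq.items())
            if seq > since
        ]


index = SequenceIndex()
for doc_id in ["foo", "bar", "baz", "bif"]:
    index.record_update(doc_id, "1-a")   # placeholder revision
index.record_update("bar", "2-b")        # bar moves from seq 2 to seq 5

all_seqs = [row["seq"] for row in index.changes()]
resumed = index.changes(since=4)
```

A consumer that remembers the last seq it processed (here 4) can resume with `changes(since=4)` and sees only the update to bar, which is what makes background indexing incremental and resumable.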
  • 15. Eventual Consistency
      • CAP theorem (Brewer)
      • Often over-simplified
      • I’ll offer my own oversimplification: “You must choose P”
      • When faced with a network partition, you optimize for consistency or availability
      • Cloudant is an ODS
      • Availability is paramount
      • Strong consistency across geographies introduces unacceptable latency*
      * Unless you’re Google and you install atomic clocks in your data centers
  • 16. Eventual Consistency: Hash Histories
      • Multiple concurrent versions of data will happen
      • Default strategy cannot be to discard user data
      • Hash histories track versions of a document
      • Baked into every document
      • Think git
      • Document versions derived from contents + edit history
      • Same series of edits, applied in same order, yield same version ID
      • History comparison detects divergences and how the versions fit into the “family tree”
      [Diagram: revision tree with nodes 1-5a4... 2-ab6... 3-085... 3-f57... 4-7ba... 4-8bf... 5-d4e...]
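
The idea that a version ID is derived from contents plus edit history can be sketched as follows. This is an assumption-laden illustration, not CouchDB's exact revision algorithm: the functions `next_rev`, `history`, and `divergence_point` are hypothetical, and hashing a JSON encoding of the parent revision and document body with MD5 is just one simple way to get deterministic IDs.

```python
import hashlib
import json


def next_rev(parent_rev, doc):
    """Derive a revision ID of the form '<generation>-<hash>' (illustrative only)."""
    generation = 1 if parent_rev is None else int(parent_rev.split("-")[0]) + 1
    # Hash the parent revision together with the new contents, so the ID
    # depends on the whole edit history and not just on the latest body.
    payload = json.dumps([parent_rev, doc], sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.md5(payload).hexdigest()}"


def history(edits):
    """Apply a series of edits in order and return the resulting revision IDs."""
    revs, parent = [], None
    for doc in edits:
        parent = next_rev(parent, doc)
        revs.append(parent)
    return revs


def divergence_point(history_a, history_b):
    """Return how many leading revisions two histories have in common."""
    shared = 0
    for rev_a, rev_b in zip(history_a, history_b):
        if rev_a != rev_b:
            break
        shared += 1
    return shared


replica_a = history([{"n": 1}, {"n": 2}, {"n": 3}])
replica_b = history([{"n": 1}, {"n": 2}, {"n": 3}])   # same edits, same order
replica_c = history([{"n": 1}, {"n": 2}, {"n": 99}])  # conflicting third edit
```

Here `replica_a` and `replica_b` end up with identical revision IDs without ever communicating, while comparing `replica_a` with `replica_c` shows two shared ancestors followed by a fork at generation 3, much like two git branches.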
  • 18. Replication & Sync
      [Diagram: revision trees before replication]
      /db1/foo: 1-5a4... 2-ab6... 3-085... 4-7ba... 4-8bf... 5-d4e...
      /db2/foo: 1-5a4... 2-ab6... 3-085... 3-f57...
  • 19. Replication & Sync
      [Diagram: revision trees after the revisions from /db2/foo reach /db1/foo]
      /db1/foo: 1-5a4... 2-ab6... 3-085... 3-f57... 4-7ba... 4-8bf... 5-d4e...
      /db2/foo: 1-5a4... 2-ab6... 3-085... 3-f57...
  • 20. Replication & Sync
      [Diagram: revision trees after replication in both directions]
      /db1/foo: 1-5a4... 2-ab6... 3-085... 3-f57... 4-7ba... 4-8bf... 5-d4e...
      /db2/foo: 1-5a4... 2-ab6... 3-085... 3-f57... 4-7ba... 4-8bf... 5-d4e...
  • 21. Replication & Sync
      • Not your RDBMS’ notion of replication
      • Transfers updates from any source DB to any target DB
      • Builds on earlier primitives
      • Leverages sequence index to determine what’s changed
      • Leverages hash histories to determine what’s missing on the target
      • Critical “anti-entropy” element in clusters
      • DBs are divided into partitions, copies of each partition are stored on multiple distinct nodes
      • Partition copies replicate with each other to ensure that documents are durably stored and that consistency is achieved ... eventually
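
How replication builds on the two earlier primitives can be sketched with a toy in-memory database. This is a simplification under stated assumptions, not the CouchDB replication protocol: `ToyDB` and `replicate` are hypothetical names, revisions are kept as a flat set per document rather than a tree, the toy changes list keeps every entry instead of one per document, and the revision strings are placeholders.

```python
class ToyDB:
    """Toy database holding a set of revisions per document (hypothetical)."""

    def __init__(self):
        self.update_seq = 0
        self.changes = []  # (seq, doc_id, rev) in update order
        self.revs = {}     # doc_id -> {rev: body}

    def put_rev(self, doc_id, rev, body):
        known = self.revs.setdefault(doc_id, {})
        if rev in known:
            return False   # already stored; nothing to do
        known[rev] = body
        self.update_seq += 1
        self.changes.append((self.update_seq, doc_id, rev))
        return True

    def changes_since(self, since):
        return [change for change in self.changes if change[0] > since]

    def missing(self, doc_id, rev):
        return rev not in self.revs.get(doc_id, {})


def replicate(source, target, checkpoint=0):
    """Copy revisions the target lacks; return (new checkpoint, revisions copied)."""
    copied = 0
    # The sequence index says what changed on the source since the checkpoint.
    for seq, doc_id, rev in source.changes_since(checkpoint):
        # The revision history says whether the target already has it.
        if target.missing(doc_id, rev):
            target.put_rev(doc_id, rev, source.revs[doc_id][rev])
            copied += 1
        checkpoint = seq
    return checkpoint, copied


db1, db2 = ToyDB(), ToyDB()
db1.put_rev("foo", "1-a", {"n": 1})
db1.put_rev("foo", "2-b", {"n": 2})
db2.put_rev("foo", "1-a", {"n": 1})
db2.put_rev("foo", "2-c", {"n": 3})   # conflicting edit made on db2

first_pass = replicate(db1, db2)               # db1 -> db2
second_pass = replicate(db2, db1)              # db2 -> db1
resumed = replicate(db1, db2, first_pass[0])   # resume from the saved checkpoint
```

After one pass in each direction both databases hold the same three revisions of foo, conflict included, and resuming from the saved checkpoint transfers nothing new. Neither side discards the other's edit, which matches the "cannot discard user data" rule on the hash histories slide.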
  • 22. Why CouchDB Recap
      • Durable append-only storage engine
      • Sequence tree enabling incremental processing of updates
      • Data structures supporting eventual consistency
      • Sophisticated replication & synchronization
  • 23. What’s Next?
      • BigCouch ➜ CouchDB
      • Cloudant will continue development under ASF umbrella
      • Fewer code forks ➜ better velocity
      • New CouchDB web UI “Fauxton”
      • Better developer tooling for server-side code
      • Plugins for Cloudant-specific functionality
      • Cloudant is betting on data “at the edge”