
Big Data with NoSQL, Hadoop, Spark, and Kafka – Couchbase Connect 2016

This session kicks off the Couchbase Connect Big Data track by answering some fundamental questions about the relationship between NoSQL and the other major Big Data technologies: Hadoop, Spark, and Kafka. Each technology addresses the “big data” challenge, but they target different parts of it. We’ll get concrete about where it makes sense to deploy NoSQL versus where it makes sense to deploy Hadoop, Spark, and Kafka. More importantly, we’ll discuss how NoSQL and the other Big Data technologies complement each other and why they’re stronger together.

Published in: Software

  1. ©2016 Couchbase Inc. The Couchbase Connect16 mobile app: Take our in-app survey!
  2. Big Data with NoSQL, Hadoop, Spark and Kafka. Will Gardella, Director of Product Management
  3. Will Gardella, Director of Product Management • will.gardella@couchbase.com • @WillGardella
  5. Agenda • The Big Data Big Picture • Spark & Hadoop • Kafka • Couchbase Analytics (Sneak Peek)
  6. Where does “big” data come from? • Mobile • Web/Cloud • Internet of Things
  7. Couchbase is addressing the requirements of Digital Economy businesses
  8. Spark & Hadoop
  9. NoSQL versus Hadoop • Overlap: NoSQL, Hadoop • Complement: NoSQL, Hadoop • NoSQL or Hadoop? NoSQL and Hadoop.
  10. Big Data at a Glance

      |                      | Couchbase (operational) | Spark (analytical)                  | Hadoop (analytical)                 |
      |----------------------|-------------------------|-------------------------------------|-------------------------------------|
      | Use cases            | Operational; Web / Mobile | Analytics; Machine Learning       | Analytics; Machine Learning         |
      | Processing mode      | Online; Ad Hoc          | Ad Hoc; Batch; Streaming (+/-)      | Batch; Ad Hoc (+/-)                 |
      | Low latency =        | < 1 ms ops              | Seconds                             | Seconds                             |
      | Performance          | Highly predictable      | Variable                            | Variable                            |
      | Users are typically… | Millions of customers   | 100’s of analysts or data scientists | 100’s of analysts or data scientists |
      |                      | Memory-centric          | Memory-centric                      | Disk-centric                        |
      | Big data =           | 10s of Terabytes        | Petabytes                           | Petabytes                           |
  11. Couchbase + Spark use cases (Operations, Analysis) • Recommendations • Next gen data warehousing • Predictive analytics • Fraud detection • Catalog • Customer 360 + IoT • Personalization • Mobile applications
  12. Use Case 1: Operationalize Analytics / ML. Examples: recommend content and products, spot fraud or spam • Data scientists train machine learning models • Load results into Couchbase so end users can interact with them online • Diagram labels: Hadoop, Machine Learning Models, Data Warehouse, Historical Data
  13. Use Case 2: Spark connects to everything • DCP • KV • N1QL • Views • Adapted from: Databricks, “Not Your Father’s Database” https://www.brighttalk.com/webcast/12891/196891
  14. Lambda Architecture • 1 Data: New Data Stream • 2 Batch: Batch Layer (All Data, Precompute Views (Map Reduce), Batch Recompute) • 3 Speed: Speed Layer (Process Stream, Incremental Views, Real-Time Increment) • 4 Serve: Serving Layer • 5 Query: Analysis
  16. Database Change Protocol (DCP) • Innovative protocol for data sync in Couchbase Server • Efficient data sync, memory to memory • Removes slower disk I/O from the data sync path • Lowers replication latency, which improves data durability • Powers data replication & XDCR for HA / DR, maintains indexes, and more • Big data connectors use this as a fast sync mechanism
  17. Couchbase Spark Connector 2.0 • Spark 2 support • Structured streaming • New Databricks cloud analytics support • Efficiency • Improved DCP handling: memory allocation creates less garbage • Easier management • Tolerates Couchbase cluster topology changes (e.g. add nodes & rebalance) • …except rollbacks
  18. Couchbase Spark 2.0 Connector Features • Automatic cluster & resource management • Create RDDs from KV, N1QL, Views • Create DStreams from DCP feeds • Persist RDDs and DStreams • Support SparkSQL, Datasets, DataFrames, and Structured Streaming
  19. Kafka
  20. You might need Kafka if… Photo Credit: Cory Doctorow https://www.flickr.com/photos/doctorow/14638938602
  21. Kafka as an industrial data sharing “backbone” • Before Kafka • After Kafka
  22. Couchbase & Kafka Use Cases • Couchbase as the Master Database: changes in the bucket update data elsewhere • Triggers / Event Handling: handle events like deletions / expirations externally (e.g. expiration & replicated session tokens) • Real-time Data Integration: extract from Couchbase, transform and load data into another system • Real-time Data Processing: extract from a bucket, process in real time and load back to another Couchbase bucket
  23. Couchbase Kafka Connector 3.0 (DP now - GA Q4 2016) • Available Now, 2.0 GA: Kafka Producer or Consumer; stream events; filters; transform events; sample Producer & Consumer; improved DCP (less garbage collection, more memory efficient) • 3.0 (DP now - GA Q4 2016): adopts Kafka Connect (Apache Kafka 0.9+); dynamic topology support / rebalance • Future: rollback handling • Code: https://github.com/couchbase/couchbase-kafka-connector/
  24. New in Apache Kafka 0.9 • One service to manage • Unified connector config, control, monitoring, metrics • Easy to set up as a self-service system for developers, ETL team • Confluent dashboards visualize the complete data pipeline
  25. Lambda + Hadoop + Spark + Storm + Kafka • New Data Stream • Merged View • All Data • Precompute Views (Map Reduce) • Process Stream • Incremental Views • Batch Recompute • Real-Time Increment • Merge • Batch Layer • Serving Layer • Speed Layer
  26. Analytics
  27. Sneak Peek: Couchbase Analytics (DP1). One stop shopping for both operations and analytics • Couchbase Query: optimized for operational (narrow) queries; many queries; each touches a little data • Couchbase Analytics: optimized for analytical (big) queries; fewer queries; each touches a lot of data
  28. What is Couchbase Analytics? • Extend Couchbase Platform to power real-time analytics • Ad-hoc queries (“Ask me anything!”) • Workload isolation • Independent scaling • Common programming model & data model • Unified management • Fast data synchronization • Diagram labels: Data, Query, Index, Search, Analytics, Transport, Unified Administration, Unified Declarative Programming Interface
  29. Couchbase Analytics and friends • Diagram labels: Operations, Analytics; Online, Batch; Key Value, CB Query, CB Analytics, Spark, Hadoop; 𝜇s, ms, 30s, Minutes+; 1 record, Trillions of records; Start up overhead, Job-based, Parallel query • “Hurry! The user is waiting!” • “Better cache this in Couchbase…”
  30. Thank You!
  31. The Couchbase Connect16 mobile app: Take our in-app survey!
  32. Share your opinion on Couchbase: 1. Go here: http://gtnr.it/2eRxYWn 2. Create a profile 3. Provide feedback (~15 minutes)
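
The Lambda Architecture slides name three layers (batch, speed, serving) and a merge step. As a rough illustration only, the sketch below models them with in-memory page-view counts using the Python standard library. The function names, the event shape and the counting workload are assumptions made for this example; none of it comes from the slides, and it is not Couchbase, Spark or Hadoop code.

```python
from collections import Counter

def batch_recompute(all_events):
    """Batch layer: recompute the view from the complete event history."""
    return Counter(event["page"] for event in all_events)

def realtime_increment(speed_view, event):
    """Speed layer: fold a single new event into the incremental view."""
    speed_view[event["page"]] += 1
    return speed_view

def merged_view(batch_view, speed_view):
    """Serving layer: answer queries from the batch and speed views combined."""
    return batch_view + speed_view

# "All data" known at the time of the last batch run (hypothetical events).
history = [{"page": "home"}, {"page": "home"}, {"page": "pricing"}]
batch_view = batch_recompute(history)

# "New data" arriving on the stream after that batch run.
speed_view = Counter()
for event in [{"page": "pricing"}, {"page": "docs"}]:
    realtime_increment(speed_view, event)

result = merged_view(batch_view, speed_view)
```

In a real deployment the speed view would be reset each time a batch recompute absorbs the newer events. The sketch leaves that step out.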

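The DCP and Kafka slides describe streaming a bucket's changes (mutations, deletions, expirations) to other systems. The toy model below shows the general idea of a sequence-numbered change feed that a downstream copy can resume from. It is a sketch under stated assumptions: `ToyBucket`, `ChangeEvent` and `DownstreamCopy` are hypothetical names invented here, and nothing in it reflects the real DCP wire protocol or any Couchbase or Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class ChangeEvent:
    seqno: int                   # position of the change in the feed
    kind: str                    # "mutation" or "deletion"
    key: str
    value: Optional[Any] = None  # None for deletions

@dataclass
class ToyBucket:
    """Hypothetical in-memory bucket that records every change in order."""
    docs: Dict[str, Any] = field(default_factory=dict)
    log: List[ChangeEvent] = field(default_factory=list)

    def upsert(self, key, value):
        self.docs[key] = value
        self.log.append(ChangeEvent(len(self.log) + 1, "mutation", key, value))

    def remove(self, key):
        self.docs.pop(key, None)
        self.log.append(ChangeEvent(len(self.log) + 1, "deletion", key))

    def changes_since(self, seqno):
        """Return all changes with a sequence number greater than seqno."""
        return [event for event in self.log if event.seqno > seqno]

class DownstreamCopy:
    """Hypothetical consumer that keeps another system in sync with the bucket."""
    def __init__(self):
        self.data = {}
        self.last_seqno = 0

    def sync(self, bucket):
        # Resume from the last sequence number seen, so nothing is replayed.
        for event in bucket.changes_since(self.last_seqno):
            if event.kind == "mutation":
                self.data[event.key] = event.value
            else:
                self.data.pop(event.key, None)
            self.last_seqno = event.seqno

bucket = ToyBucket()
bucket.upsert("session::1", {"user": "ann"})
bucket.upsert("session::2", {"user": "bob"})

copy = DownstreamCopy()
copy.sync(bucket)

bucket.remove("session::1")  # e.g. a session token is deleted
copy.sync(bucket)            # only the new deletion is applied
```

The sketch does not model rollbacks, which the slides list as future work for the connectors.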