Data Streaming with Apache Kafka & MongoDB

Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.

Published in: Software

  1. Data Streaming with Apache Kafka & MongoDB
      Andrew Morgan – MongoDB Product Marketing
      David Tucker – Director, Partner Engineering and Alliances at Confluent
      13th September 2016
  2. Agenda
      • Target Audience
      • Apache Kafka
      • MongoDB
      • Integrating MongoDB and Kafka
      • Kafka – What’s Next
      • Next Steps
  3. Target Audience
  4. Target Audience
  5. Target Audience
  6. Target Audience
  7. Target Audience
  8. Target Audience
  9. Apache Kafka / Confluent Platform
  10. What does Kafka do?
      [Diagram labels: Producers, Consumers, Kafka Connect (on both sides), Topic]
      Your interfaces to the world, connected to your systems in real time
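
As a rough illustration of the producer, topic, and consumer roles named on slide 10, the sketch below models a topic as an append-only in-memory log with one read offset per consumer. It is a toy model in plain Python, not the Kafka client API: the `Topic` class, its method names, and the consumer names are all hypothetical, and the real system adds partitions, replication, and persistence.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a topic: a single append-only log (hypothetical class)."""
    name: str
    log: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer name -> next offset to read

    def produce(self, message):
        """Append a message to the log and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, consumer, max_messages=10):
        """Return this consumer's unread messages and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch


topic = Topic("sensor-readings")
for reading in (3, 2, 4):
    topic.produce(reading)

# Each consumer tracks its own position, so both see the whole stream
# and can read it at their own pace.
dashboard = topic.consume("dashboard")
archiver_first = topic.consume("archiver", max_messages=2)
archiver_rest = topic.consume("archiver")
```

Keeping the offset per consumer rather than deleting messages on read is what lets several independent systems share one stream in this model.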
  11. What is Streaming Data
      • Synchronous Req/Response: 0 – 100s ms
      • Near Real Time: > 100s ms
      • Offline Batch: > 1 hour
      [Diagram labels: KAFKA Stream Data Platform; Search, RDBMS, Apps, Monitoring, Real-time Analytics, NoSQL, Stream Processing; HADOOP Data Lake; Impala, DWH, Hive, Spark, Map-Reduce]
  12. Confluent’s Offerings
      • Kafka: Core, Connect, Streams, Java Client
      • Confluent Platform: More Clients, REST Proxy, Schema Registry, Pre-Built Connectors
      • Confluent Platform Enterprise: Stream Monitoring, Message Delivery, Connector Management
  13. Confluent Platform: It’s Kafka ++
      [Table columns: Feature, Benefit, Apache Kafka, Confluent Platform, Confluent Platform Enterprise; per-edition check marks not captured]
      • Apache Kafka: High throughput, low latency, high availability, secure distributed message system
      • Kafka Connect: Advanced framework for connecting external sources/destinations into Kafka
      • Java Client: Provides easy integration into Java applications
      • Kafka Streams: Simple library that enables streaming application development within the Kafka framework
      • Additional Clients: Supports non-Java clients; C, C++, Python, etc.
      • REST Proxy: Provides universal access to Kafka from any network connected device via HTTP
      • Schema Registry: Central registry for the format of Kafka data; guarantees all data is always consumable
      • Pre-Built Connectors: HDFS, JDBC and other connectors fully certified and fully supported by Confluent
      • Confluent Control Center: Includes Connector Management and Stream Monitoring
      • Support: Enterprise class support to keep your Kafka environment running at top performance
      • Support level by edition: Community, Community, 24x7x365
      • Pricing by edition: Free, Free, Subscription
  14. Common Kafka Use Cases
      Data transport and integration
      • Log data
      • Database changes
      • Sensors and device data
      • Monitoring streams
      • Call data records
      • Stock ticker data
      Real-time stream processing
      • Monitoring
      • Asynchronous applications
      • Fraud and security
  15. People Using Kafka Today
      Financial Services, Entertainment & Media, Consumer Tech, Travel & Leisure, Enterprise Tech, Telecom, Retail
  16. MongoDB
  17. Relational
      • Expressive Query Language & Secondary Indexes
      • Strong Consistency
      • Enterprise Management & Integrations
  18. The World Has Changed
      Data, Risk, Time, Cost
  19. NoSQL
      • Scalability & Performance
      • Always On, Global Deployments
      • Flexibility
      • Expressive Query Language & Secondary Indexes
      • Strong Consistency
      • Enterprise Management & Integrations
  20. Nexus Architecture
      • Scalability & Performance
      • Always On, Global Deployments
      • Flexibility
      • Expressive Query Language & Secondary Indexes
      • Strong Consistency
      • Enterprise Management & Integrations
  21. Integrating MongoDB and Kafka
  22. Where MongoDB Fits
      [Diagram labels: Prod, Topic A; Prod, Topic B; Filter, Filter, Merge; Topic C; Analyze; Topic D; Take Action, Take Action]
  23. Where MongoDB Fits
      [Diagram as on slide 22, with outputs: Take Action, Store Results; Operational Database]
  24. Where MongoDB Fits
      [Diagram as on slide 22, with outputs: Take Action, Store Results, Key Events; Operational Database]
  25. Where MongoDB Fits
      [Diagram as on slide 22, with outputs: Take Action, Store Results, Key Events; Operational Database]
  26. Where MongoDB Fits
      [Diagram as on slide 22, with outputs: Take Action, Store Results, Key Events; Operational Database; Reference Data]
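
The filter, merge, analyze, and store-results stages labelled on slides 22 to 26 can be sketched in a few lines. This is a minimal in-memory sketch under stated assumptions, not Kafka or MongoDB code: the event values, the `ts` timestamps, the thresholds, and the running-total "analysis" are made up for illustration, and a plain dict stands in for the operational database.

```python
import heapq

# Hypothetical events on two input topics, each already ordered by timestamp.
topic_a = [{"ts": 1, "source": "A", "value": 3},
           {"ts": 3, "source": "A", "value": 2},
           {"ts": 5, "source": "A", "value": 4}]
topic_b = [{"ts": 2, "source": "B", "value": 9},
           {"ts": 4, "source": "B", "value": 6},
           {"ts": 6, "source": "B", "value": 7}]


def filter_events(events, minimum):
    """Filter stage: keep only events whose value reaches the threshold."""
    return [e for e in events if e["value"] >= minimum]


def analyze(events):
    """Analyze stage: emit a running total per source for every event."""
    totals = {}
    results = []
    for e in events:
        totals[e["source"]] = totals.get(e["source"], 0) + e["value"]
        results.append({"ts": e["ts"], "source": e["source"],
                        "total": totals[e["source"]]})
    return results


# Filter each input, then merge the two filtered streams by timestamp (Topic C).
topic_c = list(heapq.merge(filter_events(topic_a, 3),
                           filter_events(topic_b, 7),
                           key=lambda e: e["ts"]))

# Analyze the merged stream (Topic D).
topic_d = analyze(topic_c)

# Store results: the latest total per source, upsert style.
operational_db = {}
for result in topic_d:
    operational_db[result["source"]] = {"total": result["total"],
                                        "updated_ts": result["ts"]}
```

Storing only the latest result per key is one way to read "Store Results": the stream keeps the full history while the database serves the current state to applications.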
  27. Where K-Streams Fits
      [Diagram labels: Prod, Topic A; Prod, Topic B; Topic C; Analyze; Topic D; Take Action, Store Results, Key Events; Operational Database; Reference Data; Kafka Streams]
  28. MongoDB As a Kafka Producer
  29. Design Pattern: Operationalized Data Lake
      [Diagram labels: Message Queue; Customer Data Mgmt, Mobile App, IoT App, Live Dashboards; Raw Data; Processed Events; Distributed Processing Frameworks; Kafka Streams]
      • Real-Time Access: Millisecond latency. Expressive querying & flexible indexing against subsets of data. Updates in place. In-database aggregations & transformations.
      • Batch Processing, Batch Views: Multi-minute latency with scans across TB/PB of data. No indexes. Data stored in 128MB blocks. Write-once-read-many & append-only storage model.
      • Sensors, User Data, Clickstreams, Logs
      • Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics
  30. Design Pattern: Operationalized Data Lake
      [Diagram and text as on slide 29] Callout: Configure where to land incoming data
  31. Design Pattern: Operationalized Data Lake
      [Diagram and text as on slide 29] Callout: Raw data processed to generate analytics models
  32. Design Pattern: Operationalized Data Lake
      [Diagram and text as on slide 29] Callout: MongoDB exposes analytics models to operational apps. Handles real time updates
  33. Design Pattern: Operationalized Data Lake
      [Diagram and text as on slide 29] Callout: Compute new models against MongoDB & HDFS
  34. https://www.mongodb.com/presentations/replacing-traditional-technologies-mongodb-single-platform-all-financial-data-ahl
  35. http://www.slideshare.net/danharvey/change-data-capture-with-mongodb-and-kafka
  36. Kafka – What’s Next
  37. Kafka Connectors
      • Confluent-supported connectors (included in CP)
      • Community-written connectors (just a sampling)
      [Logo label: JDBC]
  38. Kafka Futures
      • Apache Core
          • Admin API (KIP-4)
          • Exactly-once delivery semantics
          • Time-based topic indexing
      • Kafka Streams
          • Exactly-once processing semantics
          • Interactive Queries: enable real-time sharing of application state with other applications
      • Confluent Platform Enterprise
          • Multi-cluster views and alerting added to Control Center
  39. Next Steps
  40. MongoDB Atlas: Database as a service for MongoDB
      MongoDB Atlas is…
      • Automated: The easiest way to build, launch, and scale apps on MongoDB
      • Flexible: The only database as a service with all you need for modern applications
      • Secured: Multiple levels of security available to give you peace of mind
      • Scalable: Deliver massive scalability with zero downtime as you grow
      • Highly available: Your deployments are fault-tolerant and self-healing by default
      • High performance: The performance you need for your most demanding workloads
  41. MongoDB Atlas Features: Database as a service for MongoDB
      Run for You
      • Spin up a cluster in seconds
      • Replicated & always-on deployments
      • Fully elastic: scale out or up in a few clicks with zero downtime
      • Automatic patches & simplified upgrades for the newest MongoDB features
      Safe & Secure
      • Authenticated & encrypted
      • Continuous backup with point-in-time recovery
      • Fine-grained monitoring & custom alerts
      No Lock-In
      • On-demand pricing model; billed by the hour
      • Multi-cloud support (AWS available with others coming soon)
      • Part of a suite of products & services designed for all phases of your app; migrate easily to different environments (private cloud, on-prem, etc) when needed
  42. MongoDB Enterprise Advanced
      Tooling
      • MongoDB Ops Manager or MongoDB Cloud Manager Premium
      • MongoDB Compass
      • MongoDB Connector for BI
      Security
      • Encrypted Storage Engine
      • LDAP / Kerberos Integration
      • DDL & DML Auditing
      • FIPS 140-2 Support
      Support
      • 24 x 7 Support
      • 1 hr SLA
      • Emergency Patches
      • Customer Success Program
      • On-Demand Training
      License
      • Commercial License
  43. Resources
      • Data Streaming with Apache Kafka & MongoDB
        https://www.mongodb.com/collateral/data-streaming-with-apache-kafka-and-mongodb
      • Implementing a Kafka Consumer for MongoDB
        https://www.mongodb.com/blog/post/mongodb-and-data-streaming-implementing-a-mongodb-kafka-consumer
      • Tailing the Oplog on a sharded MongoDB Cluster
        https://www.mongodb.com/blog/post/tailing-mongodb-oplog-sharded-clusters
  44. Old Billingsgate, London, 15th November. mongodb.com/europe. Use my discount code for 20% off: andrewmorgan20
  45. Document Data Model
      Relational:
        Customer ID   First Name   Last Name   City
        0             John         Doe         New York
        1             Mark         Smith       San Francisco
        2             Jay          Black       Newark
        3             Meagan       White       London
        4             Edward       Daniels     Boston

        Phone Number     Type   DNC      Customer ID
        1-212-555-1212   home   T        0
        1-212-555-1213   home   T        0
        1-212-555-1214   cell   F        0
        1-212-777-1212   home   T        1
        1-212-777-1213   cell   (null)   1
        1-212-888-1212   home   F        2
      MongoDB:
        { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", dnc : true, type : "home" }, { number : "1-212-777-1213", type : "cell" } ] }
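
To make the relational-to-document mapping on slide 45 concrete, the sketch below folds the phone rows into an embedded `phones` array on each customer. It uses the first three customers and the phone rows from the slide's tables; the `to_documents` helper is a hypothetical name, and omitting a null DNC value is an assumption chosen to match the slide's example document.

```python
# Rows copied from the slide's two relational tables.
customers = [
    (0, "John", "Doe", "New York"),
    (1, "Mark", "Smith", "San Francisco"),
    (2, "Jay", "Black", "Newark"),
]
phones = [
    ("1-212-555-1212", "home", True, 0),
    ("1-212-555-1213", "home", True, 0),
    ("1-212-555-1214", "cell", False, 0),
    ("1-212-777-1212", "home", True, 1),
    ("1-212-777-1213", "cell", None, 1),   # DNC is null in the slide's table
    ("1-212-888-1212", "home", False, 2),
]


def to_documents(customer_rows, phone_rows):
    """Embed each customer's phone rows as a list inside the customer document."""
    docs = {}
    for customer_id, first, last, city in customer_rows:
        docs[customer_id] = {"customer_id": customer_id, "first_name": first,
                             "last_name": last, "city": city, "phones": []}
    for number, phone_type, dnc, customer_id in phone_rows:
        phone = {"number": number, "type": phone_type}
        if dnc is not None:          # leave the field out instead of storing null
            phone["dnc"] = dnc
        docs[customer_id]["phones"].append(phone)
    return list(docs.values())


documents = to_documents(customers, phones)
mark = documents[1]
```

Reading one customer with all phone numbers is then a single lookup on `documents`, where the relational form needs a join on Customer ID.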
  46. Document Model Benefits
      { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", dnc : true, type : "home" }, { number : "1-212-777-1213", type : "cell" } ] }
      Agility and flexibility
      • Data model supports business change
      • Rapidly iterate to meet new requirements
      Intuitive, natural data representation
      • Eliminates ORM layer
      • Developers are more productive
      Reduces the need for joins, disk seeks
      • Programming is simpler
      • Performance delivered at scale
  47. Rich Functionality
      { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : "1-212-777-1212", dnc : true, type : "home" }, { number : "1-212-777-1213", type : "cell" } ] }
      Expressive Queries
      • Find anyone with phone # "1-212…"
      • Check if the person with number "555…" is on the "do not call" list
      Geospatial
      • Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave
      Text Search
      • Find all tweets that mention the firm within the last 2 days
      Aggregation
      • Count and sort number of customers by city
      Native Binary JSON support
      • Add an additional phone number to Mark Smith’s without rewriting the document
      • Select just the mobile phone number in the list
      • Sort on the modified date
      Left outer join ($lookup)
      • Query for all San Francisco residences, lookup their transactions, and sum the amount by person
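
Several of the operations listed on slide 47 can be mimicked over plain Python dictionaries to show what each one returns. This is only a sketch of the semantics: in MongoDB these would be expressed with the query language and aggregation pipeline and executed by the server, and every function name below is hypothetical. The sample documents reuse customers from the deck's relational table.

```python
from collections import Counter

customers = [
    {"customer_id": 0, "first_name": "John", "last_name": "Doe", "city": "New York",
     "phones": [{"number": "1-212-555-1212", "dnc": True, "type": "home"},
                {"number": "1-212-555-1214", "dnc": False, "type": "cell"}]},
    {"customer_id": 1, "first_name": "Mark", "last_name": "Smith", "city": "San Francisco",
     "phones": [{"number": "1-212-777-1212", "dnc": True, "type": "home"},
                {"number": "1-212-777-1213", "type": "cell"}]},
    {"customer_id": 2, "first_name": "Jay", "last_name": "Black", "city": "Newark",
     "phones": [{"number": "1-212-888-1212", "dnc": False, "type": "home"}]},
]


def find_by_phone_prefix(docs, prefix):
    """Expressive query: customers with any phone number starting with prefix."""
    return [d for d in docs
            if any(p["number"].startswith(prefix) for p in d["phones"])]


def is_do_not_call(docs, number):
    """Return the 'do not call' flag for a number, or None if it is unknown."""
    for d in docs:
        for p in d["phones"]:
            if p["number"] == number:
                return p.get("dnc", False)  # a missing flag is treated as False
    return None


def count_by_city(docs):
    """Aggregation: number of customers per city, most common first."""
    return Counter(d["city"] for d in docs).most_common()


def add_phone(doc, number, phone_type):
    """Append one phone entry in place, leaving the rest of the document as is."""
    doc["phones"].append({"number": number, "type": phone_type})


def cell_numbers(doc):
    """Select just the mobile numbers from the embedded list."""
    return [p["number"] for p in doc["phones"] if p["type"] == "cell"]


matches = find_by_phone_prefix(customers, "1-212-777")
add_phone(customers[2], "1-212-888-1213", "cell")
```

The point of the embedded list is visible in `add_phone` and `cell_numbers`: both work on one document without touching a second table.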
  48. MongoDB Technical Capabilities
      [Diagram labels: Application, Driver, Mongos; Shard 1, Shard 2, … Shard N, each with a Primary and two Secondaries]
      db.customer.insert({…})
      db.customer.find({ name: "John Smith" })
      1. Dynamic Document Schema: { name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: { home: 1234567890, mobile: 1234568138 } }
      2. Native language drivers
      3. High availability: Replica sets
      4. High performance: Data locality, Indexes, RAM
      5. Horizontal scalability: Sharding
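
The sharding item on slide 48 can be pictured as a router that sends each document to one shard based on a shard key. The sketch below is a deliberate simplification under stated assumptions: it hashes the key with MD5 and takes a modulus, which is not how MongoDB assigns data to shards, and the `Router` class, the shard names, and the choice of `name` as the shard key are all hypothetical.

```python
import hashlib


def shard_for(key, shard_names):
    """Pick a shard deterministically from the hash of the shard key value."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


class Router:
    """Toy stand-in for a query router sitting in front of several shards."""

    def __init__(self, shard_names, shard_key="name"):
        self.shard_key = shard_key
        self.shards = {name: [] for name in shard_names}

    def insert(self, doc):
        """Store the document on the shard chosen by its shard key value."""
        target = shard_for(doc[self.shard_key], list(self.shards))
        self.shards[target].append(doc)
        return target

    def find(self, value):
        """Look only at the one shard that can hold this shard key value."""
        target = shard_for(value, list(self.shards))
        return [d for d in self.shards[target] if d[self.shard_key] == value]


router = Router(["shard1", "shard2", "shard3"])
stored_on = router.insert({"name": "John Smith", "date": "2013-08-01",
                           "address": "10 3rd St."})
found = router.find("John Smith")
```

Because the shard is derived from the key, a lookup by that key touches a single shard, while a query on any other field would have to visit all of them.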
  49. MongoDB Use Cases
      Single View, Internet of Things, Mobile, Real-Time Analytics, Catalog, Personalization, Content Management
