MongoDB & Hadoop:
Providing Business Insights
Thomas Boyd
Senior Solutions Architect, MongoDB
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights

Join us for a webinar on how MongoDB and Hadoop can work together to solve Big Data problems in today's enterprises. We will take an in-depth look at how the two technologies make real business intelligence accessible to end users. After a brief introduction to both technologies, this webinar will dive deep into the MongoDB+Hadoop Connector and how it is applied to enable new business insights.

In this webinar you will learn:

  • What information problems are a good fit for MongoDB and Hadoop
  • How to integrate the two technologies using the MongoDB+Hadoop Connector
  • Programming paradigms for tackling common problems

Published in: Technology
  • This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage, and even analyze data in real time. (By contrast, Hadoop and data warehouses are used for offline, batch analytical workloads.)

    1. MongoDB & Hadoop: Providing Business Insights
       Thomas Boyd, Senior Solutions Architect, MongoDB
    2. What is MongoDB? The leading NoSQL database
       • General purpose
       • Document database
       • Open source
    3. MongoDB Document Model (RDBMS vs. MongoDB)
       {
         _id : ObjectId("4c4ba5e5e8aabf3"),
         employee_name : "Dunham, Justin",
         department : "Marketing",
         title : "Product Manager, Web",
         report_up : "Neray, Graham",
         pay_band : "C",
         benefits : [
           { type : "Health", plan : "PPO Plus" },
           { type : "Dental", plan : "Standard" }
         ]
       }
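To make the RDBMS-versus-document contrast on slide 3 concrete, here is a minimal Python sketch. It holds the employee record from the slide as one nested dictionary, then flattens it into the kind of parent and child rows a relational schema might use. The two-table layout, the `employee_id` key, and the helper name `to_relational_rows` are assumptions made for illustration; the deck does not show the relational side in text.

```python
# The employee document from slide 3, as a plain Python dict.
# The ObjectId is kept as a string so the sketch needs no MongoDB driver.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def to_relational_rows(doc):
    """Split one document into (employee_row, benefit_rows).

    Hypothetical layout: scalar fields form one "employees" row, and each
    embedded benefit becomes a child row keyed by employee_id.
    """
    employee_row = {k: v for k, v in doc.items() if k != "benefits"}
    benefit_rows = [
        {"employee_id": doc["_id"], **benefit}
        for benefit in doc.get("benefits", [])
    ]
    return employee_row, benefit_rows


employee_row, benefit_rows = to_relational_rows(employee)
print(employee_row["employee_name"])
for row in benefit_rows:
    print(row)
```

In the document form, one read returns the employee together with the embedded benefits; in the flattened form, the same information needs a join across the two row sets.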
    4. What is Hadoop?
       “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.” (source: hadoop.apache.org)
       • Large datasets
       • Analytics
       • Batch
       • Map-Reduce
    5. Enterprise IT Stack
       • Applications: CRM, ERP, Collaboration, Mobile, BI
       • Data Management: Online Data and Offline Data (labels shown: RDBMS, RDBMS, Hadoop, EDW)
       • Infrastructure: OS & Virtualization, Compute, Storage, Network
       • Security & Auditing
       • Management & Monitoring
    6. Consideration: Online vs. Offline
       Online
       • Real-time
       • Low-latency
       • High availability
       Offline
       • Long-running
       • High-latency
       • Availability is lower priority
    7. Consideration: Online vs. Offline
    8. Hadoop is good for…
       • Risk Modeling
       • Recommendation Engine
       • Ad Targeting
       • Transaction Analysis
       • Trade Surveillance
       • Network Failure Prediction
       • Churn Analysis
       • Search Quality
       • Data Lake
    9. MongoDB is good for…
       • 360 Degree View of the Customer
       • Fraud Detection
       • User Data Management
       • Content Management & Delivery
       • Reference Data
       • Product Catalogs
       • Mobile & Social Apps
       • Machine to Machine Apps
       • Data Hub
    10. MongoDB and Hadoop: Complementary
        MongoDB
        • Real-time systems
        • Light-weight analytical workloads
        Hadoop
        • “Data Lake”
        • In-depth analytics
    11. Use MongoDB+Hadoop Together
        ECommerce
        • Products & Inventory
        • Real-time recommendations
        • Customer profile
        • Session management
        • Customer clickstream
        • Fraud detection
        MongoDB Connector for Hadoop
        Analysis
        • Transaction history
        • Clickstream history
        • Recommendation model
        • Fraud modeling
    12. Example – Fraud Detection
        Payments
        • Online payments processing
        MongoDB Connector for Hadoop
        Nightly Analysis
        • Fraud modeling
        Other diagram labels: Fraud Detection, Results Cache, 3rd Party Data Sources, query only (twice)
    13. Customer example – Global Travel Firm
        Travel
        • Flights, hotels and cars
        • Real-time offers
        • User profiles, reviews
        • User metadata (previous purchases, clicks, views)
        MongoDB Connector for Hadoop
        Algorithms
        • User segmentation
        • Offer recommendation engine
        • Ad serving engine
        • Bundling engine
    14. Customer example – MetLife
        Insurance
        • Insurance policies
        • Demographic data
        • Customer web data
        • Call center data
        • Real-time churn detection
        MongoDB Connector for Hadoop
        Churn Analysis
        • Customer action analysis
        • Churn prediction algorithms
    15. Customer example – Criteo
        Ad-Serving
        • Catalogs and products
        • User profiles
        • Clicks
        • Views
        • Transactions
        MongoDB Connector for Hadoop
        Algorithms
        • User segmentation
        • Recommendation engine
        • Prediction engine
    16. What is MongoDB-Hadoop Connector?
        • Java Map-Reduce, Stream Map-Reduce, Pig, & Hive access to MongoDB
          – MongoDB as input
            • mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
            • mongo.input.uri=mongodb://my-db:27017/db1.collection1
          – MongoDB as output
            • mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
            • mongo.output.uri=mongodb://my-db:27017/db1.collection2
          – Using MongoDB backup files
            • mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
            • mapred.output.dir=file:///results.bson
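The connector settings on slide 16 are ordinary Hadoop job properties, so one way to picture them is as a key/value map rendered into `-D key=value` arguments. The Python sketch below only builds and prints that argument list; it does not launch a job. It assumes the job accepts Hadoop's generic `-D` options, writes the format classes with the connector's `com.mongodb.hadoop` package, and uses `mongo.output.uri` for the output collection. The helper names `connector_properties` and `as_hadoop_args` are hypothetical.

```python
import shlex


def connector_properties(input_uri, output_uri):
    """Collect the input and output properties named on slide 16."""
    return {
        "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
        "mongo.input.uri": input_uri,
        "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
        "mongo.output.uri": output_uri,
    }


def as_hadoop_args(props):
    """Render properties as a flat list of -D key=value arguments, sorted by key."""
    args = []
    for key in sorted(props):
        args.extend(["-D", f"{key}={props[key]}"])
    return args


# Host, database, and collection names are the placeholders used on the slide.
props = connector_properties(
    "mongodb://my-db:27017/db1.collection1",
    "mongodb://my-db:27017/db1.collection2",
)
args = as_hadoop_args(props)
print(" ".join(shlex.quote(arg) for arg in args))
```

Keeping the properties in one map makes it easy to swap the output side for the backup-file variant on the slide (`BSONFileOutputFormat` plus `mapred.output.dir`) without touching the input settings.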
    17. Enhancing MongoDB-Hadoop Connector
        • Version 1.1.0, July 2013
          – Pig support
          – Hive support
          – Streaming support
          – Read/Write MongoDB backups
          – Custom splitting support
          – Much more…
        • Version 1.2.0, December 2013
          – Apache Hadoop 2.2 support
          – Multiple collections as M-R source
          – Update writes
          – Multiple mongos support
          – Performance improvements
    18. MongoDB Native Analytics
        • Rich query language
        • Native secondary indexes
        • Geospatial indexes & search
        • Text indexes & search
        • Aggregation framework
        • JavaScript Map-Reduce
        • Client-side analytics
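As a rough illustration of the "Aggregation framework" item on slide 18, the Python sketch below writes a small pipeline in the document shape MongoDB expects and then computes the same grouping in plain Python, so it runs without a server. With a driver such as PyMongo, the pipeline would typically be passed to `collection.aggregate(pipeline)`. The second employee record is made-up sample data, and `count_by_benefit_type` is a hypothetical helper, not part of MongoDB.

```python
from collections import Counter

# Aggregation pipeline: emit one document per embedded benefit,
# then count documents per benefit type.
pipeline = [
    {"$unwind": "$benefits"},
    {"$group": {"_id": "$benefits.type", "employees": {"$sum": 1}}},
]

# First record follows slide 3; the second is invented for the example.
employees = [
    {
        "employee_name": "Dunham, Justin",
        "benefits": [
            {"type": "Health", "plan": "PPO Plus"},
            {"type": "Dental", "plan": "Standard"},
        ],
    },
    {
        "employee_name": "Example, Pat",
        "benefits": [{"type": "Health", "plan": "PPO Plus"}],
    },
]


def count_by_benefit_type(docs):
    """Pure-Python stand-in for the pipeline above."""
    return Counter(
        benefit["type"]
        for doc in docs
        for benefit in doc.get("benefits", [])
    )


counts = count_by_benefit_type(employees)
print(dict(counts))
```

This is the kind of light-weight analytical workload the deck assigns to MongoDB itself, as opposed to the in-depth batch analysis it assigns to Hadoop.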
    19. Resources
        • White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
          http://www.mongodb.com/lp/whitepaper/big-data-nosql
        • Recorded Webinar Series: Thrive with Big Data
          http://www.mongodb.com/lp/bigdata-series
        • Recorded Webinar: What’s New with MongoDB Hadoop Integration
          http://www.mongodb.com/presentations/webinar-whats-newmongodb-hadoop-integration
        • Documentation: MongoDB Connector for Hadoop
          http://docs.mongodb.org/ecosystem/tools/hadoop/
        • Trouble Tickets
          http://jira.mongodb.org (project = Hadoop Integration)
        • Subscriptions, support, consulting, training
          https://www.mongodb.com/products/how-to-buy