Real-Time Integration Between MongoDB and SQL Databases
 


Many companies have invested heavily in data warehouse and BI tools and want to leverage those investments to process data collected by applications in MongoDB. For example, a company may need to blend clickstream data collected in distributed MongoDB storage with personal data from Oracle and load the result into a data warehouse or analytics platform to provide timely marketing reports. Most of the time the job requires converting a MongoDB JSON document structure into a traditional relational model. A traditional ETL (extract, transform, load) process still has to be developed to load and convert unstructured data into traditional analytical tools or Hadoop. In this talk we discuss how to develop a real-time, scalable, fault-tolerant ETL process that integrates MongoDB with traditional RDBMS storage using the open-source Twitter Storm project. We will capture data streamed from the MongoDB oplog or capped collections, transform it into tables, rows, and columns, and load it into a SQL database. We will discuss the MongoDB oplog and the Storm architecture. The principles discussed in the talk can be used for many other applications, such as advanced analytics and continuous computation. We will use Java as our language of choice, but you can use the same software stack with any language.
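
As a rough illustration of the "transform and load" step described above, the sketch below turns an oplog-style insert entry into a parameterized SQL INSERT. It is not the code from the talk. It assumes the entry is already available as a plain Java Map with the oplog fields op (operation type), ns (namespace, "database.collection"), and o (the inserted document), and that the document is already flat. The class and method names (OplogToSql, SqlInsert, toInsert) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps one oplog-style insert entry to a SQL INSERT.
final class OplogToSql {

    /** A parameterized INSERT statement plus the values to bind to it. */
    static final class SqlInsert {
        final String sql;
        final List<Object> params;

        SqlInsert(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /**
     * Builds an INSERT for an entry whose "op" is "i" (insert).
     * Returns null for any other operation type.
     * Assumption: collection and field names are trusted, because they are
     * concatenated into the SQL text; only the values are bound as parameters.
     */
    static SqlInsert toInsert(Map<String, Object> entry) {
        if (!"i".equals(entry.get("op"))) {
            return null;
        }
        // "ns" has the form "database.collection"; use the collection as the table name.
        String ns = (String) entry.get("ns");
        String table = ns.substring(ns.indexOf('.') + 1);

        @SuppressWarnings("unchecked")
        Map<String, Object> doc = (Map<String, Object>) entry.get("o");

        List<String> columns = new ArrayList<>(doc.keySet());
        List<Object> params = new ArrayList<>(doc.values());
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String sql = "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ")"
                + " VALUES (" + placeholders + ")";
        return new SqlInsert(sql, params);
    }
}
```

The resulting SQL text and parameter list could then be handed to a JDBC PreparedStatement; updates and deletes would need their own mapping.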

Usage Rights

© All Rights Reserved

  • Leading source of health and medical information.
  • Data is raw. Data is immutable; data is true. Dynamic personalized marketing campaigns.
  • The tuple is the main data structure in Storm: a named list of values, where each value can be of any type. A tuple knows how to serialize primitive data types, strings, and byte arrays. For any other type, register a serializer for that type.
  • The oplog is a capped collection that lives in a database called local on every replicating node and records all changes to the data. Every time a client writes to the primary, an entry with enough information to reproduce the write is automatically added to the primary’s oplog. Once the write is replicated to a given secondary, that secondary’s oplog also stores a record of the write. Each oplog entry is identified with a BSON timestamp, and all secondaries use the timestamp to keep track of the latest entry they’ve applied.
  • How do you know whether you are connected to a sharded cluster?
  • Use the MongoDB oplog as a queue.
  • The spout implements Storm's spout interface.
  • The awards array in the Person document is converted into two documents, each carrying the ID of the parent document.
  • The awards array is converted into two documents, each carrying the ID of the parent document. The namespace is used later to insert the data into the correct table on the SQL side.
  • Instance of BasicDBList in Java
  • Flatten out your document structure, using a loop or recursion. Hopefully you don't have deeply nested documents, which go against the MongoDB guidelines for schema design.
  • Use tick tuples and update in batches.
  • Local mode vs prod mode
  • Increasing the parallelism of a bolt: say you want five bolts to process your array because it is a more time-consuming operation, or you want more SQLWriterBolts because inserting data takes a long time. In that case, use the parallelism hint parameter in the bolt definition. Storm will create the corresponding number of executors to process your requests.
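
The notes above on flattening documents and splitting the awards array can be sketched in plain Java as follows. This is a minimal illustration, not the code from the talk. It assumes documents are represented as Map and arrays as List (the legacy driver's BasicDBObject and BasicDBList implement these interfaces). The names DocumentFlattener, Row, and toRows, the underscore-joined column names, the parent_id and value columns, and the child namespace scheme (parent namespace plus field name) are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns one nested document into flat, table-ready rows.
final class DocumentFlattener {

    /** One relational row: the target namespace plus its column values. */
    static final class Row {
        final String namespace;
        final Map<String, Object> columns = new LinkedHashMap<>();

        Row(String namespace) {
            this.namespace = namespace;
        }
    }

    /** Returns the parent row first, followed by one child row per array element. */
    static List<Row> toRows(String namespace, Map<String, Object> doc) {
        List<Row> rows = new ArrayList<>();
        Row parent = new Row(namespace);
        rows.add(parent);
        flatten("", doc, parent, doc.get("_id"), rows);
        return rows;
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> doc,
                                Row row, Object parentId, List<Row> rows) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                // Embedded document: recurse and prefix the column names.
                flatten(name, (Map<String, Object>) value, row, parentId, rows);
            } else if (value instanceof List) {
                // Array: each element becomes its own row that carries the
                // parent document's id and a namespace for the target table.
                for (Object item : (List<Object>) value) {
                    Row child = new Row(row.namespace + "." + name);
                    child.columns.put("parent_id", parentId);
                    rows.add(child);
                    if (item instanceof Map) {
                        flatten("", (Map<String, Object>) item, child, parentId, rows);
                    } else {
                        child.columns.put("value", item);
                    }
                }
            } else {
                // Scalar value: becomes a column on the current row.
                row.columns.put(name, value);
            }
        }
    }
}
```

For a Person document with an embedded name and two awards, toRows produces three rows: one for the person and two for the awards, each award row holding the person's _id in parent_id. A downstream writer could use each row's namespace to pick the SQL table.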

Real-Time Integration Between MongoDB and SQL Databases: Presentation Transcript