MongoDB & Hadoop - Understanding Your Big Data


Big Data is the evolution of supercomputing for commercial enterprises and governments. Originally the domain of companies operating at Internet scale, today Big Data helps organizations of all sizes discover patterns in their data and gain insight into their business.

But understanding the differences between the plethora of new technologies can be daunting. Graph, columnar, key-value, and document stores are all called NoSQL, but which is best? And where does Hadoop fit in this ecosystem? Its low cost and high efficiency have made it very popular, but what role should it play?

In this webinar, we will explore:

The full spectrum of Big Data
Hadoop and MongoDB: friends or frenemies?
Differences between Systems of Record and Systems of Engagement
MongoDB customer examples of Systems of Engagement

Published in: Technology

  • MongoDB provides agility, scalability, and performance without sacrificing the functionality of relational databases, like full index support and rich queries. Indexes: secondary, compound, text search, geospatial, and more.
  • We have all these fantastic machines… they give the same metrics they used to, but now they transmit the data. We have metrics about metrics, and we need a place to store the data. We need a place to understand what the data means.
  • This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage and even analyze data in real time. (Compared to Hadoop and data warehouses, which are used for offline, batch analytical workloads.)
  • Makes MongoDB a Hadoop-enabled file system. Read and write to live data, in place. Copy data between Hadoop and MongoDB. Uses MongoDB indexes to filter data. Full support for data processing: Hive, MapReduce, Pig, Streaming.
  • What each of these has in common is that they’re retrospective: they’re about looking at the past to help predict the future. The learnings from these Hadoop applications end up being applied by a different technology. This is where MongoDB comes in.
  • Customer Data Management (e.g., Customer Relationship Management, Biometrics, User Profile Management); Product and Asset Catalogs (e.g., eCommerce, Inventory Management); Social and Collaboration Apps (e.g., Social Networks and Feeds, Document and Project Collaboration Tools); Mobile Apps (e.g., for Smartphones and Tablets); Content Management (e.g., Web CMS, Document Management, Digital Asset and Metadata Management); Internet of Things / Machine to Machine (e.g., mHealth, Connected Home, Smart Meters); Security and Fraud Apps (e.g., Fraud Detection, Cyberthreat Analysis); DBaaS (Cloud Database-as-a-Service); Data Hub (Aggregating Data from Multiple Sources for Operational or Analytical Purposes); Big Data (e.g., Genomics, Clickstream Analysis, Customer Sentiment Analysis)
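
The connector note above mentions using MongoDB indexes to filter data before handing it to MapReduce. The following is a minimal sketch of that filter-then-MapReduce flow in plain Python. It does not use Hadoop, MongoDB, or the connector itself; the sample documents, the field names (`city`, `cars`, `color`), and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents; field names are illustrative only.
OWNERS = [
    {"first_name": "Paul", "city": "London",
     "cars": [{"color": "purple"}, {"color": "black"}]},
    {"first_name": "Mei", "city": "Shanghai", "cars": [{"color": "purple"}]},
    {"first_name": "Ana", "city": "London", "cars": [{"color": "purple"}]},
    {"first_name": "Tom", "city": "Austin", "cars": []},
]


def prefilter(docs):
    """Stand-in for an index-backed filter: skip owners without cars."""
    return (doc for doc in docs if doc["cars"])


def map_phase(doc):
    """Emit one ((city, color), 1) pair per car in the document."""
    for car in doc["cars"]:
        yield (doc["city"], car["color"]), 1


def reduce_phase(pairs):
    """Sum the counts emitted for each (city, color) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def color_counts_by_city(docs):
    """Run the filter, map, and reduce steps in sequence."""
    pairs = (pair for doc in prefilter(docs) for pair in map_phase(doc))
    return reduce_phase(pairs)
```

Calling `color_counts_by_city(OWNERS)` returns a dictionary keyed by `(city, color)`. Filtering first keeps the map step from touching documents that cannot contribute, which is the benefit the note attributes to index-based filtering.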

    1. Hadoop & MongoDB: Understanding Your Big Data
    2. MongoDB World
    3. Speakers: Jnan Dash, Senior Advisor; Kelly Stirman, Director of Products
    4. Jnan Dash • Last 12 years (2002-Now) - Executive Consultant, on the board and advisory board of several new software companies including Big Data players such as MongoDB • 10 Years (1992-2002) – Oracle, Group Vice President, Systems Architecture and Technology, responsible for the server product planning and rollout • 16 years (1975-1992) – IBM, Planner, architect, and development manager for DB2 product line at Silicon Valley Lab and Austin Lab. Head of IBM's Database architecture, strategy, and technology
    5. Why am I excited about MongoDB? • Finally, some real innovation in DBMS • MongoDB momentum is unprecedented! • The changing landscape needs MongoDB – “Internet scale” distributed operations + highly flexible data model for agile development + open source • Perfect fit for cloud, mobility, and big data
    6. Agenda • Big Data - Observations • Evolution of Database Technology • Hadoop+MongoDB • Customer Examples • Roadmap • Summary
    7. The Fourth Paradigm 1. Thousand years ago – Experimental Science: description of natural phenomena 2. Last few hundred years – Theoretical Science: Newton's Laws, Maxwell's Equations, ... 3. Last few decades – Computational Science: simulation of complex phenomena 4. Today – Data-intensive Science: scientists overwhelmed with data deluge; unify theory, experiment & simulation
    8. Internet Scale Commercial Supercomputing • Originated with companies operating at Internet scale (to process ever-increasing numbers of users and data) – Yahoo in the 1990s, then Google, Facebook, Twitter – They needed to do it quickly, economically, and affordably at scale • Hadoop is the first commercial supercomputing software platform – Works at scale, affordable at scale • HPC was used for meteorology and engineering scientific supercomputing. Big Data is the commercial equivalent of HPC – Less about equations, more about discovery, patterns • Many technologies have been around for decades: clustering, parallel processing, distributed file systems
    9. Big Data: 3V’s
    10. Some Make it 4V’s
    11. What’s driving Big Data - Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets - Optimizations and predictive analytics - Complex statistical analysis - All types of data, and many sources - Very large datasets - More of a real-time
    12. Big Data – the full spectrum Transaction Processing Analytical Processing Data Mining, Visualization, and Integration Tools RDBMS OLAP/DW DW Appliance Hadoop, Impala,.. NoSQL NewSQL, In-Memory, Stream... Online/Realtime Offline/Batch
    13. Hadoop Ecosystem • Core Apache Hadoop: HDFS (Hadoop Distributed File System), MapReduce (Distributed Programming Framework) • Related Apache Projects: Hive (SQL), Pig (Data Flow), HBase (Wide Column Storage), HCatalog (Meta Data), HMS (Management), Zookeeper (Coordination) • Layers: Programming Languages, Computation, Table Storage, Object Storage
    14. Database Technology Evolution
    15. Data Management over the years • 1960’s: File Systems • 1970’s: 1st Generation DBMS, Data as Shared Resource • 1980’s: Relational Technology, Ease of Query • 1990’s: New data types, OLAP/DW, Web Support, Unstructured Data • 2005+: Big Data (Post-PC, Data Deluge, 3Vs, NoSQL)
    16. Operational vs. Analytics • 1990: RDBMS • 2000: RDBMS (Operational Database), OLAP/DW (Data warehouse) • 2010: RDBMS and NoSQL (Key-Value/Wide-column, Document DB) as Operational Database; OLAP/DW and Hadoop as Data warehouse
    17. MongoDB Features • JSON Document Model with Dynamic Schemas • Auto-Sharding for Horizontal Scalability • Text Search • Aggregation Framework and MapReduce • Full, Flexible Index Support and Rich Queries • Native Replication for High Availability • Advanced Security • Large Media Storage with GridFS
    18. Documents are Rich Data Structures { first_name: 'Paul', surname: 'Miller', cell: '+447557505611', city: 'London', location: [45.123,47.232], Profession: ['banking', 'finance', 'trader'], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] } Callouts: Fields • Typed field values • Fields can contain arrays • Fields can contain an array of sub-documents
    19. Machine Generated Data
    20. Fast Moving Data • Hundreds of thousands of records per second • Fast response required • Sometimes all data kept, sometimes just summary • Horizontal scalability required
    21. Data is Structured, but Varied… • A machine generates a specific kind of data • The data model is unlikely to change • But there are so many different machines… • Queryability across all types
    22. Time Series Data • Event data written multiple times per second, minute, or hour • Tracking progression of metrics over time
    23. Do More With Your Data MongoDB Rich Queries • Find Paul’s cars • Find everybody in London with a car built between 1970 and 1980 Geospatial • Find all of the car owners within 5km of Trafalgar Sq. Text Search • Find all the cars described as having leather seats Aggregation • Calculate the average value of Paul’s car collection Map Reduce • What is the ownership pattern of colors by geography over time? (is purple trending up in China?) { first_name: 'Paul', surname: 'Miller', city: 'London', location: [51.524,-0.087], cars: [ { model: 'Bentley', year: 1973, value: 100000, … }, { model: 'Rolls Royce', year: 1965, value: 330000, … } ] }
    24. Hadoop & MongoDB
    25. Enterprise Big Data Stack • Applications: CRM, ERP, Collaboration, Mobile, BI • Data Management: Online Data (RDBMS, RDBMS), Offline Data (EDW, Hadoop) • Infrastructure: OS & Virtualization, Compute, Storage, Network • Management & Monitoring • Security & Auditing
    26. MongoDB & Hadoop • Operational: Online, Real-time; High concurrency & HA; Live analytics • Analytical: Multi-source analytics; Interactive & Batch; Data lake • MongoDB Connector for Hadoop
    27. Hadoop Is Good for… • Risk Modeling • Churn Analysis • Recommendation Modeling • Ad Targeting • Transaction Analysis • Trade Surveillance • Network Failure Prediction • Search Quality • Data Lake
    28. MongoDB Is Good for… • Single View • Mobile Apps • Fraud Detection • Customer Data Management • Content Management & Delivery • Database-as-a-Service • Product & Asset Catalogs • Internet of Things • Social & Collaboration
    29. Customer Examples
    30. Many more examples • Use cases: Big Data, Product & Asset Catalogs, Security & Fraud, Internet of Things, Database-as-a-Service, Mobile Apps, Customer Data Management, Single View, Social & Collaboration, Content Management • Customers: Intelligence Agencies, Top Investment and Retail Banks, Top US Retailer, Top Global Shipping Company, Top Industrial Equipment Manufacturer, Top Media Company, Top Investment and Retail Banks
    31. MongoDB Enterprise Value
    32. MongoDB+Hadoop Connector (MongoDB Connector for Hadoop) • Makes MongoDB a Hadoop-enabled file system • Full use of MongoDB's indexes • Read and write to live data, in-place • Copy data between Hadoop and MongoDB • Full support for data processing – Hive – MapReduce – Pig – Streaming – EMR
    33. Customer Example – MetLife • Customer Service: Insurance policies; Demographic data; Customer web data; Call center data; Real-time churn detection • Churn Analysis: Customer action analysis; Churn prediction algorithms • MongoDB Connector for Hadoop
    34. Customer Example - eCommerce • Travel: Flights, hotels and cars; Real-time offers; User profiles, reviews; User metadata (previous purchases, clicks, views) • Algorithms: User segmentation; Offer recommendation engine; Ad serving engine; Bundling engine • MongoDB Connector for Hadoop
    35. Roadmap (Capability: Today / Soon) • Connectivity: Custom / Centralized Administration • MongoDB → Hadoop: Dynamic reads / Automated Snapshots • BSON Support: MapReduce, Hive, Pig / Impala, Tez, Spark • Hadoop → MongoDB: Dynamic writes / Bulk Loader
    36. Summary • Big Data covers a wide spectrum – Volume, Velocity, Variety – Hence the mythical equation Big Data = Hadoop • Enterprises are more concerned about Variety – MongoDB provides the best platform • Hadoop and MongoDB are complementary – MongoDB for operational workloads – Hadoop for analytical workloads
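
Slides 18 and 23 describe a document with nested sub-documents and several kinds of queries over it. As a rough sketch, the same document can be written as a Python dictionary, with two of the slide's questions expressed both as MongoDB-style query documents (shown as plain dictionaries and not executed here) and as equivalent plain-Python functions. The function and constant names are hypothetical, and only fields visible on the slide are used.

```python
# Sample document modelled on slide 23; elided fields ("…") are left out.
PAUL = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [51.524, -0.087],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# MongoDB-style filter: everybody in London with a car built 1970-1980.
# Shown for illustration only; nothing below sends it to a server.
LONDON_1970S_FILTER = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# MongoDB-style aggregation pipeline: average value of Paul's cars.
AVG_VALUE_PIPELINE = [
    {"$match": {"first_name": "Paul"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": "$first_name", "avg_value": {"$avg": "$cars.value"}}},
]


def cars_of(people, first_name):
    """Return every car sub-document owned by people with this first name."""
    return [car for p in people if p["first_name"] == first_name
            for car in p["cars"]]


def owners_in_city_with_car_between(people, city, first_year, last_year):
    """Plain-Python equivalent of LONDON_1970S_FILTER for any city/range."""
    return [
        p for p in people
        if p["city"] == city
        and any(first_year <= car["year"] <= last_year for car in p["cars"])
    ]


def average_car_value(person):
    """Plain-Python equivalent of AVG_VALUE_PIPELINE for one person."""
    values = [car["value"] for car in person["cars"]]
    return sum(values) / len(values) if values else None
```

The point of the sketch is that the array of car sub-documents travels with its owner, so questions about a person's cars need no join.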
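
Slides 20 and 22 note that machine events arrive many times per second and that sometimes only a summary is kept. One possible way to picture this, assuming a per-machine, per-minute summary record, is sketched below in plain Python. The record layout (`machine_id`, `minute`, `count`, `total`, `min`, `max`) and the function names are assumptions made for illustration and are not taken from the slides.

```python
from datetime import datetime


def minute_bucket(timestamp):
    """Truncate a datetime to the start of its minute."""
    return timestamp.replace(second=0, microsecond=0)


def summarize(events):
    """Fold (machine_id, timestamp, value) events into per-minute summaries.

    Returns one summary dictionary per (machine, minute), in the order in
    which each bucket was first seen.
    """
    buckets = {}
    for machine_id, timestamp, value in events:
        key = (machine_id, minute_bucket(timestamp))
        bucket = buckets.get(key)
        if bucket is None:
            # First event in this minute: start a new summary record.
            buckets[key] = {
                "machine_id": machine_id,
                "minute": key[1],
                "count": 1,
                "total": value,
                "min": value,
                "max": value,
            }
        else:
            # Later events only update the running aggregates.
            bucket["count"] += 1
            bucket["total"] += value
            bucket["min"] = min(bucket["min"], value)
            bucket["max"] = max(bucket["max"], value)
    return list(buckets.values())
```

Keeping one record per minute rather than one per event trades detail for far fewer writes; whether that trade is acceptable depends on whether the raw events are needed later.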