Couchbase Server and IBM BigInsights: One + One = Three


Session presented at CouchConf San Francisco

The terms NoSQL and Big Data are frequently used as synonyms. While both technologies diverge from the traditional RDBMS data model and spread data across clusters of servers, the problems they address are quite different. Hadoop focuses on data analysis: gleaning insights from large volumes of data. NoSQL databases focus on interactive applications: delivering high-performance, cost-effective data management for massive numbers of users. In this session, we share how IBM BigInsights and Couchbase Server can be used together to build better applications.

Published in: Technology

Couchbase Server and IBM BigInsights: One + One = Three

  1. Couchbase 2012. Couchbase Server and IBM BigInsights: One + One = Three. Steve Beier, Program Director, Big Data Applications & Solutions, IBM. Dipti Borkar, Director, Product Management, Couchbase. © 2012 IBM Corporation
  2. 2 kinds of database management system: OLTP, Analytics
  3. 2 kinds of database management system: OLTP, Analytics
  4. 2 kinds of database management system: OLTP, Analytics
  5. 2 kinds of database management system: Big Users, Big Data
  6. 2 kinds of database management system: a simple, fast, elastic NoSQL database with sub-millisecond performance at scale; map-reduce against huge datasets to cook up insights and answers
  7. Ad and offer targeting. Ad Targeting: 40 milliseconds to pick the right offer. Diagram labels: profiles, campaigns / offers, actionable insights; raw event data; cooked insights
  8. Content Recommendation Targeting. Diagram labels: content oriented site; relational database; (1) events; (2) user profiles; (3) targeted recommendations
  9. sqoop: sqoop == sql RDBMS + hadoop
      • a data transfer tool for Hadoop
      • for moving data from non-Hadoop datasources (like relational databases, NoSQL) into/out-of Hadoop
      Couchbase provides a Cloudera Certified sqoop connector
  10. Ad Targeting. Diagram labels: Ad Targeting Platform, Couchbase Server Cluster, Logs, flume flow, sqoop import, sqoop export, Hadoop Cluster
  11. Content Driven Site. In order to keep up with changing needs for richer, more targeted content that is delivered to larger and larger audiences very quickly, data behind content driven sites is shifting to Couchbase. Hadoop excels at complex analytics, which may involve multiple steps of processing that incorporate a number of different data sources. Diagram labels: Content Driven Web Site, Couchbase Server Cluster, Original RDBMS, Logs, flume flow, sqoop import, sqoop export, Hadoop Cluster
  12. Couchbase → Hadoop
      $ sqoop import --connect http://couchbase-01:8091/pools --table DUMP
      $ sqoop import --connect http://couchbase-01:8091/pools --table BACKFILL_5
  13. Couchbase → Hadoop
      $ sqoop import --connect http://couchbase-01:8091/pools --table DUMP
      $ sqoop import --connect http://couchbase-01:8091/pools --table BACKFILL_5
      For import, table must be either:
      • DUMP: All items currently in Couchbase
      • BACKFILL_n: All item mutations for n minutes
  14. Hadoop → Couchbase
      $ sqoop export --connect http://couchbase-01:8091/pools --table REQUIRED_BUT_IGNORED --export-dir HDFS_DIRECTORY_TO_EXPORT
  15. sqoop Versions: sqoop 1.4.2
      Cloudera CDH3
      • Ubuntu 10.10 – 11.10; later versions missing package needed for CDH3
      Cloudera CDH4 update 1 needed
      • sqoop bug fix in Cloudera CDH4u1 required
  16. Couchbase sqoop - Resources: hadoop-couchbase-pdf.pdf
  17. Big Data platform: Bring Together a Large Volume and Variety of Data to Find New Insights
      • Analyzing a variety of data at enormous volumes
      • Insights on streaming data
      • Large volume structured, semi-structured and unstructured data analysis
      Big Data Platform: Variety, Velocity, Volume
      • T-Mobile: Multi-channel customer experience analysis
      • UOIT: Detect life-threatening conditions in time to intervene
      • Vestas: Predict weather patterns to plan optimal wind turbine usage
      • Dublin City Council: Optimization and monitoring of public transportation
      • Brocade: Identify network security intrusions
  18. Green Energy: Vestas Wind Systems A/S (Volume)
      • Weather and geographic data analysis for wind turbine and wind farm site planning
      • Deployed IBM Big Data to store, manage and analyze location-specific data
      • Analyzing 2.8 petabytes of public and private weather data for each geographic location
      • Reduced the modeling time for wind forecasting information by 97%, from weeks to hours
  19. IBM Watson Demonstrated the Power of Big Data Analytics (Variety). Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?
  20. Big Data Analytics in Smarter Hospitals (Velocity). Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in ICU 24 hours in advance. IBM Data Baby (youtube.com)
  21. Asian telco reduces billing costs and improves customer satisfaction. Capabilities: Stream Computing, Analytic Accelerators
      • Real-time mediation and analysis of 6B CDRs per day
      • Data processing time reduced from 12 hrs to 1 sec
      • Hardware cost reduced to 1/8th
      • Proactively address issues (e.g. dropped calls) impacting customer satisfaction
  22. Telecommunications: Analyze in real time. 500K/sec, 6B+ IPDRs analyzed per day on more than 4 PBs/yr.
      • A Telco processing Call Detail Records
        – 6 Billion CDRs per day
        – Deduplicating data over 7 days sustaining 1GBps
        – Processing latency reduced from 12 hours to a few seconds
      • A Telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real-time
        – 91% reduction in time to merge data
        – 93% reduction in storage requirements
        – 85% reduction in servers used
      • A Telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option
        – “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect)
        – Even at these volumes, Streams provided near linear scalability
  23. Big Data is an integral part of an enterprise data platform
      • Manage Big Data from the instant it enters the enterprise
      • High fidelity: no changes to original format
      • Available for new uses, analyses, and integrations
      Diagram labels: Business Analytic Applications (e.g. Cognos, SPSS) and Solutions; Big Data Applications; Operational Data Store; Warehouse and Appliances; Traditional data sources; Big Data Platform (IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environment: Developers, End Users, Admin.; Big Data Enterprise Engine: Streaming analytics, Internet-scale analytics); Source data (Web, sensors, logs, media, etc.); Govern: Quality, Lifecycle Management, Security, Privacy
  24. IBM’s Big Data Platform: Bringing Big Data to the Enterprise. Diagram labels: IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environments (Developers, End Users, Administrators); Big Data Enterprise Engines (Streaming Analytics, Internet Scale Analytics); Open Source Foundational Components (Hadoop, HBase, Pig, Lucene, Jaql, Hive); Integration; Agents; Data Warehouse (InfoSphere Warehouse); Warehouse Appliances (Netezza); Master Data Mgmt (InfoSphere MDM); Database (DB2, Informix); Content Analytics (ECM); Information Server; Business Analytics (Cognos & SPSS); Marketing (Unica); Data Growth Management (InfoSphere Optim)
  25. IBM Big Data Platform Tools. Business Users, Data Scientists, Business Analysts, Developers, Administrators
      • Determine product sentiment, intent, customer segmentation
      • Execute reusable Apps to classify users, predict sales, and forecast trends
      • Create spreadsheets and dashboards
      Analyzing big data
      • Productive environment for executing analysis (cluster, rank, score with R, ML, Text)
      • Create reusable analytic Apps without programming
      • Dynamic open dashboard
  26. THANK YOU. dipti@couchbase.com
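
The sqoop invocations on slides 12 to 14 can be wrapped in a small dry-run helper. This is only a sketch, not part of the talk: the function names (`valid_import_table`, `import_cmd`, `export_cmd`) and the `CB_URL` variable are hypothetical. The script only prints the command line it would run, and it assumes the `--connect`, `--table` and `--export-dir` options behave as the slides describe.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper around the sqoop commands from slides 12-14.
# It only prints command lines; it never runs sqoop itself.

# Assumed default: the Couchbase REST endpoint used on the slides.
CB_URL="${CB_URL:-http://couchbase-01:8091/pools}"

# Succeed only for DUMP or BACKFILL_n, where n is a whole number of minutes.
valid_import_table() {
  case "$1" in
    DUMP) return 0 ;;
    BACKFILL_*)
      n="${1#BACKFILL_}"
      case "$n" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric suffix
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Print the Couchbase -> Hadoop import command for the given table name.
import_cmd() {
  if ! valid_import_table "$1"; then
    echo "table must be DUMP or BACKFILL_n, got: $1" >&2
    return 1
  fi
  echo "sqoop import --connect $CB_URL --table $1"
}

# Print the Hadoop -> Couchbase export command for an HDFS directory.
# Per slide 14, --table must be present but its value is ignored.
export_cmd() {
  if [ -z "$1" ]; then
    echo "usage: export_cmd HDFS_DIRECTORY" >&2
    return 1
  fi
  echo "sqoop export --connect $CB_URL --table REQUIRED_BUT_IGNORED --export-dir $1"
}
```

Calling `import_cmd BACKFILL_5` prints the second import command from slide 12, while a table name such as `users` is rejected before anything reaches sqoop. Printing rather than executing keeps the sketch safe to try on a machine without a cluster.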