Published in: Technology, Business

  • One way to think about what stream computing can do for your data is to think back to innovations we've seen in the past. Take the industrial revolution: we got very good at making things. We got better and better hardware and better and better machines, with the invention of steam engines and then electricity. But what really unlocked manufacturing, and made it the extremely efficient capability we have today, was assembly line technology: realizing that you can do things in a continuous way and have multiple steps along the path. That's what stream computing brings you: multiple steps of processing along the path as the data flows through.   This chart shows just this: data comes in element by element, flowing through a set of operations, like in an assembly line. On a car line, one person puts the wheels on and another bolts down the hood; these are individual operations. With data, you may be filtering, you may be aggregating, you may be scoring against a model that's been built in BigInsights, but it's element by element, continuously passing through the infrastructure.   **** The point here isn’t to talk about this technology from a product perspective yet, but I like to frame velocity and note that these are real IBM examples today. ****   In Streams, in the listening step, we actually process each and every element. So whatever processing has to be done, or would have been done by putting the data into storage (into the database and so forth), we're doing on the wire.
Today, the data can optionally go to disk and be available for the rest of the back end infrastructure, BI and so forth, but the processing executes while the data is still on the wire. For problems that are suited to streaming, that gives you tremendous efficiencies, since you aren't going through the extra steps of going to disk, coming back out, and maybe doing that again at several stages of processing. Instead, you do it all in one continuous pipeline.   Streams is all about analyzing Data in Motion, and I’m going to talk more about this in a moment. When we talked about the velocity of data earlier in this presentation, you may have asked yourself how you can analyze this data very, very quickly. In the IBM Big Data technology stack (which, again, we’ll talk about) we have an integrated technology for this called InfoSphere Streams.   +CLICK+ +CLICK+ In this example, an IPDR is like a CDR (call detail record) for the internet. Being able to analyze half a million of these a second, over six billion of these a day, four petabytes a year of IPDRs, sustaining one gigabyte per second throughput, is going to provide you with a lot of analytical power. And why would you want to analyze this kind of stuff? I like to call this data exhaust, and it falls under a hot Big Data topic called Log Analysis. If you had the ability to store and analyze this data, you could develop a corpus of information to build a more resilient network, make troubleshooting easier, and gain much more customer insight into what items customers browse, what they buy, and so on. Now, applying what you learn about your network to these IPDRs as they hit the switch gives you the ability to figure out if things are about to turn bad before they do.
+CLICK+ Finally, consider the fact that data that isn’t stored to disk doesn’t have to undergo retention policies. This is a tremendous opportunity for a business to gain insight into the data without taking on the expense or the requirement of storing it for a specific retention period.
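The assembly line idea in these notes can be sketched with plain Python generators. This is only an illustration under stated assumptions, not the InfoSphere Streams API or its processing language: the record fields (`id`, `bytes`), the window size, and the threshold "model" are all hypothetical stand-ins. Each stage handles one element at a time and passes it on, so nothing is written to disk between steps.

```python
from collections import deque
from typing import Iterable, Iterator


def keep_valid(records: Iterable[dict]) -> Iterator[dict]:
    """Filtering stage: drop records that carry no payload (hypothetical rule)."""
    for record in records:
        if record.get("bytes", 0) > 0:
            yield record


def rolling_average(records: Iterable[dict], window: int = 3) -> Iterator[dict]:
    """Aggregation stage: attach the mean of the last `window` byte counts."""
    recent = deque(maxlen=window)
    for record in records:
        recent.append(record["bytes"])
        yield {**record, "avg_bytes": sum(recent) / len(recent)}


def score(records: Iterable[dict], threshold: float = 1000.0) -> Iterator[dict]:
    """Scoring stage: a fixed threshold stands in for a model built elsewhere."""
    for record in records:
        yield {**record, "alert": record["avg_bytes"] > threshold}


def pipeline(source: Iterable[dict]) -> Iterator[dict]:
    """Chain the stages; each element flows through all of them in turn."""
    return score(rolling_average(keep_valid(source)))


if __name__ == "__main__":
    sample = [
        {"id": 1, "bytes": 500},
        {"id": 2, "bytes": 0},      # removed by the filter stage
        {"id": 3, "bytes": 3000},
    ]
    for result in pipeline(sample):
        print(result)
```

Because the stages are generators, `pipeline` pulls one record through every step before reading the next, which mirrors the "on the wire" processing described above; the source could just as well be an unbounded iterator.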
  • Key Points: The Big Data platform is built upon open source. We’ve embraced the open source movement because we believe Hadoop is the correct technology to address internet scale analytics. But our approach is to mature and build upon that technology for an enterprise class platform. Open source: built on Hadoop (MapReduce, HDFS), HBase (Hadoop database), Pig (high level language for programs that analyze large data sets), Lucene (full text search), and Jaql (query language for JavaScript Object Notation). We’ve matured it with two enterprise engines for processing large volumes of data and analyzing a variety of data. Streaming analytics is designed to manage stream flows and apply various analytics (mining, mathematical, video, etc.) against that streaming data. Internet scale analytics is designed to store data at rest, as-is, and apply analytics, such as text analytics, against that data set. User Environments: this is an important part of maturing the platform and exposing the power of big data to existing resources, not just specialist programmers who can write MapReduce programs. The development environment is designed to provide a mature environment for developing and testing Big Data analytics and applications. There are end-user visualization capabilities to explore the data and analyze it. Integration: this is an important aspect of the Big Data platform. It had to be integrated in order to “bring big data to the enterprise”; the insight has to be integrated with warehouses, databases, applications, etc. One of the key vehicles for doing that is Information Integration, which includes governing that data. Proof Points & Stats: TerraEchos, ‘our developers can deliver apps 45% faster due to the agility of the streams processing language’, shows how a mature development environment and language speed development of Big Data apps.
  • CCSF12_IBM_Biginsights_with_Couchbase

    1. Couchbase 2012. Couchbase Server and IBM BigInsights: One + One = Three. Steve Beier, Program Director, Big Data Applications & Solutions, IBM. Dipti Borkar, Director, Product Management, Couchbase. © 2012 IBM Corporation
    2. 2 kinds of database management system: OLTP, Analytics
    3. 2 kinds of database management system: OLTP, Analytics
    4. 2 kinds of database management system: OLTP, Analytics
    5. 2 kinds of database management system: Big Users, Big Data
    6. 2 kinds of database management system: a simple, fast, elastic NoSQL database with sub-millisecond performance at scale; map-reduce against huge datasets to cook up insights and answers
    7. Ad and offer targeting. Ad Targeting: 40 milliseconds to pick the right offer. Diagram labels: profiles, raw event data; campaigns / offers, actionable insights; cooked insights; raw event data; cooked insights
    8. Content Recommendation. Diagram labels: Targeting; content oriented site; relational database; (1) events, (2) user profiles, (3) targeted recommendations
    9. sqoop. sqoop == sql RDBMS + hadoop • a data transfer tool for Hadoop • for moving data from non-Hadoop data sources (like relational databases, NoSQL) into/out of Hadoop. Couchbase provides a Cloudera Certified sqoop connector
    10. Ad Targeting. Diagram labels: Ad Targeting Platform; Logs; Couchbase Server Cluster; sqoop export; flume flow; sqoop import; Hadoop Cluster
    11. Content Driven Site. In order to keep up with changing needs on richer, more targeted content that is delivered to larger and larger audiences very quickly, data behind content driven sites is shifting to Couchbase. Hadoop excels at complex analytics which may involve multiple steps of processing which incorporate a number of different data sources. Diagram labels: Content Driven Web Site; Couchbase Server Cluster; Original RDBMS; Logs; flume flow; sqoop import; sqoop export; sqoop import; Hadoop Cluster
    12. Big Data platform: Bring Together a Large Volume and Variety of Data to Find New Insights • Analyzing a variety of data at enormous volumes • Insights on streaming data • Large volume structured, semi-structured and unstructured data analysis. Big Data Platform: Variety, Velocity, Volume. T-Mobile: multi-channel customer experience analysis. UOIT: detect life-threatening conditions in time to intervene. Vestas: predict weather patterns to plan optimal wind turbine usage. Dublin City Council: optimization and monitoring of public transportation. Brocade: identify network security intrusions
    13. Green Energy: Vestas Wind Systems A/S (Volume) • Weather and geographic data analysis for wind turbine and wind farm site planning • Deployed IBM Big Data to store, manage and analyze location-specific data • Analyzing 2.8 petabytes of public and private weather data for each geographic location • Reduced the modeling time for wind forecasting information by 97%, from weeks to hours
    14. IBM Watson Demonstrated the Power of Big Data Analytics (Variety). Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?
    15. Big Data Analytics in Smarter Hospitals (Velocity). Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in ICU 24 hours in advance. IBM Data Baby, youtube.com
    16. Asian telco reduces billing costs and improves customer satisfaction. Capabilities: Stream Computing, Analytic Accelerators • Real-time mediation and analysis of 6B CDRs per day • Data processing time reduced from 12 hrs to 1 sec • Hardware cost reduced to 1/8th • Proactively address issues (e.g. dropped calls) impacting customer satisfaction
    17. Telecommunications: Analyze in real time. Callout: 500K/sec, 6B+ IPDRs analyzed per day on more than 4 PBs/yr, sustaining 1GBps. A Telco processing Call Detail Records: 6 billion CDRs per day; deduplicating data over 7 days; processing latency reduced from 12 hours to a few seconds. A Telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real-time: 91% reduction in time to merge data; 93% reduction in storage requirements; 85% reduction in servers used. A Telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option: “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect); even at these volumes, Streams provided near linear scalability
    18. Big Data is an integral part of an enterprise data platform • Manage Big Data from the instant it enters the enterprise • High fidelity: no changes to original format • Available for new uses, analyses, and integrations. Diagram labels: Business Analytic Applications (e.g. Cognos, SPSS) and Solutions; Big Data Applications; Operational Data Store; Warehouse and Appliances; Traditional data sources; Big Data Platform (IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environment: Developers, End Users, Admin.; Big Data Enterprise Engine: Streaming analytics, Internet-scale analytics); Source data (Web, sensors, logs, media, etc.)
    19. IBM’s Big Data Platform: Bringing Big Data to the Enterprise. Platform: IBM Big Data Solutions; Client and Partner Solutions; Big Data User Environments (Developers, End Users, Administrators); Big Data Enterprise Engines (Streaming Analytics, Internet Scale Analytics); Open Source Foundational Components (Hadoop, HBase, Pig, Lucene, Jaql, Hive). Integration and agents: Data Warehouse (InfoSphere Warehouse); Warehouse Appliances (Netezza); Master Data Mgmt (InfoSphere MDM); Database (DB2, Informix); Content Analytics (ECM); Information Server; Business Analytics (Cognos & SPSS); Marketing (Unica); Data Growth Management (InfoSphere Optim)
    20. IBM Big Data Platform Tools. Audiences: Business Users, Data Scientists, Business Analysts, Developers, Administrators • Determine product sentiment, intent, customer segmentation • Execute reusable Apps to classify users, predict sales, and forecast trends • Create spreadsheets and dashboards. Analyzing big data • Productive environment for executing analysis (cluster, rank, score with R, ML, Text) • Create reusable analytic Apps without programming • Dynamic open dashboard
    21. Thank You. dipti@couchbase.com
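
Slide 17 mentions deduplicating call detail records over a 7-day window while the data is in motion. The deck does not say how Streams does this, so the following is only a minimal Python sketch of one common approach: remember each record ID with the time it was first seen, and forget IDs once they fall out of the window. The class name, the `record_id` and `timestamp` parameters, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import OrderedDict

SEVEN_DAYS_SECONDS = 7 * 24 * 3600


class WindowedDeduplicator:
    """Flags records whose ID was already seen within a sliding time window.

    Assumes timestamps (in seconds) arrive in non-decreasing order, so the
    insertion order of the dictionary is also oldest-first order.
    """

    def __init__(self, window_seconds: int = SEVEN_DAYS_SECONDS) -> None:
        self.window_seconds = window_seconds
        self._first_seen = OrderedDict()  # record_id -> first-seen timestamp

    def _evict_expired(self, now: float) -> None:
        """Forget IDs first seen before the start of the current window."""
        cutoff = now - self.window_seconds
        while self._first_seen:
            oldest_timestamp = next(iter(self._first_seen.values()))
            if oldest_timestamp >= cutoff:
                break
            self._first_seen.popitem(last=False)

    def is_new(self, record_id: str, timestamp: float) -> bool:
        """Return True the first time an ID appears within the window."""
        self._evict_expired(timestamp)
        if record_id in self._first_seen:
            return False
        self._first_seen[record_id] = timestamp
        return True


if __name__ == "__main__":
    dedup = WindowedDeduplicator()
    print(dedup.is_new("cdr-1", 0))                        # first sighting
    print(dedup.is_new("cdr-1", 60))                       # duplicate inside the window
    print(dedup.is_new("cdr-1", SEVEN_DAYS_SECONDS + 1))   # window has moved on
```

Memory use grows with the number of distinct IDs inside the window, which at the volumes quoted on the slide would call for partitioning across nodes or an approximate structure; that trade-off is outside this sketch.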