  1. IBM Big Data: The Marriage of Hadoop and Data Warehousing (June 2012). James Kobielus, Senior Program Director, Product Marketing, Big Data, IBM. © 2012 IBM Corporation
  2. Hadoop and the data warehouse (DW) are fast being joined into a new platform paradigm: the Hadoop DW.
  3. Agenda
     - Big Data: 3 Vs and myriad use cases
     - Big Data: diverse workloads
     - Big Data: emergence of the Hadoop DW
  4. Agenda
     - Big Data: 3 Vs and myriad use cases
     - Big Data: diverse workloads
     - Big Data: emergence of the Hadoop DW
  5. Scalability Imperative: 3 Vs Drive Big Data Everywhere (information from everywhere, radical flexibility, extreme scalability)
     - Volume: 12 terabytes of Tweets created daily
     - Velocity: 5 million trade events per second
     - Variety: 100s of video feeds from surveillance cameras
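To put the volume and velocity figures on a common footing, a quick back-of-the-envelope conversion helps. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate, which real feeds never have; the helper names are illustrative, not from the deck.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_to_rate(bytes_per_day: float) -> float:
    """Average ingest rate in bytes per second, assuming uniform arrival."""
    return bytes_per_day / SECONDS_PER_DAY


def events_per_second_to_daily(events_per_second: float) -> float:
    """Total events in one day at a constant per-second rate."""
    return events_per_second * SECONDS_PER_DAY


tweets_bytes_per_day = 12 * 10**12   # Volume: 12 TB of Tweets per day (decimal TB)
trade_events_per_second = 5_000_000  # Velocity: 5 million trade events per second

tweet_rate_mb_s = daily_volume_to_rate(tweets_bytes_per_day) / 10**6
trades_per_day = events_per_second_to_daily(trade_events_per_second)

print(f"Tweets: ~{tweet_rate_mb_s:.0f} MB/s sustained")
print(f"Trades: {trades_per_day:,.0f} events per day")
```

Under these assumptions the Tweet stream alone averages roughly 139 MB/s, and 5 million events per second amounts to 432 billion events per day; peaks would be considerably higher than these averages.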
  6. More Business Use Cases for Big Data Across the Enterprise
  7. More Mission-Critical Apps Ride on Big Data Platforms. Advanced analytic applications sit on a big data platform that processes and analyzes any type of data, supported by accelerators. The platform should:
     - Integrate and manage the full variety, velocity, and volume of data
     - Apply advanced analytics to information in its native form
     - Visualize all available data for ad-hoc analysis and discovery
     - Provide a development environment for building new analytic applications
     - Integrate and deploy applications with enterprise-grade availability, manageability, security, and performance
     - Capabilities shown: analyze data in motion, MapReduce/NoSQL, machine learning, text analytics, text search, data discovery, stream computing, visualization and exploration, scalability, and hardware acceleration
  8. Big Data: Business Crucible for Practical Data Science. The slide shows a cycle:
     - Business and IT identify available information sources.
     - IT delivers a platform that enables creative exploration of all available data and content.
     - Business determines what questions to ask by exploring the data and relationships.
     - New insights drive integration with traditional technology.
  9. Big Data Initiatives: Fueled by Practical Data Science
     - Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before
     - Analyze information in motion: streaming data analysis; large-volume data bursts and ad-hoc analysis
     - Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information; manage and analyze high volumes of structured, relational data
     - Discover and experiment: ad-hoc analytics, data discovery, and experimentation
     - Manage and plan: enforce data structure, integrity, and control to ensure consistency for repeatable queries
  10. Big Data: Marriage of Established & Emerging Approaches
      - Established approach (DW): structured, analytical, logical; structured, repeatable, linear analysis of traditional sources (transaction data, internal app data, mainframe data, OLTP system data, ERP data), e.g., monthly sales reports, profitability analysis, customer surveys
      - Emerging approaches (Hadoop, etc.): creative, holistic thought, intuition; unstructured, exploratory, iterative analysis of new sources (web logs, social data, text data such as emails, sensor data such as images, RFID), e.g., brand sentiment, product strategy, maximum asset utilization
      - Enterprise integration connects the two approaches
  11. Agenda
      - Big Data: 3 Vs and myriad use cases
      - Big Data: diverse workloads
      - Big Data: emergence of the Hadoop DW
  12. Continuous Social Media Monitoring and Analytics
      - Data set: 1.1B tweets; 5.7M blog and forum posts; 3.5M relevant messages; 97K referencing Product A; 18K referencing Product B
      - Information extracted: buzz and sentiment; gender, location, and occupation; fans; intent to purchase; specific attributes of products
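The data set narrows from raw tweets and posts to relevant messages and then to per-product counts. A minimal sketch of that narrowing as a filter-then-count pass follows; the substring matching, the `PRODUCT_KEYWORDS` and `TOPIC_KEYWORDS` tables, and the sample messages are all hypothetical stand-ins for the text analytics the slide describes.

```python
from collections import Counter

# Hypothetical keywords that identify each product in free text.
PRODUCT_KEYWORDS = {
    "Product A": ("product a",),
    "Product B": ("product b",),
}
# Hypothetical topic keywords that mark a message as relevant to the category.
TOPIC_KEYWORDS = ("camera", "phone", "laptop")


def is_relevant(message: str) -> bool:
    """Keep only messages that mention the product category at all."""
    text = message.lower()
    return any(word in text for word in TOPIC_KEYWORDS)


def products_mentioned(message: str) -> set:
    """Return the set of products a message refers to."""
    text = message.lower()
    return {name for name, keywords in PRODUCT_KEYWORDS.items()
            if any(kw in text for kw in keywords)}


def funnel(messages):
    """Narrow raw messages to relevant ones, then count mentions per product."""
    relevant = [m for m in messages if is_relevant(m)]
    per_product = Counter()
    for m in relevant:
        per_product.update(products_mentioned(m))
    return len(messages), len(relevant), per_product


sample = [
    "Loving the camera on Product A!",
    "Product B laptop keeps crashing",
    "Great game last night",
    "New phone: Product A or Product B?",
    "My Product A camera arrived today",
]
total, relevant, per_product = funnel(sample)
print(total, relevant, dict(per_product))
```

At the scale on the slide, each step would run as a parallel job rather than a Python loop, but the shape of the computation (filter, then group and count) stays the same.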
  13. Content Mining, Natural Language Processing, & Classification
      - How it works: parses text and detects meaning with extractors; understands the context in which the text is analyzed; hundreds of pre-built extractors for names, addresses, phone numbers, organizations, URLs, datetimes, etc.
      - Accuracy: highly accurate in deriving meaning from complex text
      - Performance: AQL rule language optimized for MapReduce
      - Example input (unstructured text such as a document or email): "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win."
      - Classification and insight: World Cup 2010 Highlights
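The slide's extractors are written in AQL. As a rough Python analogue (not AQL, and far simpler than the pre-built extractors), dictionary and regular-expression rules can pull entities out of the sample passage and feed a keyword-based classification. The `PLAYERS`, `TEAMS`, and `SPORT_TERMS` dictionaries and the two-term classification threshold are assumptions for illustration.

```python
import re

# Hypothetical dictionaries; real extractors ship much larger ones.
PLAYERS = ["Arjen Robben", "Iker Casillas", "Andres Iniesta"]
TEAMS = ["Netherlands", "Spain"]
# Regular-expression rule for a match score such as "1-0".
SCORE_RE = re.compile(r"\b(\d+)-(\d+)\b")
# Hypothetical vocabulary used by the toy classifier.
SPORT_TERMS = {"striker", "keeper", "winger", "scored", "final"}


def extract(text: str) -> dict:
    """Apply dictionary and regex extractors to one document."""
    return {
        "players": [p for p in PLAYERS if p in text],
        "teams": [t for t in TEAMS if t in text],
        "scores": [m.group(0) for m in SCORE_RE.finditer(text)],
    }


def classify(text: str) -> str:
    """Label the document 'Sports' if it uses at least two sport terms."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "Sports" if len(words & SPORT_TERMS) >= 2 else "Other"


passage = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the second "
    "half, Netherlands' striker, Arjen Robben, had a breakaway, but the keeper "
    "for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for "
    "Spain for the win."
)
print(extract(passage), classify(passage))
```

Note that the score rule skips "2010" because it requires a hyphenated pair; handling context (for example, knowing which team lost) is exactly where a dedicated rule language goes beyond simple patterns like these.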
  14. Entity Extraction and Integration
  15. Statistical Analysis, Predictive Modeling, & Machine Learning: enables machine learning (ML) on massive datasets
      - R- and Matlab-like syntax for smooth adoption
      - Optimizations to generate low-level execution plans
      - Out-of-the-box and write-your-own analytic algorithms, e.g., regression, clustering, classification, pattern mining, ranking
      - Scales to massively parallel clusters, from 10s to 1000s of machines and from terabytes to petabytes
      - Example question: What are people saying about a product on social media?
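One reason ML can scale out to many machines is that some models decompose into per-partition summaries that simply add up. As an illustration (not the platform's own algorithm or syntax), simple least-squares regression needs only five sums, so each node can compute them over its partition and a reduce step combines them. The partitioning and the sample points below are hypothetical.

```python
from functools import reduce


def partition_stats(points):
    """Map step: sufficient statistics (n, sum x, sum y, sum x^2, sum xy) for one partition."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: statistics from two partitions are added element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Solve ordinary least squares for y = slope * x + intercept."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Three hypothetical partitions of points on y = 2x + 1, as if spread across nodes.
partitions = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9)],
]
total = reduce(combine, (partition_stats(p) for p in partitions))
slope, intercept = fit_line(total)
print(slope, intercept)
```

Only the five-number summaries travel between nodes, which is why this style of algorithm scales from terabytes to petabytes without moving the raw data.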
  16. Targeted E-Commerce and Next Best Action
  17. Predictive Complex Event Processing
  18. Intent and Sentiment Analysis
      - Online flow (data-in-motion analysis, on stream computing and analytics): data sources → data ingest and prep → text analytics (timely insights) → entity analytics (profile resolution) → predictive analytics (action determination) → timely decisions, shown on a dashboard
      - Offline flow (data-at-rest analysis, on the Hadoop system and analytics): social media and enterprise data → social media text analytics → entity analytics and integration → comprehensive customer profiles → predictive customer analytics → models and reports
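The online flow is a chain of stages applied to each arriving message. A minimal sketch using Python generators follows; the stage names mirror the slide, but the rules inside each stage (the keyword-based intent test, the hypothetical `PROFILES` store standing in for the offline flow's output, and the hypothetical `ACTIONS` table) are placeholders for real stream and predictive analytics.

```python
from typing import Iterable, Iterator

# Hypothetical customer profiles, as if produced by the offline (data-at-rest) flow.
PROFILES = {"@ann": {"segment": "photo enthusiast"}}
# Hypothetical mapping from detected intent to a next action.
ACTIONS = {"buy": "send camera offer", "none": "no action"}


def ingest(raw: Iterable[str]) -> Iterator[dict]:
    """Data ingest and prep: split '<handle>: <text>' and normalize case."""
    for line in raw:
        handle, _, text = line.partition(":")
        yield {"handle": handle.strip(), "text": text.strip().lower()}


def text_analytics(events):
    """Timely insights: tag purchase intent with a simple keyword rule."""
    for e in events:
        buying = "need a new" in e["text"] or "should i buy" in e["text"]
        e["intent"] = "buy" if buying else "none"
        yield e


def profile_resolution(events):
    """Entity analytics: attach the known profile for the handle, if any."""
    for e in events:
        e["profile"] = PROFILES.get(e["handle"], {})
        yield e


def action_determination(events):
    """Predictive analytics placeholder: choose an action from the intent."""
    for e in events:
        e["action"] = ACTIONS[e["intent"]]
        yield e


stream = ["@ann: I need a new digital camera", "@bob: great game tonight"]
pipeline = action_determination(profile_resolution(text_analytics(ingest(stream))))
decisions = list(pipeline)
for d in decisions:
    print(d["handle"], d["intent"], d["action"])
```

Because each stage is a generator, messages flow through one at a time without buffering the whole stream, which loosely mirrors the data-in-motion style of the online path.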
  19. Agenda
      - Big Data: 3 Vs and myriad use cases
      - Big Data: diverse workloads
      - Big Data: emergence of the Hadoop DW
  20. Big Data: DW & Hadoop are Married in Spirit. Shared architectural elements:
      - Cloud-facing architectures, massively parallel processing, in-database analytics, mixed workload management, and hybrid storage layers
      - Components: models, policies, metadata, data quality (DQ), MDM hubs, ETL, staging tables, production tables, operational data stores, aggregates, marts, cubes, views, databases, storage nodes, and memory cache
  21. Hadoop Is the Core of the Next-Gen Big Data DW
      - Vendor-agnostic framework for massively parallel processing of advanced analytics against polystructured information
      - Leverages an extensible framework for building advanced analytics and data management functions
      - Evolving rapidly in new directions
      - Being commercialized and adopted rapidly in enterprises
      - Vibrant open-source community and industry
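Hadoop's processing model is MapReduce: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function aggregates each group. The single-process Python simulation below only illustrates that data flow with the classic word count; it does not use any Hadoop API, and in a real cluster each phase runs in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate the list of values for each key."""
    return {key: sum(values) for key, values in groups.items()}


splits = ["hadoop and dw", "hadoop dw married", "big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(s) for s in splits)))
print(sorted(counts.items()))
```

The map and reduce functions never see the whole data set, only one split or one key group, which is what lets the framework distribute them across a cluster.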
  22. Hadoop, DW, and Other Databases Co-Exist in the Big Data Ecosystem
      - Hadoop & NoSQL: Big Data staging, ETL, and preprocessing tier
      - DW & RDBMS: Big Data single version of the truth (SVOT) and governance tier
      - In-memory, columnar, and OLAP: Big Data access and interaction tier
  23. How Hadoop and DW Complement Each Other
  24. Single Version of Big Data: Where Hadoop DW Will Excel. Social-media-based 360-degree consumer profiles combine:
      - Timely insights: intent to see a movie title or buy a product; current location
      - Life events: life-changing events such as relocation, having a baby, getting married, getting divorced, buying a house
      - Products and interests: personal preferences for products and services; product purchase history
      - Personal attributes: identifiers (name, address, age, gender); interests (sports, pets, cuisine…); life-cycle status (marital, parental)
      - Relationships: personal (family, friends, and roommates…); business (co-workers and work/interests network…)
      - Example posts, monetizable intent to see a movie: "Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas"; "I don't think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days."
      - Example posts, monetizable intent to buy: "I need a new digital camera for my food pictures, and recommendations around 300?"; "What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!"
      - Example posts, life events: "College: Off to Standard for my MBA! Bbye chicago!"; "Looks like we'll be moving to New Orleans sooner than I thought."
      - Example post, location announcement: "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"
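Building these profile categories starts with tagging individual posts. The rule-based tagger below is a hypothetical sketch: the `RULES` keyword patterns are assumptions, and production systems rely on much richer text analytics. It is applied to some of the sample posts from the slide.

```python
import re

# Hypothetical regex rules, one list per profile signal shown on the slide.
RULES = {
    "intent_to_buy": (r"\bshould i buy\b", r"\bneed a new\b"),
    "intent_to_see_movie": (r"\bgoing to movies\b", r"\bwatching movies\b"),
    "life_event": (r"\bmoving to\b", r"\boff to\b.*\bmba\b", r"\bgetting married\b"),
    "location": (r"\bi'?m at\b", r"4sq\.com/"),
}


def tag(post: str) -> list:
    """Return every label whose patterns match the lowercased post."""
    text = post.lower()
    return [label for label, patterns in RULES.items()
            if any(re.search(p, text) for p in patterns)]


posts = [
    "Kinda feel like going to movies tonight… Any recommendations?",
    "What should I buy?? A mini laptop with Windows 7 OR a Apply MacBook!??!",
    "Looks like we'll be moving to New Orleans sooner than I thought.",
    "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj",
]
for p in posts:
    print(tag(p), p)
```

A post can carry several labels at once, so the tagger returns a list; downstream entity integration would then attach these signals to the resolved consumer profile.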
  25. Hadoop DW Integration: What to Look For
      - Hadoop distro functional depth
      - EDW HDFS connector
      - Software, appliance, and cloud form factors for Hadoop offerings
      - Pluggable storage layer for Hadoop offerings
      - Bundled data management and analytics offerings integrated with Hadoop solutions
      - Modeling, management, acceleration, and optimization tools
      - Real-time/low-latency capabilities integrated into Hadoop offerings
      - Robust availability, security, and workload management tools integrated with Hadoop offerings
      - And many more, focused on EDW-grade robustness, scalability, and flexibility!
      - Diagram: DW components including models, policies, metadata, DQ, MDM hubs, ETL, staging and production tables, operational data stores, aggregates, marts, cubes, views, databases, storage nodes, memory cache, and in-database analytics
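A "pluggable storage layer" means the processing code depends on an abstract storage interface, so backends can be swapped without changing the analytics. The sketch below shows that design pattern in Python; the `StorageLayer` interface and both backends are hypothetical and do not correspond to any Hadoop API.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageLayer(ABC):
    """Hypothetical minimal storage interface that analytics code depends on."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryStorage(StorageLayer):
    """Backend that keeps objects in a dict (e.g., for tests or caching)."""

    def __init__(self):
        self._objects = {}

    def write(self, path, data):
        self._objects[path] = data

    def read(self, path):
        return self._objects[path]


class LocalDiskStorage(StorageLayer):
    """Backend that stores objects as files under a root directory."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def write(self, path, data):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()


def count_lines(storage: StorageLayer, path: str) -> int:
    """Analytics code written once, against the interface only."""
    return storage.read(path).count(b"\n")


for backend in (InMemoryStorage(), LocalDiskStorage(tempfile.mkdtemp())):
    backend.write("/staging/events.log", b"a\nb\nc\n")
    print(type(backend).__name__, count_lines(backend, "/staging/events.log"))
```

`count_lines` never changes when a new backend is added; only a new `StorageLayer` subclass is needed, which is the property the slide's bullet is asking vendors to provide.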
  26. Consider Big Data Platform Accelerators
      - Telecommunications: CDR streaming analytics; deep network analytics
      - Retail customer intelligence: customer behavior and lifetime value analysis
      - Finance: streaming options trading; insurance and banking DW models
      - Social media analytics: sentiment analytics; intent to purchase
      - Public transportation: real-time monitoring and routing optimization
      - Data mining: streaming statistical analysis
      - Also: over 100 sample applications, user-defined toolkits, standard toolkits, and industry data models (banking, insurance, telco, healthcare, retail)
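"Streaming statistical analysis" implies updating statistics one record at a time, without storing the stream. Welford's online algorithm is one standard way to maintain a running mean and variance; the sketch below is a generic illustration, not the accelerator's implementation, and the call-duration values are made up.

```python
import math


class RunningStats:
    """Welford's online algorithm: constant-memory running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)


stats = RunningStats()
for duration in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical call durations (minutes)
    stats.update(duration)
print(stats.n, stats.mean, round(stats.variance, 4))
```

Welford's update avoids the numerical cancellation that the naive "sum of squares minus square of sum" formula suffers on long streams, which matters when a statistic runs continuously.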
  27. How Will You Do MDM on Your Hadoop DW?
      - (A1) Unstructured entity integration (on BigInsights): complex analytics to populate the master data set
        - Text analytics: rule language (AQL) for extracting entities, events, and relationships from text and HTML documents
        - Entity integration: rule language (HIL) to express and customize the integration, cleansing, and aggregation of the master entities
      - (A2) Entity repository (on MDM)
        - BigInsights Bridge: generation of the MDM model for public master entities from the BigInsights model, and bulk-loading of master entities
        - Query-based application development: supports the generation of custom queries for individual applications
      - Diagram components: external public data sources (e.g., SEC/FDIC, Twitter, blogs, Facebook); text analytics and entity integration on BigInsights (A1); relational tables with master entities in InfoSphere MDM with extensions (A2); external data subscriptions (e.g., Acxiom); data services; MDM DaaS applications and views; tooling-based queries on the entity model
      - Example query on the entity model: select cik, Officers, Directors from Company where name = Citigroup
      - Generated SQL (excerpt, truncated on the slide):

            SELECT * FROM
              (SELECT t2.CIK as CIK, t2.NAME as NAME,
                      t2.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                      t2.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER,
                      t2.POSITION_NAME as POSITION_NAME,
                      tp.EARLIEST_DATE as EARLIEST_DATE, tp.IS_EARLIEST_EXACT as IS_EARLIEST_EXACT,
                      tp.LATEST_DATE as LATEST_DATE, tp.IS_LATEST_EXACT as IS_LATEST_EXACT
               FROM (SELECT t1.CIK as CIK, t1.NAME as NAME,
                            t1.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                            t1.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER,
                            p.NAME as POSITION_NAME, p.POSITIONSPK_ID as POSITIONSPK_ID
                     FROM (SELECT o.CIK as CIK, o.NAME as NAME,
                                  o.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                                  o.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER,
                                  o.OFFICERSPK_ID as OFFICERSPK_ID
                           FROM DB2ADMIN.OFFICERS o
                           WHERE o.OFFICER_OF = 567830643756635868) as t1
                     left outer join DB2ADMIN.POSITIONS p on t1.OFFICERSPK_ID = p.POSITIONOF) as t2
               left outer join DB2ADMIN.RANGEOFKNOWNDATES tp
                 on t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS)
            UNION -- (OUTER UNION) …
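HIL rules are not shown on the slide, so as a language-neutral illustration of what "integration, cleansing, and aggregation of the master entities" involves, here is a hypothetical Python sketch (not HIL and not MDM's matching engine): records extracted from different sources are cleansed by name normalization, matched on the cleansed key, and merged into one master entity per company. The `SUFFIXES` list, the matching rule, and the sample records (including the placeholder officer names) are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes stripped during cleansing.
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd"}


def normalize_name(name: str) -> str:
    """Cleanse: lowercase, drop punctuation and common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def integrate(records):
    """Match records on the cleansed name and merge each group into a master entity."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_name(rec["name"])].append(rec)
    masters = {}
    for key, recs in groups.items():
        masters[key] = {
            "name": recs[0]["name"],  # keep the first-seen spelling as the display name
            "sources": sorted({r["source"] for r in recs}),
            "officers": sorted({o for r in recs for o in r.get("officers", [])}),
        }
    return masters


records = [
    {"name": "Citigroup Inc.", "source": "SEC", "officers": ["Officer A"]},
    {"name": "CITIGROUP", "source": "Twitter"},
    {"name": "Citigroup, Inc", "source": "Blogs", "officers": ["Officer B", "Officer A"]},
    {"name": "Acme Corp", "source": "SEC"},
]
for key, master in integrate(records).items():
    print(key, master)
```

Real entity integration would match on stronger identifiers (such as the CIK used in the query above) and apply survivorship rules per attribute; the single name key here only keeps the example short.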
  28. IBM Big Data Platform: New analytic applications drive the requirements for a big data platform
      - Analytic applications: BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics
      - Platform components: visualization & discovery, application development, systems management, accelerators, Hadoop system, stream computing, data warehouse, and information integration & governance
      - Requirements: integrate and manage the full variety, velocity, and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis; provide a development environment for building new analytic applications; workload optimization and scheduling; security and governance
  29. Thank You!
