The Comprehensive Approach: A Unified Information Architecture



The Briefing Room with Richard Hackathorn and Teradata
Slides from the Live Webcast on May 29, 2012

The worlds of Business Intelligence (BI) and Big Data Analytics can seem at odds, but only because we have yet to fully experience a comprehensive approach to managing big data: a Unified Big Data Architecture. The dynamics continue to change as vendors begin to emphasize the importance of leveraging SQL, engineering and operational skills, as well as incorporating novel uses of MapReduce to improve distributed analytic processing.

Register for this episode of The Briefing Room to learn the value of taking a strategic approach for managing big data from veteran BI and data warehouse consultant Richard Hackathorn. He'll be briefed by Chris Twogood of Teradata, who will outline his company's recent advances in bridging the gap between Hadoop and SQL to unlock deeper insights and explain the role of Teradata Aster and SQL-MapReduce as a Discovery Platform for Hadoop environments.

Published in: Technology, Business

The Comprehensive Approach: A Unified Information Architecture

  1. 1. Eric.kavanagh@bloorgroup.com Twitter Tag: #briefr
  2. 2. • Reveal the essential characteristics of enterprise software, good and bad
        • Provide a forum for detailed analysis of today's innovative technologies
        • Give vendors a chance to explain their product to savvy analysts
        • Allow audience members to pose serious questions... and get answers!
  3. 3. • May: Analytics
        • June: Intelligence
        • July: Disruption
        • August: Analytics
        • September: Integration
        • October: Database
  4. 4. • Analytics is, and always has been, about discovering insights that lead to better business decisions. The range of technologies and use cases that inhabit this area is wide: statistical analysis, data and process mining, predictive analytics and modeling, and complex event processing.
        • What is now referred to as Big Data has pushed analytics beyond the capabilities of traditional solutions. “Big Analytics” has organizations diving into large heaps of data that previously were not available or usable.
        • The growing volume, variety, velocity and complexity of data have proven to be a major challenge to organizations that leverage analytics to maintain a competitive edge.
  5. 5. Dr. Richard Hackathorn is a well-known industry analyst, technology innovator and international educator. He has pioneered innovations in database management, decision support and data warehousing. Richard has published numerous articles, presented at leading industry conferences, and conducted professional seminars in eighteen countries. He has written three books, entitled Enterprise Database Connectivity, Using the Data Warehouse (with William H. Inmon), and Web Farming for the Data Warehouse. Richard taught at the Wharton School and at the University of Colorado.
  6. 6. • Teradata is known for its analytic data solutions with a focus on integrated data warehousing, big data analytics and business applications.
        • It offers a broad suite of technology platforms and solutions, and a wide range of data management applications and data mining capabilities.
        • Teradata Aster is its MapReduce platform for handling big data and big analytics on multi-structured data.
  7. 7. Chris Twogood is Vice President of Product and Services Marketing for Teradata Corporation. He is responsible for marketing products (database, utilities, and platform) and services (professional and customer services), plus technical field sales support. Chris has twenty-five years of experience in the computer industry, specializing in Data Warehousing, Decision Support, Customer Management and Appliance platforms. Chris has held roles that span Strategy, Application Definition, Marketing, Product Requirements/Management, Platform Solutions and Product Marketing.
  8. 8. Unified Big Data Architecture
  9. 9. Big Data: From Transactions to Interactions
        [Diagram: tiers labeled ERP, CRM, WEB and BIG DATA, annotated "Increasing data variety and complexity". Data types shown: User Generated Content, Social Network, Mobile Web, External Demographics, User Click Stream, Sentiment, Web logs, A/B testing, Business Data Feeds, Offer history, Dynamic Pricing, HD Video, Affiliate Networks, Speech to Text, Segmentation, Search marketing, Offer details, Product/Service Logs, Behavioral Targeting, Customer Touches, Purchase detail, Purchase record, Support Contacts, Dynamic Funnels, SMS/MMS, Payment record]
        Confidential and proprietary. Copyright © 2012 Teradata Corporation.
  10. 10. Unified Big Data Architecture: Bridging Classic & Big Data Worlds
        • Classic BI: Structured & Repeatable Analysis
          - Business determines what questions to ask
          - IT structures the data to answer those questions
          - SQL performance and structure
          - “Capture only what’s needed”
        • Big Data Analytics: Multi-structured & Iterative Analysis
          - IT delivers a platform for storing, refining, and analyzing all data sources
          - Business explores data for questions worth answering
          - MapReduce processing flexibility
          - “Capture in case it’s needed”
  11. 11. Need for a Unified Big Data Architecture for New Insights: Enabling All Users for Any Data Type from Data Capture to Analysis
        • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
        • Layers: Reporting and Execution in the Enterprise; Discover and Explore; Capture, Store and Refine
        • Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  12. 12. Unified Big Data Architecture for the Enterprise
        • Users: Engineers, Data Scientists, Quants, Business Analysts
        • Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualization, etc.
        • Layers: Analytics; Discovery Platform; Active Data Warehouse; Capture, Store, Refine
        • Data sources: Audio/Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP
  13. 13. Analyst’s Goal: Get Insights from Data in Hadoop
        • Users: Engineers, Data Scientists, Quants, Business Analysts
        • Aster MapReduce Portfolio: SQL & MapReduce on the Teradata Aster MapReduce Platform
        • Teradata Analytics Portfolio: SQL on the Teradata IDW
        • Custom Code and Development: MR, Pig, Hive on HDFS (IT is the optimizer)
  14. 14. Analytics on Hadoop Data
        • Users: Engineers, Data Scientists, Quants, Business Analysts
        • Diagram labels: Aster MapReduce Portfolio; Teradata Analytics Portfolio; SQL & MapReduce; SQL; Teradata Aster MapReduce Platform; Teradata IDW; HDFS
  15. 15. What’s Technically Different in Big Data Analytics: Variety of data types requires different schemas
        • Data that uses a stable schema (structured)
          - Data from packaged business processes with well-defined and known attributes (e.g., ERP data, Inventory Records, Supply Chain records, …)
        • Data that has an evolving schema (semi-structured)
          - Data generated by machine processes; known but changing set of attributes (e.g., Web logs, CDRs, Sensor logs, JSON, Social profiles, Twitter feeds, …)
        • Data that has a format, but no schema (unstructured)
          - Data captured by machines with a well-defined format, but no semantics (e.g., images, videos, web pages, PDF documents, …)
          - Semantics can be extracted from the raw data by interpreting the format (e.g., shapes from video, face recognition in images, logo detection, …)
          - Sometimes format-only data is accompanied by metadata that has its own schema (stable or evolving); that metadata needs to be classified and treated separately
  16. 16. When to Use Which? The best approach by workload and data type. Processing as a Function of Schema Requirements by Data Type
        Workload columns: Low Cost Storage & Retention; Loading and Refining (Data Pre-Processing, Prep, Cleansing; Transformations); Reporting; Analytics (User-driven, interactive)
        • Stable Schema: storage and retention: Teradata / Hadoop; pre-processing: Teradata; transformations: Teradata; reporting: Teradata; analytics: Teradata (SQL analytics)
        • Evolving Schema: storage and retention: Hadoop; pre-processing: Aster / Hadoop; transformations: Aster (joining with structured data); reporting: Aster; analytics: Aster (SQL + MapReduce Analytics)
        • Format, No Schema: storage and retention: Hadoop; pre-processing: Hadoop; transformations: Hadoop; analytics: Aster (MapReduce Analytics)
  17. 17. Architecture Flexibility – Stable Schema
        • Platforms: Extreme Data Appliance; Data Warehouse Appliance; Active Enterprise Data Warehouse
        • Drive types: Solid State Drives; 300-600 GB drives; High Capacity Drives
        • Workloads: Low Cost Storage & Retention; Load, Data Prep & Refining; Transformation
        • Benefits: High volume data storage, light transformations; CPU Intense transformations, medium volume data storage; Low Latency, Minimize Data Movement/Complexity, transformation aligned to reference data
        • Compression: Automatic compression engines; Software compression; Compress on cold
        • List Price/TB (5/29/12): $4K; $11K; $30K* (* price/TB on cold storage only)
  18. 18. Unified Big Data Architecture and Data Flow: Enabling a Data-Driven Business
        • Transaction Architecture: Traditional Data Sources; ETL; SQL Analytics; Business Applications; Dimensional Data; Analytic Results
        • Interaction Architecture: Multi-Structured Raw Data (Sensors, Scientific and Geospatial Data, Social Media); Store & Refine; Iterative Discovery & Analytics; Analytic Tools & Users; Unified Analytic Access
  19. 19. Twitter Tag: #briefr
  20. 20. Thinking Beyond the Enterprise Data Warehouse. Richard Hackathorn. © Bolder Technology, Inc. 2012
  21. 21. A New Ballgame!
        • Big Data is forcing us to rethink the goals and architecture for data warehousing
        • Traditional EDW is no longer sufficient
          - Exclusive collection of corporate information
          - Striving toward a single version of truth
          - Only structured data has business value
          - Predefined questions are the norm
        • We are now facing a new set of issues!
  22. 22. Issue: Exclusive to Inclusive
        • All data cannot be managed within the boundaries of the EDW
          - Too much and too fast
          - Too complex and changing
          - Controlled by others
          - New data sources are critical
          - Short-lived data sources are also critical
        • Need to be more agile, flexible, responsive
        • Requiring ‘smart’ curating of new sources
          - What should be captured, stored, and retained?
        • Requiring ‘smart’ data exploration
  23. 23. Issue: Ever-Changing Multiple Truths
        • “More things in heaven and earth than are dreamt of in your philosophy”
          - In other words, we do not know what we do not know!
        • Example: multiple personalities for the same customer
        • Business semantic analysis is a critical and continuous activity
  24. 24. Issue: Discovering Structure
        • Need for a constant refining of all data
          - Constantly maturing data by enhancing, compressing, and structuring
        • Business value comes from leveraging structured data into process variations
          - What do you do differently with what you know?
          - Analytics and data mining add structure
  25. 25. • An interesting (and seldom discussed) facet of Big Data is the emerging applications that are NOT social networking analytics on web logs and website behaviors. What are the ‘killer’ apps in this area? Do they involve the “Internet of Things”?
        • Big Data is big in volume and in variety. It is also big in velocity. There is a lot per second…per minute…per day. How should a unifying architecture handle the velocity of Big Data?
        • Many are trying to “Capture in case it is needed” as their approach to Big Data. But can you capture all the data? At what point does the cost of data capture/storage exceed the business benefits? How do you decide what to capture, store, and retain?
        • Data exploration is an increasingly popular term. How does it differ from data analysis? Can you really find useful information through data exploration when you do not know what you are looking for? Examples?
  26. 26. • When you unify the architecture for Big Data (as contrasted with isolated islands of Big Data applications), the data needs to move through several physical stores. Given the volume and velocity of data flows, can/should Big Data be duplicated in multiple stores?
        • What is the difference between the Hadoop (Hive, etc.) system and the Teradata Aster system? Could you use both for analytics? Do you need both in your unifying architecture?
        • Are the ‘traditional’ BI tools (like BusinessObjects, Cognos) relevant to Big Data analytics? Are they needed in companies that are heavily Big Data? Are they evolving and expanding to incorporate the new approaches and techniques required for Big Data?
        • A key requirement in any unifying Big Data architecture is managing the complexity of schemas. It seems that we need a new generation of semantic analysis tools to assist with schema management. What tools are emerging to support this requirement?
  27. 27. • Gregory Piatetsky-Shapiro of KDnuggets ran a recent poll on the largest dataset that his audience of data miners has so far analyzed. The median size for 2012 was in the range 10-100 GB. If most of the data for half of the analytics projects can fit into main memory on a server platform, why is there such a need for expensive architectures supporting MPP, MapReduce, and the like?
        • data-mined.html
  28. 28. • June: Intelligence
        • July: Disruption
        • August: Analytics
        • September: Integration
        • October: Database
        • November: Cloud
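
Slides 15 and 16 describe a two-step decision: classify a data source by schema type, then pick a platform for the workload. The Python sketch below is one possible reading of that logic, not Teradata's tooling. The short workload keys (`storage`, `prep`, `transform`, `reporting`, `analytics`), the function names, and the cell-by-cell assignments are assumptions reconstructed from the flattened slide 16 table; the reporting cell for format-only data could not be read from the slide and is left empty.

```python
from enum import Enum
from typing import Optional


class SchemaKind(Enum):
    STABLE = "stable schema"
    EVOLVING = "evolving schema"
    FORMAT_ONLY = "format, no schema"


# Example source types as listed on slide 15 (keys lowercased for lookup).
EXAMPLE_SOURCES = {
    "erp data": SchemaKind.STABLE,
    "inventory records": SchemaKind.STABLE,
    "supply chain records": SchemaKind.STABLE,
    "web logs": SchemaKind.EVOLVING,
    "cdrs": SchemaKind.EVOLVING,
    "sensor logs": SchemaKind.EVOLVING,
    "json": SchemaKind.EVOLVING,
    "social profiles": SchemaKind.EVOLVING,
    "twitter feeds": SchemaKind.EVOLVING,
    "images": SchemaKind.FORMAT_ONLY,
    "videos": SchemaKind.FORMAT_ONLY,
    "web pages": SchemaKind.FORMAT_ONLY,
    "pdf documents": SchemaKind.FORMAT_ONLY,
}

# Assumed reading of the slide 16 grid; workload keys are hypothetical names.
PLATFORM_BY_WORKLOAD = {
    SchemaKind.STABLE: {
        "storage": "Teradata / Hadoop",
        "prep": "Teradata",
        "transform": "Teradata",
        "reporting": "Teradata",
        "analytics": "Teradata (SQL analytics)",
    },
    SchemaKind.EVOLVING: {
        "storage": "Hadoop",
        "prep": "Aster / Hadoop",
        "transform": "Aster (joining with structured data)",
        "reporting": "Aster",
        "analytics": "Aster (SQL + MapReduce analytics)",
    },
    SchemaKind.FORMAT_ONLY: {
        "storage": "Hadoop",
        "prep": "Hadoop",
        "transform": "Hadoop",
        # No reporting entry: that cell was not recoverable from the slide.
        "analytics": "Aster (MapReduce analytics)",
    },
}


def classify(source: str) -> SchemaKind:
    """Look up the schema kind for one of the slide's example source types."""
    return EXAMPLE_SOURCES[source.strip().lower()]


def recommend(source: str, workload: str) -> Optional[str]:
    """Return the platform for a source and workload, or None if the grid has no entry."""
    return PLATFORM_BY_WORKLOAD[classify(source)].get(workload)


if __name__ == "__main__":
    for source, workload in [("ERP data", "reporting"),
                             ("Web logs", "analytics"),
                             ("Images", "storage")]:
        print(f"{source} / {workload}: {recommend(source, workload)}")
```

A real deployment would classify sources by inspecting the data rather than by name; the lookup table here only mirrors the examples given on the slides.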