Data Wrangling and the Art of Big Data Discovery

The Briefing Room with Dr. Robin Bloor, Trifacta and Zoomdata
Live Webcast March 10, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=dd9fed3c7c476ae3a0f881ae6b53dcc5

Square pegs and round holes don't get along, which is one reason why traditional data management approaches simply won't work for Big Data. The variety and velocity of data types flying at us today require a new strategy for identifying, streamlining and utilizing information assets and processes. Decades-old technology won’t cut it – a combination of new tools and techniques must be used to enable effective discovery of insights in a timely fashion.

Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why today's data landscape calls for a much different data management approach. He'll be briefed by Trifacta and Zoomdata, who will show how their technologies use a range of functionality – including machine learning – to help companies "wrangle" their data. They'll also demonstrate the optimal step-by-step process of working with new data types.

Visit InsideAnalysis.com for more information.

Published in: Technology

Data Wrangling and the Art of Big Data Discovery

  1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  2. The Briefing Room: Data Wrangling and the Art of Big Data Discovery
  3. Twitter Tag: #briefr. The Briefing Room. Welcome. Host: Eric Kavanagh, eric.kavanagh@bloorgroup.com, @eric_kavanagh
  4. Mission
     • Reveal the essential characteristics of enterprise software, good and bad
     • Provide a forum for detailed analysis of today's innovative technologies
     • Give vendors a chance to explain their product to savvy analysts
     • Allow audience members to pose serious questions... and get answers!
  5. Topics. March: BI/ANALYTICS; April: BIG DATA; May: CLOUD
  6. Should I Bring My Tools?
     • Hammers aren’t good for plumbing!
     • Big Data requires a new set of tools
     • Preparing and Exploring are very different
     • Don’t throw out your old tool box!
  7. Analyst: Robin Bloor. Robin Bloor is Chief Analyst at The Bloor Group. robin.bloor@bloorgroup.com, @robinbloor
  8. Trifacta and Zoomdata
     Trifacta offers a platform for data transformation and preparation
     • The interface is rich in visualization and provides previews and recommendations
     • The platform also includes a learning layer which employs machine learning algorithms to facilitate automation and self-learning
     Zoomdata is a Big Data exploration, visualization and analytics platform
     • The platform offers a wide range of analytics and BI tools, such as dashboards, stream processing and IoT analytics
     • Its pre-built connectors allow the Zoomdata server to connect directly to data sources
  9. Guests
     • Russ Cosentino is Vice President of Marketing & Business Development at Zoomdata. Throughout his career he has focused on developing solutions that leverage technology to solve business problems. His experience includes application development for mission critical systems for the DoD, automated recruitment programs for the intelligence community and the application of text analytics for commercial VOC programs.
     • Dr. Joe Hellerstein is Trifacta’s Chief Strategy Officer and a Professor of Computer Science at Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. In 2010, Fortune Magazine included him in their list of 50 smartest people in technology, and MIT Technology Review magazine included his Bloom language for cloud computing on their TR10 list of the 10 technologies “most likely to change our world.”
  10. Data Wrangling and the Art of Big Data Discovery
  11. DATA WRANGLING AND THE ART OF BIG DATA DISCOVERY
     • Dr. Joe Hellerstein: Professor, EECS Computer Science Division, UC Berkeley; Co-founder & Chief Strategy Officer, Trifacta
     • Russ Cosentino: Vice President Marketing & Business Development, Zoomdata
  12. Elegant solutions for a messy world: The 80% problem of preparing data for exploratory analytics
     • Founded in 2012, from Berkeley/Stanford research roots
     • dp = data to the people: “facilitating interactions between people and data throughout the analytic lifecycle”
     • Stanford Visualization Group’s “Data Wrangler”
  13. TRADITIONAL APPROACH TO DATA MANAGEMENT (diagram labels): Enterprise Data Warehouse; Implement; Data Sources; ETL; Structured; Ingest; Storage #1, 2, N; ELT; Store & Process; EDW; Archive; Access Data; Analyze Data; Search; Statistical; Machine Learning; SQL; Serve; Optimize; Custom Application; Point Solution
  14. MANY PEOPLE INVOLVED IN THE PROCESS: DATA ARCHITECT, DATABASE ADMINISTRATOR, SYSTEM ADMINISTRATOR, BUSINESS ANALYST, BI ADMINISTRATOR, SYSTEM ADMINISTRATOR
  15. IT COULD BE SIMPLER: DATABASE ADMINISTRATOR, BUSINESS ANALYST
  16. MODERN DATA AND VISUALIZATION ENVIRONMENT (diagram labels): Visualization; Data Sources; Structured; Unstructured; Ingest; Store & Process; Data Preparation; Serve
  17. REAL BENEFITS OF A SELF-SERVICE APPROACH (Real-Time): +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction
  18. REAL BENEFITS OF A SELF-SERVICE APPROACH (Real-Time, Big Data): +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction; +48% Speed of Delivery; +42% Self-Service Access; +40% Decision Quality
  19. REAL BENEFITS OF A SELF-SERVICE APPROACH (Real-Time, Interactive, Big Data): +15% Cash Increase; +26% Pipeline Growth; -67% Cost Reduction; +70% Collaboration; +64% Decision Speed; +61% User Adoption; +48% Speed of Delivery; +42% Self-Service Access; +40% Decision Quality
  20. DEMONSTRATION
  21. MODERN DATA ARCHITECTURE FOR SELF-SERVICE INTELLIGENCE
  22. Perceptions & Questions. Analyst: Robin Bloor
  23. I am not a number! To Round-Up & Wrangle. Robin Bloor, PhD
  24. The Flow of Data. The movement of data, from ACQUISITION through PREPARATION to ANALYSIS, is not necessarily simple…
  25. The General Picture (diagram labels): Data Sources; Data Streams; Staging Area (Hadoop); Data Warehouse or other location; ETL; Analytics; ROUND-UP; WRANGLING; MetaData Discovery; MetaData Mgt; Data Cleansing; Data Lineage; MDM; Service Mgt; Life Cycle Mgt
  26. Immediate Analytics & the Rest (same diagram as the previous slide)
     IMMEDIATE ANALYTICS
     • Metadata discovery
     • Metadata management
     • Data cleansing
     • Data lineage
     DOWNSTREAM
     • MDM
     • Service mgt
     • Lifecycle mgt
     • ETL
  27. The Analytics Business Process (stages: Data Access, Data Prep, Model, Analyze, Deploy, Execute)
     • The main point to note is that it is iterative
     • It has morphed, because of: data availability, parallel technology, scalable software, open source tools, machine learning
  28. Analytical Latencies: (1) Data access, (2) Data preparation, (3) Model development, (4) Execution, (5) Implementation, (6) Model audit & update. This is where the rubber meets the road: Speed = Value
  29. The Impending Reality. Technology is speeding up analytics by TWO ORDERS OF MAGNITUDE (on the IT side). This is changing analytics
  30. Questions
     • Is your capability only relevant to analytics or does it have broader areas of application?
     • Technically, what makes it fast?
     • Please comment on analytical workloads: What do you see as the natural IT bottlenecks? What do you see as the natural business bottlenecks?
     • Do we want business analysts to become ersatz data scientists?
  31. Questions
     • In respect to scale, what is your largest implementation by data volume, and what was the industry sector/problem space?
     • Who do you partner with?
     • What do you see as the largest barrier to adoption of Trifacta?
  32. Twitter Tag: #briefr The Briefing Room
  33. Upcoming Topics (www.insideanalysis.com). March: BI/ANALYTICS; April: BIG DATA; May: CLOUD
  34. THANK YOU for your ATTENTION! Some images provided courtesy of Wikimedia Commons and Wikipedia, including: "Multiple pliers" by Ed Stevenhagen from nl. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Multiple_pliers.jpg#mediaviewer/File:Multiple_pliers.jpg
