Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data Prep, Data Blending, and Other Technologies

Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/Bvmvc9

Data prep and data blending are terms that have come to prominence over the last year or two. On the surface, they appear to offer functionality similar to data virtualization…but there are important differences!

In this session, you will learn:

• How data virtualization complements or contrasts technologies such as data prep and data blending
• Pros and cons of functionality provided by data prep, data catalog and data blending tools
• When and how to use these different technologies to be most effective

This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6

Slide 1: RAPID, AGILE DATA STRATEGIES For Accelerating Analytics, Cloud, and Big Data Initiatives.
OCTOBER 18, 2016, SAN FRANCISCO BAY AREA, CA
#DenodoDataFest

Slide 2: Comparing and Contrasting Data Virtualization with Data Prep, Data Blending, Data Catalog and Other Technologies
Paul Moxon, Head of Product Management, Denodo

Slide 3: Agenda
1. Business Intelligence ‘Swim Lanes’
2. Data Prep – What is it and how does it work?
3. All you want to know about Data Blending
4. Data Catalogs – What, When, and How
5. Mapping to the Swim Lanes
6. Where Does Data Virtualization Fit?
7. Q&A

Slide 4: Business Intelligence ‘Swim Lanes’
• Task focused
• Productivity
• Self-service
• Quick and easy access to data
• Automation (or simplification) of data gathering

• Tactical
• Team/Departmental
• Drives business operations
• Shared data
• Process oriented

• Strategic
• Executive and KPI dashboards
• Drives strategic decisions
• Managed, governed data
• Consistent data

Slide 5: Data Prep – What is it and how does it work?

Slide 6: Data preparation is the process of gathering, combining, structuring and organizing data so it can be analyzed as part of a business intelligence or analytics process.

Slide 7: Leading Data Prep Vendors
• Trifacta
• Paxata
• Alteryx
• Datameer
• Talend Data Preparation Desktop
• Informatica Rev
• SAS Data Loader
• IBM Watson

Slide 8: How Does It Work?
Interactive data prep process:
1. First, data (or just a sample of it) is ingested from the data sources.
2. The user defines transformations to prepare the data:
   a. De-duplication, cleansing, combining data, pivoting, splitting rows/columns, etc.
3. The transformations are run and the data is exported:
   a. To a local file (typically CSV) or into Hadoop (Hive table or CSV file)
   b. Alternatively, to a BI tool (e.g., a Tableau Data Extract file)
Operationalize:
1. Schedule data prep transformations to generate new data files (à la ETL)
2. Publish results to a collaboration environment
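
The interactive process above can be illustrated with a small Python sketch. It assumes rows are plain dictionaries and that a "recipe" is an ordered list of transformation functions; the step names (`trim_whitespace`, `dedupe`, `split_name`) and the `name` column are hypothetical. The recipe is first run on a sample for a quick preview, then replayed over the full data set and exported as CSV, which loosely mirrors the "Operationalize" step. Commercial data prep tools store recipes in their own formats; this is not how any particular vendor implements it.

```python
import csv
import io
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Cleansing: strip leading and trailing spaces from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def dedupe(rows: List[Row]) -> List[Row]:
    """De-duplication: keep only the first occurrence of identical rows."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def split_name(rows: List[Row]) -> List[Row]:
    """Splitting columns: turn a hypothetical 'name' column into first/last name."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        first, _, last = new_row.pop("name", "").partition(" ")
        new_row["first_name"] = first
        new_row["last_name"] = last
        result.append(new_row)
    return result


# The recipe: trimming runs before de-duplication so near-duplicates collapse.
RECIPE: List[Step] = [trim_whitespace, dedupe, split_name]


def run_recipe(rows: List[Row], recipe: List[Step]) -> List[Row]:
    """Apply each transformation step in order."""
    for step in recipe:
        rows = step(rows)
    return rows


def export_csv(rows: List[Row]) -> str:
    """Export the prepared rows as CSV text (a local file in a real tool)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    raw = [
        {"name": "Ada Lovelace ", "city": "London"},
        {"name": "Ada Lovelace", "city": "London"},
        {"name": "Alan Turing", "city": " Wilmslow"},
    ]
    print(run_recipe(raw[:2], RECIPE))          # interactive preview on a sample
    print(export_csv(run_recipe(raw, RECIPE)))  # "operationalized" full run
```

Keeping each step as a separate function is what allows the same recipe to be previewed interactively on a sample and later replayed, unchanged, as a scheduled job.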

Slide 9: Data Prep Tools

Slide 10: Pros and Cons of Data Prep
Pros:
• Ease of use
• Iterative data transformation
• Very good with delimited files
• Sampling makes tools responsive
• Data profiling helps detect ‘suspect’ data
Cons:
• Ad-hoc rather than operational
• Reuse is limited to collaborative data sets
• Performance
• Consistency and governance – data chaos?
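
The point that data profiling helps detect ‘suspect’ data can be made concrete with a simple column profiler. This sketch assumes values arrive as strings (as they would from a delimited file) and uses deliberately crude type inference (number, ISO date, or text); real tools use much richer heuristics. Values whose inferred type differs from the column's dominant type are flagged as suspect.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional


def infer_type(value: str) -> str:
    """Crude type inference for a single raw string value."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(text)  # e.g. 2016-10-18
        return "date"
    except ValueError:
        return "text"


def profile_column(values: List[str]) -> Dict[str, object]:
    """Summarize a column and flag values that do not match its dominant type."""
    types = Counter(infer_type(v) for v in values)
    non_empty = {t: n for t, n in types.items() if t != "empty"}
    dominant: Optional[str] = max(non_empty, key=non_empty.get) if non_empty else None
    # Empty values are counted separately rather than reported as suspect.
    suspect = [v for v in values if infer_type(v) not in ("empty", dominant)]
    return {
        "count": len(values),
        "empty": types["empty"],
        "distinct": len(set(values)),
        "dominant_type": dominant,
        "suspect": suspect,
    }


if __name__ == "__main__":
    print(profile_column(["12", "15", "n/a", "", "9.5"]))
```

For the sample column above, the dominant type is `number`, so the value `n/a` is reported as suspect and the empty string is counted separately.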

Slide 11: Data Prep and Systematic Data Integration
Data Prep is great for ad-hoc discovery and analytics
• “I need to combine this with that and run it through my analytics application…”
Not so good for consistent, repeatable integration
• (Think: BI swim lanes)
But…
• Data Prep provides valuable knowledge that can be used in systematic data integration

Slide 12: Data Prep and Virtual Sandboxes

Slide 13: All you want to know about… Data Blending

Slide 14: Data blending is about working with multiple sources of data by preparing them and joining them together for a specific use case at a specific time. It’s different from data integration, because data blending is about solving a specific use case, whereas data integration typically gives you a single source of truth…

Slide 15: Leading Data Blending ‘Vendors’
• Tableau
• MicroStrategy
• SAP BusinessObjects
• IBM Cognos
• QlikView
• etc.

Slide 16: How Does It Work?
Defining the data blending ‘model’:
1. Connect to data sources:
   a. Databases and data warehouses (via ODBC or JDBC), files (Excel, CSV, etc.), Hadoop, NoSQL, etc.
2. Select the data you want to use – a sample is usually loaded
3. Build the model using a graphical tool to create joins, unions, etc.
4. Run the model for the full data set
5. Build your report or dashboard
Operationalize:
1. The model can be saved and exposed as a ‘data source’ (usually in a ‘server’)
2. It can then be accessed by other users
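
The blending steps above (connect to sources, join them, run the model for the full data set) can be sketched in Python. The example assumes one source is a relational database (an in-memory SQLite database stands in for it) and the other is a CSV file; the `orders` table, the `region` linking field and the sales targets are made-up illustration data. It aggregates each source to the linking field and then left-joins them, which is one common way to blend data, not a description of how any particular BI tool works internally.

```python
import csv
import io
import sqlite3
from typing import Dict, List

# Hypothetical file-based source: sales targets per region.
TARGETS_CSV = "region,target\nWest,100\nEast,80\n"


def demo_connection() -> sqlite3.Connection:
    """Hypothetical database source: individual orders with a region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("West", 60.0), ("West", 50.0), ("East", 70.0), ("North", 10.0)],
    )
    return conn


def sales_by_region(conn: sqlite3.Connection) -> Dict[str, float]:
    """Source 1: let the database aggregate to the linking field."""
    rows = conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    return {region: total for region, total in rows}


def targets_by_region(csv_text: str) -> Dict[str, float]:
    """Source 2: read the CSV file, keyed by the same linking field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["region"]: float(row["target"]) for row in reader}


def blend(primary: Dict[str, float], secondary: Dict[str, float]) -> List[Dict[str, object]]:
    """Left join on region: keep every primary row; missing targets become None."""
    return [
        {"region": region, "sales": sales, "target": secondary.get(region)}
        for region, sales in sorted(primary.items())
    ]


if __name__ == "__main__":
    for row in blend(sales_by_region(demo_connection()), targets_by_region(TARGETS_CSV)):
        print(row)
```

Because the join is a left join, the North region, which has no target in the CSV file, is kept with `target` set to `None`.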

Slide 17: Data Blending

Slide 18: Pros and Cons of Data Blending
Pros:
• Built into BI/visualization tools
• Graphical query designer
• Provides a semantic layer on top of data sources
• Quick time from ‘data to analysis’, i.e., removes the wait for IT to provision a data mart or similar
Cons:
• Ad-hoc rather than operational
• Specific to each BI/visualization tool
• Performance
• Consistency and governance

Slide 19:
“There are two flows; the ad-hoc and the operational…where we are coming from is…I just want to integrate these two sources. It's not formalized, per se, it's not a project. I just want to connect this and this and I want to analyze it. How do we go from data to analysis as quickly as possible? And when you want to formalize it, operationalize it, make it repeatable, then [you use other tools].”
– Francois Ajenstat, Chief Product Officer, Tableau

Slide 20: Data Catalogs – What, when, and how?

Slide 21: Data Catalogs provide capabilities that enable any user – from analysts to data scientists to developers – to discover, understand, and consume data sources. Data Catalogs typically include a crowdsourcing model of metadata and annotations, and allow all users to contribute their knowledge to build a community and culture of data.

Slide 22: Leading Data Catalog Vendors
• Alation/Teradata
• Cambridge Semantics Anzo Platform
• Informatica Enterprise Information Catalog
• Microsoft Azure Data Catalog
• Waterline Data

Slide 23: How Does It Work?
Building the catalog:
1. Connect to data sources and consumers:
   a. Extract and analyze ‘technical’ metadata
   b. Sample data and build a data profile
2. Use NLP and ML for ‘auto-titling’ based on a defined business glossary
3. Use expert sourcing to validate catalog entries
4. Use crowdsourcing to build a veracity profile
Accessing the catalog:
1. Search tools for ‘natural language’ searches
2. APIs for tool integration
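
A toy version of the catalog-building and search steps can be sketched in Python. Everything here is an assumption for illustration: an in-memory SQLite database stands in for a data source, SQLite's `PRAGMA table_info` supplies the technical metadata, a small hypothetical glossary with rule-based token matching replaces the NLP/ML auto-titling mentioned on the slide, free-text annotations stand in for expert and crowdsourced input, and search is a plain keyword match rather than natural-language search.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List

# Hypothetical business glossary: abbreviation found in column names -> business term.
GLOSSARY = {"cust": "Customer", "amt": "Amount", "dt": "Date", "qty": "Quantity"}


@dataclass
class CatalogEntry:
    table: str
    column: str
    data_type: str            # technical metadata from the source
    distinct_in_sample: int   # simple data profile built from a sample
    title: str                # business-friendly title
    annotations: List[str] = field(default_factory=list)  # expert/crowd input


def auto_title(column: str) -> str:
    """Rule-based stand-in for auto-titling: translate each token via the glossary."""
    return " ".join(GLOSSARY.get(tok, tok.capitalize()) for tok in column.lower().split("_"))


def build_catalog(conn: sqlite3.Connection, sample_size: int = 100) -> List[CatalogEntry]:
    """Extract technical metadata and profile a sample of every column."""
    entries = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _, column, data_type, *_ in columns:
            sample = conn.execute(
                f'SELECT "{column}" FROM "{table}" LIMIT ?', (sample_size,)).fetchall()
            distinct = len({value for (value,) in sample})
            entries.append(CatalogEntry(table, column, data_type, distinct, auto_title(column)))
    return entries


def annotate(entries: List[CatalogEntry], table: str, column: str, note: str) -> None:
    """Record a user's knowledge about a column (the 'tribal knowledge' step)."""
    for entry in entries:
        if (entry.table, entry.column) == (table, column):
            entry.annotations.append(note)


def search(entries: List[CatalogEntry], query: str) -> List[str]:
    """Return 'table.column' for entries whose text contains every query word."""
    words = query.lower().split()
    hits = []
    for entry in entries:
        text = " ".join([entry.table, entry.column, entry.title, *entry.annotations]).lower()
        if all(word in text for word in words):
            hits.append(f"{entry.table}.{entry.column}")
    return hits


def demo_connection() -> sqlite3.Connection:
    """Hypothetical source table with terse technical column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (cust_id INTEGER, order_amt REAL, order_dt TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
        (1, 9.5, "2016-10-18"), (1, 20.0, "2016-10-19"), (2, 5.0, "2016-10-19")])
    return conn
```

With the demo table, `cust_id` is titled "Customer Id", so a search for "customer" finds it even though that word never appears in the technical column name. The catalog only describes and locates data; delivering it to a consumer would still require another tool, which is the ‘data provisioning’ gap listed among the cons on the following slide.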

Slide 24: Data Catalog

Slide 25: Pros and Cons of Data Catalogs
Pros:
• Great for analyzing data sources and inferring meaning from technical metadata
• Gather ‘tribal knowledge’ about data within the organization
• Allow curation of metadata
• Provide a single tool to find, and understand, data
Cons:
• Do not address ‘data provisioning’ – you need another tool for this
• File-based data?

Slide 26: Summary – Back to the swim lanes…

Slide 27: Business Intelligence ‘Swim Lanes’
[Diagram mapping Data Blending, Data Catalog, Data Prep and Data Virtualization onto the swim lanes]

Slide 28: Q&A

Slide 29: Thank you!
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
