Using Data Virtualization to Integrate With Big Data

Hadoop and big data don't sit as an island in organizations. Analyzing event streams and similar data requires integrating them with other data from systems in the organization. This isn't easy with big data systems today because of disparities in technologies and environments compared to traditional IT. Data virtualization is one way to smooth over the integration and allow Hadoop to access other data, or allow SQL-oriented tools to access Hadoop.

Published in: Technology, Business

  1. The Role of Data Virtualization in a World of Big Data
     June 6, 2012. Mark Madsen, @markmadsen, www.ThirdNature.net
     Information Management Through Human History: new technology development (innovation) creates new methods to cope (maturation) creates new information scale and availability (saturation) creates…
     Copyright Third Nature, Inc.
  2. Big Data: You keep using that word. I do not think it means what you think it means.
  3. What makes data “big”?
     ▪ Hierarchical structures
     ▪ Nested structures
     ▪ Encoded values
     ▪ Non-standard (for a database) types
     ▪ Deep structure
     ▪ Very large amounts
     ▪ Human authored text
     “Big” is better off being defined as “complex” or “hard to manage”.
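
As a hedged illustration of "encoded values": the DESTINATION_URL in the web tracking record shown later in the deck packs several values into its query string. The sketch below reassembles that URL from the slide text (hyphens normalized to ASCII) and assumes that `-_-` is the delimiter inside the `mmc` parameter. The meaning of the individual parts is not given in the deck, so the code only splits them and does not name them.

```python
from urllib.parse import urlsplit, parse_qs

# URL reassembled from the deck's web tracking example.
# Assumption: the typographic hyphens in the slide text are plain ASCII hyphens.
url = (
    "https://www.phisherking.com/gifts/store/LogonForm"
    "?mmc=link-src-email-_-m100109-_-44IOJ1-_-shop"
    "&langId=-1&storeId=1055&URL=BECGiftListItemDisplay"
)

# parse_qs returns a dict that maps each parameter name to a list of values.
params = parse_qs(urlsplit(url).query)

# Assumption: "-_-" separates the sub-values packed into the mmc parameter.
mmc_parts = params["mmc"][0].split("-_-")

print(sorted(params))
print(mmc_parts)
```

A relational column holding this URL as a single string hides all of these values from SQL, which is one reason such data is "hard to manage" in a conventional database.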
  4. You could store this data in the data warehouse, but… Old database technology has so many problems.
  5. “Big Data”: New technology has so many problems.
  6. Reality is multiple data stores and platforms
     Separate, purpose-built databases and processing systems for different types of data and query / computing workloads is the norm for information delivery. Data flows between most of these environments.
     [Diagram: BI, Reporting, Dashboards; Data Warehouse; Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications). Table icons show sample rows: 1 Marge Inovera, $150,000, Statistician; 2 Anita Bath, $120,000, Sewer inspector; 3 Ivan Awfulitch, $160,000, Dermatologist; 4 Nadia Geddit, $36,000, DBA.]
  7. Example “big data”: Web tracking data
     USER_ID              301212631165031
     SESSION_ID           590387153892659
     VISIT_DATE           1/10/2010 0:00
     SESSION_START_DATE   1:41:44 AM
     PAGE_VIEW_DATE       1/10/2010 9:59
     DESTINATION_URL      https://www.phisherking.com/gifts/store/LogonForm?mmc=link-src-email-_-m100109-_-44IOJ1-_-shop&langId=-1&storeId=1055&URL=BECGiftListItemDisplay
     REFERRAL_NAME        Direct
     REFERRAL_URL         -
     PAGE_ID              PROD_24259_CARD
     REL_PRODUCTS         PROD_24654_CARD, PROD_3648_FLOWERS
     SITE_LOCATION_NAME   VALENTINES DAY MICROSITE
     SITE_LOCATION_ID     SHOP-BY-HOLIDAY VALENTINES DAY
     IP_ADDRESS           67.189.110.179
     BROWSER_OS_NAME      MOZILLA/4.0 (COMPATIBLE; MSIE 7.0; AOL 9.0; WINDOWS NT 5.1; TRIDENT/4.0; GTB6; .NET CLR 1.1.4322)
     The event stream contains IDs, but no reference data…
     Reference data, aka dimensions, master data. This isn’t an OLTP DB; there is no reference data available from the source.
     The typical situation for analysts:
     ▪ “I need that data now.”
     ▪ “It would be logical to keep all the data in one place.”
     ▪ “It will take 6 months.”
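
The gap this slide describes, an event that carries IDs but none of the attributes behind them, can be sketched as a lookup against reference data held elsewhere. The event fields and values below come from the slide. The `product_dim` table, its `category` attribute, and the `enrich` helper are hypothetical stand-ins for a dimension that would normally live in the data warehouse or an OLTP system.

```python
# Event fields and values are taken from the slide's web tracking record.
event = {
    "USER_ID": "301212631165031",
    "SESSION_ID": "590387153892659",
    "PAGE_ID": "PROD_24259_CARD",
    "REL_PRODUCTS": "PROD_24654_CARD, PROD_3648_FLOWERS",
}

# Hypothetical reference data (a product dimension) keyed by product ID.
# One related product is deliberately absent to show an unresolved ID.
product_dim = {
    "PROD_24259_CARD": {"category": "Cards"},
    "PROD_24654_CARD": {"category": "Cards"},
}


def enrich(event, dim):
    """Attach reference attributes to the product IDs carried by one event."""
    # REL_PRODUCTS is a comma-separated list packed into a single field.
    related_ids = [pid.strip() for pid in event["REL_PRODUCTS"].split(",")]
    return {
        **event,
        "PAGE_PRODUCT": dim.get(event["PAGE_ID"]),
        # IDs with no matching reference row map to None.
        "RELATED": {pid: dim.get(pid) for pid in related_ids},
    }


enriched = enrich(event, product_dim)
print(enriched["PAGE_PRODUCT"])
print(enriched["RELATED"])
```

Without access to `product_dim`, the event can be counted but not interpreted, which is why the analyst in the slide needs the reference data brought to the event stream or the event stream brought to the reference data.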
  8. There are two architectural approaches to facilitating analysis, depending on where the analyst works in the environment:
     1. Back end integration: for analysts working within the big data (BD) environment, reaching out from the environment to get other data that's needed to make sense of information.
     2. Front end integration: for analysts working in a more conventional BI / analysis environment, reaching in to the BD environment from other tools.
     Solution: copy the data into Hadoop?
     Just load it from the DW, if it’s there. Otherwise, dump and load the data from the sources. Great for one-time analysis, but what if you need to do it again next week, or if you need current values on a regular basis? You can build custom extracts from each source. But…
     ▪ Poor tool support
     ▪ Problem of on-demand / current values
     ▪ Minimal data management possible in the Hadoop environment
     ▪ The analyst waits
     [Diagram labels: Data warehouse, OLTP Sources]
  9. Alternative: data virtualization to enable access
     A data virtualization layer can be used to make other sources (OLTP, the data warehouse) appear locally accessible to the analyst or Hadoop programmer. Then, two choices are possible:
     ▪ extract the data and load it into the local environment
     ▪ access it dynamically from within the environment
     [Diagram labels: Data warehouse, OLTP Sources]
     Alternative: data virtualization to bridge stores
     A data virtualization layer can be used to bridge the database and big data environments, hiding the back end complexities. It allows one to access raw or processed data from Hadoop alongside data from other environments, with some benefits: no limited Hive connectors, no client-side data merging, no difficult metadata layer integrations.
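
The two choices named on this slide, extract and load versus dynamic access, can be sketched with a toy access layer. Everything here is an assumption made for illustration: the `VirtualLayer` class and its methods are hypothetical, an in-memory SQLite database stands in for a remote data warehouse, and the price values are invented. It is not how any particular data virtualization product works.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for a remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (product_id TEXT PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?)", rows)
    return conn


class VirtualLayer:
    """Hypothetical access layer: one logical name per remote query."""

    def __init__(self):
        self._sources = {}  # logical name -> (connection, SQL text)
        self._local = {}    # logical name -> rows copied by extract()

    def register(self, name, conn, sql):
        self._sources[name] = (conn, sql)

    def fetch(self, name):
        """Dynamic access: run the query against the source at call time."""
        conn, sql = self._sources[name]
        return conn.execute(sql).fetchall()

    def extract(self, name):
        """Extract and load: copy the current rows into the local environment."""
        self._local[name] = self.fetch(name)
        return self._local[name]

    def local(self, name):
        """Read the previously extracted copy without touching the source."""
        return self._local[name]


# Invented sample row; the product ID is borrowed from the tracking example.
warehouse = make_source([("PROD_24259_CARD", 4.0)])
layer = VirtualLayer()
layer.register("product", warehouse, "SELECT product_id, price FROM product")

layer.extract("product")  # take a snapshot of the source
warehouse.execute(
    "UPDATE product SET price = 5.0 WHERE product_id = 'PROD_24259_CARD'"
)

snapshot_rows = layer.local("product")  # still holds the old price
live_rows = layer.fetch("product")      # reflects the update
print(snapshot_rows, live_rows)
```

The snapshot is cheap to read repeatedly but goes stale as soon as the source changes, while dynamic access stays current at the cost of a round trip to the source on every read. That is the trade-off behind the "current values on a regular basis" question raised earlier in the deck.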
  10. Data virtualization can simplify access across the entire data environment, “big” or not
      DV also enables shared metadata across environments, avoiding the costs of model integration and of burying the metadata in source code.
      [Diagram: BI, Reporting, Dashboards; Data virtualization layer (front end); Data Warehouse; DV layer (back end); Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications).]
      Bridge the data environment to uses beyond BI
      The use cases are now interactive applications, lower latency data, and complex analytics, and they extend beyond read-only queries.
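
One way to picture "shared metadata across environments" is a single catalog that maps logical field names to the physical names used in each environment, so that front-end and back-end consumers resolve names from the same place instead of hard-coding them. The sketch below is a minimal assumption-laden illustration: the `Field` type, the catalog entries, and the warehouse column names are all hypothetical. Only `PAGE_ID` and `VISIT_DATE` appear in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    logical_name: str   # name shared by every consumer
    source: str         # environment that holds the data
    physical_name: str  # name used inside that environment


# Hypothetical catalog entries. PAGE_ID and VISIT_DATE come from the deck's
# tracking record; the warehouse column names are invented for illustration.
CATALOG = [
    Field("product_id", "hadoop", "PAGE_ID"),
    Field("product_id", "warehouse", "product_key"),
    Field("visit_date", "hadoop", "VISIT_DATE"),
    Field("visit_date", "warehouse", "visit_dt"),
]


def physical_names(logical_name):
    """Return {source: physical name} for one logical field."""
    found = {
        f.source: f.physical_name
        for f in CATALOG
        if f.logical_name == logical_name
    }
    if not found:
        raise KeyError(logical_name)
    return found


product_mapping = physical_names("product_id")
print(product_mapping)
```

Because the mapping lives in one shared structure, renaming a warehouse column means changing one catalog entry rather than hunting through MapReduce jobs and report definitions.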
  11. About the Presenter
      Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.
      About Third Nature
      Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data then you're at the right place.
      Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
      We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
