A unified data modeler in the world of big data

Presentation I gave at ERworld 2012

Published in: Technology

  1. A Unified Data Modeler in the World of Big Data
     William Luk, CA Technologies Inc
     Sr Director, Software Engineering, Data Modeling
     Session Code: HT01
     2012, Collaboration By Design
  2. Speaker Bio
     • Senior Director of software development in the Data Management BU, head of ERwin engineering and level 2 support
     • Experience in databases, data security, and data management; BS & MS in CS
  3. Session Agenda
     • Where are we & how did we get here?
     • Overview of the Big Data world
     • Challenges to enterprises and data architects
     • Extending data modeling to include Big Data
     • Q&A
  4. Data Modeling: Past 30 Years
     • Entity-Relationship (ER) modeling has served us well since the mid-1970s
     • Data architects / modelers have used ER tools to ensure data consistency and integrity for very large enterprises
     • Ability to integrate new databases from mergers and acquisitions
     • A map of where all your data is
     • Ability to handle large & complex data models
     • Then, the Internet & social networks
  5. Internet & Social Networks
     • The early Internet used the classical LAMP stack: Linux, Apache Web Server, MySQL Database, and Perl/PHP/Python
     • Basic web servers & DBs served us well for basic web portals
     • Internet growth + social networks changed the scale of databases / data stores
     • Traditional relational databases have difficulty handling the scale & required (sub-second) response time for the web
     • Emergence of NoSQL data stores
  6. Arrival of Big Data
     • Wealth of valuable data to collect:
       - User-entered information
       - History / logs of user interactions
     • This data does not always fit nicely into structured data stores (relational or NoSQL)
     • Need to harvest / analyze the data to compete
     • Challenges of capturing, storing, searching, analyzing, and visualizing very large and complex data sets
     • Large, distributed, analytical platforms (Hadoop) emerged
  7. Enterprise Big Data / Hadoop Workflow
     [Diagram. Labels: Customer Data Source; HQL (Hive Query Language), JSON, XML, etc.; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; MapReduce / Analytics (Pig, Cloudera, Datameer, etc.); Hadoop Framework (Clusters)]
  8. Problem of Non-Relational Data Stores
     • NoSQL and unstructured data store performance has a price:
       - Denormalized data
       - Data consistency & integrity: only “eventual” consistency is guaranteed
     • Some data (such as user comments) can tolerate these drawbacks
     • Some data (such as financial, transactional) cannot
     • Enterprise conclusions:
       - NoSQL & Big Data are good for business intelligence data
       - Financial & transactional data still require relational databases
       - Compliance requirements / regulations
  9. The New World of Data Modeling with Relational and Big Data
     • The new enterprise data landscape:
       - Different relational databases
       - Distributed Hadoop clusters with structured, semi-structured, and unstructured data that is constantly changing
     • Challenges to data architects / modelers:
       - Identify potential relationships between different data stores
       - An automated way to track and update the unified view
     • Data modeling tools, such as ERwin, need to evolve to present a single unified view of ALL enterprise data
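The challenge on slide 9 of identifying potential relationships between different data stores can be illustrated with a small sketch. This is not how ERwin does it; it is a hypothetical heuristic that assumes column names have already been extracted from each store and that a shared name (ignoring case and separators) hints at a candidate relationship. The store names and columns below are invented for illustration.

```python
from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case a column name and drop separators so 'customer_id' matches 'customerId'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def candidate_relationships(stores: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Return (store_a, store_b, shared_column) for each normalized name found in two stores."""
    found = []
    for (a, cols_a), (b, cols_b) in combinations(sorted(stores.items()), 2):
        shared = {normalize(c) for c in cols_a} & {normalize(c) for c in cols_b}
        found.extend((a, b, col) for col in sorted(shared))
    return found


# Hypothetical stores: one relational table, one Hive table, one JSON collection.
stores = {
    "orders_rdbms": ["order_id", "customer_id", "total"],
    "clickstream_hive": ["session_id", "customerId", "url"],
    "comments_json": ["comment_id", "session_id", "text"],
}

links = candidate_relationships(stores)
for link in links:
    print(link)
```

Name matching only proposes candidates; a data architect would still need to confirm each one against the actual data.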
  10. ERwin Tapping into Hadoop
     [Diagram. Labels: Data Sources; JSON / XML Headers; HQL (Hive Query Language), JSON, XML; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; HQL; MapReduce / Analytics (Pig, Cloudera, Datameer, etc.); Hadoop Framework (Clusters)]
  11. CA Internal Proof of Concept
     • CA Hadoop test framework with 7 Dell 2950s
     • Dump / store logs & data from various CA products into HDFS
     • Transform logs & data into structured or semi-structured data stores
     • Reverse engineer to build a logical model of different CA products
     • Identify potential relationships between data stores
     [Diagram. Labels: Big Data of CA Enterprise Products (APM, Clarity, Nimsoft, WatchMouse, etc.); Unified View of All Models; CQL (Cassandra Query Language), HQL (Hive Query Language), mongoDB, JSON, XML; Reverse Engineer; JSON / XML Headers; CA Hadoop Test Framework (HDFS / Cassandra FS); Semi-structured Data; Cassandra / Hive / HBase / mongoDB (JSON); CQL / HQL / Mongo Query]
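The reverse-engineering step on slide 11 can be sketched in a few lines. The following is an illustrative assumption, not the proof-of-concept code: it treats each input line as one JSON record and infers a single logical entity whose attributes are the keys seen, with the Python type names observed for each. The entity name `apm_log` and the sample records are hypothetical.

```python
import json


def reverse_engineer(entity_name: str, json_lines: list[str]) -> dict:
    """Infer a simple logical entity (attribute -> observed type names) from JSON records."""
    attributes: dict[str, set[str]] = {}
    for line in json_lines:
        record = json.loads(line)
        for key, value in record.items():
            attributes.setdefault(key, set()).add(type(value).__name__)
    return {
        "entity": entity_name,
        "attributes": {k: sorted(v) for k, v in sorted(attributes.items())},
    }


# Hypothetical semi-structured log records; real product logs would differ.
sample = [
    '{"host": "web01", "status": 200, "latency_ms": 12.5}',
    '{"host": "web02", "status": 500, "error": "timeout"}',
]

entity = reverse_engineer("apm_log", sample)
print(json.dumps(entity, indent=2))
```

Attributes that appear in only some records (such as `error` here) still become part of the entity, which reflects the loose schema of semi-structured stores.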
  12. What We Have Learned So Far
     • Most non-relational data stores will be a simple entity / box in ERwin
       - Attributes in each non-relational entity include key indices and columns
       - Supercolumns or nested structures can be expanded in the same entity or depicted as a hierarchy
     • Metadata is important:
       - It describes the kind of information / data
       - It describes the structure of the columns in a supercolumn
     • There are relationships between non-relational data stores and relational databases
     • So far, we have only investigated reverse engineering of data stores into a logical model. Forward engineering of a logical model into physical non-relational data stores may be useful
     • We are not there yet, but a unified data modeler of relational and Big Data is definitely possible
  13. The Future of Data Modeling
     • We presented one (but not the only) direction in which data modeling can evolve to model both relational and non-relational data stores
     • The data explosion will continue and accelerate
     • Businesses must rely more and more on collected data to gather business intelligence to compete
     • The role of the data architect and modeler will become more important: to analyze Big Data, enterprises must first understand what they have!
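Slide 12 mentions two ways to model supercolumns or nested structures: expand them in the same entity, or depict them as a hierarchy. A minimal sketch of both options follows, assuming a nested record is available as a Python dict. The function names, the dotted naming scheme, and the sample record are assumptions made for illustration.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Option 1: expand nested dicts into dotted attribute names within the same entity."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def split_hierarchy(entity: str, record: dict) -> dict:
    """Option 2: depict each nested dict as a child entity named 'parent.child'."""
    entities = {entity: []}
    for key, value in record.items():
        if isinstance(value, dict):
            entities.update(split_hierarchy(f"{entity}.{key}", value))
        else:
            entities[entity].append(key)
    return entities


# Hypothetical nested record, similar in shape to a supercolumn.
row = {"user_id": 7, "profile": {"name": "Ann", "address": {"city": "Islandia", "zip": "11749"}}}

flat = flatten(row)
tree = split_hierarchy("user", row)
print(flat)
print(tree)
```

Flattening keeps one box per data store, while the hierarchy form produces one box per nesting level; which reads better depends on how deep the nesting goes.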
  14. Thank You – Questions?
     William Luk
     (650)298-3111
     William.luk@ca.com
     http://www.linkedin.com/pub/william-luk/1/818/bb1
  15. Legal notice
     Copyright © 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. No unauthorized use, copying or distribution permitted. THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENT “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits, lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the possibility of such damages. Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion. Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available (i) for sale to new licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such releases may be made available to current licensees of such product who are current subscribers to CA maintenance and support on a when and if-available basis.
