A Unified Data Modeler in the World of Big Data
William Luk, CA Technologies Inc
Sr Director, Software Engineering – Data Modeling
Session Code: HT01
2012 Collaboration By Design
Speaker Bio


— Senior Director of software development in the Data
  Management BU; head of ERwin engineering and level
  2 support
— Experience in databases, data security, and data
  management
— BS & MS in CS



 Session Agenda
 — Where are we & how did we get here?
 — Overview of the Big Data world
 — Challenges to enterprises and data architects
 — Extending data modeling to include Big Data
 — Q&A




Data Modeling Past 30 Years



— Entity-Relationship (ER) modeling has served us well
  since the mid-1970s
— Data architects / modelers have used ER tools to
  ensure data consistency and integrity for very
  large enterprises
— Ability to integrate new databases from mergers and
  acquisitions
— A map of where all your data lives
— Ability to handle large & complex data models
— Then, the Internet & social networks
Internet & Social Networks



— Early Internet used the classical LAMP stack – Linux,
  Apache Web Server, MySQL Database, and
  Perl/PHP/Python
— Basic web servers & databases served us well for basic
  web portals
— Internet growth + social networks changed the scale
  of databases / data stores
— Traditional relational databases have difficulty
  handling the scale & the sub-second response
  times the web requires
— Emergence of NoSQL data stores
Arrival of Big Data



— Wealth of valuable data to collect:
     − User-entered information
     − History / logs of user interactions

— This data does not always fit neatly into structured data stores
  (relational or NoSQL)
— Need to harvest / analyze the data to compete
— Challenges of
  capturing, storing, searching, analyzing, and
  visualizing very large and complex data sets
— Large, distributed, analytical platforms (Hadoop)
  emerged
Enterprise Big Data / Hadoop Workflow

[Diagram: a customer data source feeds a Hadoop framework (clusters) as HQL (Hive SQL), JSON, XML, etc., and as unstructured data / files. HDFS holds structured data (Hive, HBase), semi-structured data (JSON, XML), and unstructured data. MapReduce / analytics (Pig, Cloudera, Datameer, etc.) is shown alongside.]

Problem of Non-Relational Data Stores


— The performance of NoSQL and unstructured data stores
  comes at a price:
     − Denormalized data
     − Data consistency & integrity: only “eventual”
       consistency is guaranteed
— Some data (such as user comments) can tolerate
  these drawbacks
— Some data (such as financial, transactional) cannot
— Enterprise conclusions:
     − NoSQL & Big Data are good for business intelligence data
     − Financial & transactional data still require relational databases
     − Compliance requirements / regulations
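
To make the “eventual” consistency trade-off concrete, here is a minimal sketch of a two-replica store. It is a toy model written for illustration only, not how any particular NoSQL product works; the class and key names (`EventuallyConsistentStore`, `balance:42`) are hypothetical.

```python
# Toy model of eventual consistency: writes land on one replica first
# and reach the other only after a later sync step.
# All names here are hypothetical and chosen for illustration.

class EventuallyConsistentStore:
    def __init__(self):
        self.primary = {}      # replica that accepts writes
        self.secondary = {}    # replica that serves some reads
        self.pending = []      # writes not yet propagated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        return self.secondary.get(key)

    def sync(self):
        # Propagate all outstanding writes to the secondary replica.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance:42", 100)

stale = store.read_from_secondary("balance:42")  # read before sync
store.sync()
fresh = store.read_from_secondary("balance:42")  # read after sync

print(stale, fresh)
```

A stale read like this is harmless for a user comment, but not for an account balance, which is the distinction the slide draws.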
The New World of Data Modeling with Relational
and Big Data

— The new enterprise data landscape:
     − Different relational databases
     − Distributed Hadoop clusters with structured, semi-structured,
       and unstructured data that is constantly changing

— Challenges for data architects / modelers:
     − Identifying potential relationships between different data stores
     − Finding an automated way to track and update the unified view

— Data modeling tools, such as ERwin, need to evolve
  to present a single unified view of ALL enterprise
  data
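
One simple way to surface potential relationships between stores is to look for attribute names that appear in more than one of them. The sketch below assumes that heuristic; it is not ERwin's method, and the store and attribute names are made up for illustration.

```python
from itertools import combinations

# Hypothetical heuristic: two stores are candidates for a relationship
# when they share an attribute name after simple normalization.

def normalize(name):
    """Lower-case and strip separators so customer_id matches CustomerID."""
    return name.lower().replace("_", "").replace("-", "")

def candidate_relationships(entities):
    """entities maps a store name to its list of attribute names."""
    candidates = []
    for (a, attrs_a), (b, attrs_b) in combinations(sorted(entities.items()), 2):
        shared = {normalize(x) for x in attrs_a} & {normalize(y) for y in attrs_b}
        for attr in sorted(shared):
            candidates.append((a, b, attr))
    return candidates

# Illustrative metadata from one relational and two non-relational stores.
entities = {
    "oracle.customers": ["customer_id", "name", "region"],
    "hive.web_logs": ["CustomerID", "url", "timestamp"],
    "hbase.comments": ["comment_id", "customer_id", "text"],
}

links = candidate_relationships(entities)
for left, right, attr in links:
    print(f"{left} <-> {right} via {attr}")
```

Name matching only produces candidates. A data architect would still need to confirm each one, for example by comparing data types or sampled values.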


ERwin Tapping into Hadoop

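[Diagram: the Hadoop workflow again, with data sources feeding a Hadoop framework (clusters) as HQL (Hive SQL), JSON, XML, and unstructured data / files. HDFS holds structured data (Hive, HBase), semi-structured data (JSON, XML), and unstructured data, with MapReduce / analytics (Pig, Cloudera, Datameer, etc.) alongside. Two added labels, "HQL" and "JSON / XML Headers", mark where ERwin taps in.]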


CA Internal Proof of Concept
Big Data of CA Enterprise Products

•   CA Hadoop test framework with 7 Dell 2950s
•   Dump / store logs & data from various CA products into HDFS
•   Transform logs & data into structured or semi-structured data stores
•   Reverse engineer to build logical models of the different CA products
•   Identify potential relationships between data stores

[Diagram: CA products (APM, Clarity, Nimsoft, WatchMouse, etc.) feed the CA Hadoop Test Framework as CQL (Cassandra SQL), HQL (Hive SQL), mongoDB, JSON, and XML. The framework holds Cassandra / Hive / HBase / mongoDB stores and semi-structured data (HDFS / Cassandra FS; JSON, XML). Two reverse-engineering paths, "CQL / HQL / Mongo Query (JSON)" and "JSON / XML Headers", lead to a "Unified View of All Models".]

What We Have Learned So Far


— Most non-relational data stores will be a simple entity / box in
  ERwin
  − Attributes in each non-relational entity include key indices and columns
  − Supercolumns or nested structures can be expanded in the same entity
    or depicted as a hierarchy
— Metadata is important:
  − It describes the kind of information / data
  − It describes the structure of the columns in a supercolumn
— There are relationships between non-relational data stores and
  relational databases
— So far, we have only investigated reverse engineering of data stores
  into a logical model. Forward engineering of a logical model into
  physical non-relational data stores may also be useful
— We are not there yet, but a unified data modeler for relational and
  Big Data is definitely possible
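
The idea of expanding nested structures into a single entity can be sketched in a few lines. The example below is an assumption about how such reverse engineering might look, not the proof-of-concept code: it flattens one sample JSON record into dotted attribute paths with inferred types. The record contents and the `infer_attributes` helper are hypothetical.

```python
import json

def infer_attributes(record, prefix=""):
    """Flatten a nested record into (attribute path, type name) pairs.

    Nested objects are expanded into the same entity using dotted
    paths, one of the two options named on the slide.
    """
    attrs = []
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            attrs.extend(infer_attributes(value, prefix=path + "."))
        else:
            attrs.append((path, type(value).__name__))
    return attrs

# Hypothetical sample document from a semi-structured store.
sample = json.loads(
    '{"user_id": 7, "name": "Ann", '
    '"address": {"city": "Oslo", "zip": "0150"}}'
)

entity = {"name": "users", "attributes": infer_attributes(sample)}
for path, type_name in entity["attributes"]:
    print(f"{entity['name']}.{path}: {type_name}")
```

Depicting the nested structure as a hierarchy instead would mean emitting a separate child entity for `address` rather than dotted paths. Inferring types from a single record is also a simplification, since real stores can hold records with differing shapes.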

The Future of Data Modeling


— We presented one (but not the only) direction in which data
  modeling can evolve to cover both relational
  and non-relational data stores
— The data explosion will continue and accelerate
— Businesses must rely more and more on collected data
  to gather the business intelligence they need to compete
— The role of the data architect and modeler will become more
  important: to analyze Big Data, enterprises must
  first understand what they have!


Thank You – Questions?




 William Luk
 (650)298-3111
 William.luk@ca.com
 http://www.linkedin.com/pub/william-luk/1/818/bb1
Legal notice


Copyright © 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced
herein belong to their respective companies. No unauthorized use, copying or distribution permitted.

THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for
    the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW,
    CA PROVIDES THIS DOCUMENT “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING,
    WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
    PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or
    damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits,
    lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the
    possibility of such damages.

Certain information in this presentation may outline CA’s general product direction. This presentation shall not
    serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written
    license agreement or services agreement relating to any CA software product; or (ii) amend any product
    documentation or specifications for any CA software product. The development, release and timing of any
    features or functionality described in this presentation remain at CA’s sole discretion.

Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA
    product release referenced in this presentation, CA may make such release available (i) for sale to new
    licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such
    releases may be made available to current licensees of such product who are current subscribers to CA
    maintenance and support on a when and if-available basis.

More Related Content

What's hot

Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
Amr Awadallah
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
markgrover
 
HugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage SystemHugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage System
qlw5
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly Competition
Xplenty
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
Max Neunhöffer
 
SQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond RelationalSQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond Relational
Michael Rys
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
Ahmed Salman
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
BigDataCloud
 
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
"A Study of I/O and Virtualization Performance with a Search Engine based on ..."A Study of I/O and Virtualization Performance with a Search Engine based on ...
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
Lucidworks (Archived)
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Cloudera, Inc.
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
SAP Technology
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for robotics
João Gabriel Lima
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured Data
Michael Rys
 
Nosql
NosqlNosql
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32
jujukoko
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
DataminingTools Inc
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
Jonathan Bloom
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sql
Anuja Gunale
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
Nicolas Morales
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
Scott Gray
 

What's hot (20)

Schema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-WriteSchema-on-Read vs Schema-on-Write
Schema-on-Read vs Schema-on-Write
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
HugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage SystemHugeTable:Application-Oriented Structure Data Storage System
HugeTable:Application-Oriented Structure Data Storage System
 
Hive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly CompetitionHive vs Hbase, a Friendly Competition
Hive vs Hbase, a Friendly Competition
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 
SQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond RelationalSQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond Relational
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
"A Study of I/O and Virtualization Performance with a Search Engine based on ..."A Study of I/O and Virtualization Performance with a Search Engine based on ...
"A Study of I/O and Virtualization Performance with a Search Engine based on ...
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for robotics
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured Data
 
Nosql
NosqlNosql
Nosql
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
1. introduction to no sql
1. introduction to no sql1. introduction to no sql
1. introduction to no sql
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Big_SQL_3.0_Whitepaper
Big_SQL_3.0_WhitepaperBig_SQL_3.0_Whitepaper
Big_SQL_3.0_Whitepaper
 

Similar to A unified data modeler in the world of big data

Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
MapR Technologies
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
AshishRathore72
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Stephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
Stephen Alex
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
cdmaxime
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data World
Jongwook Woo
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
Philippe Julio
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
Mohammadhasan Farazmand
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
Roby Chen
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
Cognizant
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosql
Khanderao Kand
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Apache Drill
Apache DrillApache Drill
Apache Drill
Ted Dunning
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
datastack
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
DataWorks Summit
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
datastack
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
Mark Kromer
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
samthemonad
 

Similar to A unified data modeler in the world of big data (20)

Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data World
 
Big Data with Not Only SQL
Big Data with Not Only SQLBig Data with Not Only SQL
Big Data with Not Only SQL
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Big data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosqlBig data hadoop ecosystem and nosql
Big data hadoop ecosystem and nosql
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Agile data lake? An oxymoron?
Agile data lake? An oxymoron?Agile data lake? An oxymoron?
Agile data lake? An oxymoron?
 

Recently uploaded

“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 

Recently uploaded (20)

“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf

A unified data modeler in the world of big data

  • 1. A Unified Data Modeler in the World of Big Data. William Luk, CA Technologies Inc, Sr Director, Software Engineering – Data Modeling. Session Code: HT01. 2012, Collaboration By Design.
  • 2. Speaker Bio
      - Senior Director of software development in the Data Management BU, head of ERwin engineering and level 2 support
      - Experience in databases, data security, and data management
      - BS & MS in CS
  • 3. Session Agenda
      - Where are we & how did we get here?
      - Overview of the Big Data world
      - Challenges to enterprises and data architects
      - Extending data modeling to include Big Data
      - Q & A
  • 4. Data Modeling Past 30 Years
      - Entity-Relationship (ER) modeling has served us well since the mid-70s
      - Data architects / modelers have used ER tools to ensure data consistency and integrity for very large enterprises
      - Ability to integrate new databases from mergers and acquisitions
      - A map of where all your data is
      - Ability to handle large & complex data models
      - Then came the Internet & social networks
  • 5. Internet & Social Networks
      - The early Internet used the classical LAMP stack: Linux, Apache Web Server, MySQL Database, and Perl/PHP/Python
      - Basic web servers & DBs served us well for basic web portals
      - Internet growth + social networks changed the scale of databases / data stores
      - Traditional relational databases have difficulty handling the scale & the (sub-second) response time required for the web
      - Emergence of NoSQL data stores
  • 6. Arrival of Big Data
      - Wealth of valuable data to collect:
          - User-entered information
          - History / logs of user interactions
      - Does not always fit nicely into structured data stores (relational or NoSQL)
      - Need to harvest / analyze the data to compete
      - Challenges of capturing, storing, searching, analyzing, and visualizing very large and complex data sets
      - Large, distributed, analytical platforms (Hadoop) emerged
  • 7. Enterprise Big Data / Hadoop Workflow
      [Diagram; recoverable labels: Customer Data Source; HQL (Hive SQL), JSON, XML, …etc; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]
  • 8. Problem of Non-Relational Data Stores
      - NoSQL and unstructured data store performance has a price:
          - Denormalized data
          - Data consistency & integrity: only “eventual” consistency is guaranteed
      - Some data (such as user comments) can tolerate these drawbacks
      - Some data (such as financial, transactional) cannot
      - Enterprise conclusions:
          - NoSQL & Big Data are good for business intelligence data
          - Financial & transactional data still require relational databases
          - Compliance requirements / regulations
  • 9. The New World of Data Modeling with Relational and Big Data
      - The new enterprise data landscape:
          - Different relational databases
          - Distributed Hadoop clusters with structured, semi-structured, and unstructured data that is constantly changing
      - Challenges to data architects / modelers:
          - Identify potential relationships between different data stores
          - Automated way to track and update the unified view
      - Data modeling tools, such as ERwin, need to evolve to present a single unified view of ALL enterprise data
  • 10. ERwin Tapping into Hadoop Data Sources
      [Diagram; recoverable labels: JSON / XML Headers; HQL (Hive SQL), JSON, XML; HQL; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]
  • 11. CA Internal Proof of Concept
      - CA Hadoop test framework with 7 Dell 2950's
      - Dump / store logs & data from various CA products into HDFS
      - Transform logs & data into structured or semi-structured data stores
      - Reverse engineer to build logical model of different CA products
      - Identify potential relationships between data stores
      [Diagram; recoverable labels: Big Data of CA Enterprise Products (APM, Clarity, Nimsoft, WatchMouse, …etc); Unified View of All Models; CQL (Cassandra SQL), HQL (Hive SQL), mongoDB, JSON, XML; Reverse Engineer; JSON / XML Headers; CA Hadoop Test Framework (HDFS / Cassandra FS); Semi-structured Data (JSON, XML); Cassandra / Hive / Hbase / mongoDB (JSON); Reverse Engineer CQL / HQL / Mongo Query]
  • 12. What We Have Learned So Far
      - Most non-relational data stores will be a simple entity / box in ERwin
          - Attributes in each non-relational entity include key indices and columns
          - Supercolumns or nested structures can be expanded in the same entity or depicted as a hierarchy
      - Metadata is important:
          - Describes the kind of information / data
          - Structure of the columns in a supercolumn
      - There are relationships between non-relational data stores and relational databases
      - So far, we have only investigated reverse engineering of data stores into a logical model. Forward engineering of a logical model into physical non-relational data stores may be useful
      - We are not there yet, but a unified data modeler of relational and Big Data is definitely possible
  • 13. The Future of Data Modeling
      - Presented one (but not the only) direction in which data modeling can evolve to model both relational and non-relational data stores
      - Data explosion will continue and accelerate
      - Businesses must rely more and more on collected data to gather business intelligence to compete
      - The role of data architect and modeler will become more important: to analyze Big Data, enterprises must first understand what they have!
  • 14. Thank You – Questions? William Luk (650)298-3111 William.luk@ca.com http://www.linkedin.com/pub/william-luk/1/818/bb1
  • 15. Legal notice Copyright © 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. No unauthorized use, copying or distribution permitted. THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENT “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits, lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the possibility of such damages. Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion. Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available (i) for sale to new licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such releases may be made available to current licensees of such product who are current subscribers to CA maintenance and support on a when and if-available basis.
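
Illustrative sketch (not part of the original deck): slides 11 and 12 describe reverse engineering a non-relational data store into a single entity whose nested structures are expanded in place, then looking for potential relationships between data stores. The Python below is one minimal way to picture that idea, not ERwin's implementation. The `Entity` class, the function names, the sample JSON documents, and the "shared attribute name" heuristic are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One non-relational data store drawn as a single entity (box)."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> type name


def flatten(record, prefix=""):
    """Expand nested structures into dotted attribute names within one entity."""
    attrs = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            attrs.update(flatten(value, prefix=f"{path}."))
        else:
            attrs[path] = type(value).__name__
    return attrs


def reverse_engineer(name, json_text):
    """Build an Entity from one sample JSON document (assumed representative)."""
    return Entity(name, flatten(json.loads(json_text)))


def candidate_relationships(left, right):
    """Naive heuristic (an assumption): leaf attribute names found in both entities."""
    def leaves(entity):
        return {attr.rsplit(".", 1)[-1] for attr in entity.attributes}
    return sorted(leaves(left) & leaves(right))


# Hypothetical sample documents, invented for this sketch only.
log_doc = '{"user_id": 7, "event": "login", "client": {"ip": "10.0.0.1", "agent": "curl"}}'
account_doc = '{"user_id": 7, "plan": "basic"}'

logs = reverse_engineer("web_logs", log_doc)
accounts = reverse_engineer("accounts", account_doc)

print(logs.attributes)
print(candidate_relationships(logs, accounts))
```

A real tool would sample many documents and use store metadata rather than a single record, and matching names only suggests a relationship for a data architect to confirm.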