A Unified Data Modeler in the World of Big Data
William Luk, CA Technologies Inc
Sr Director, Software Engineering – Data Modeling
Session Code: HT01
2012 Collaboration By Design

Speaker Bio

— Senior Director of software development in the Data
  Management BU, head of ERwin engineering and level
  2 support
— Experience in databases, data security, and data
  management
— BS & MS in CS

Session Agenda

— Where are we & how did we get here?
— Overview of the Big Data world
— Challenges to enterprises and data architects
— Extending data modeling to include Big Data
— Q&A

Data Modeling Past 30 Years

— Entity-Relationship (ER) modeling has served us well
  since the mid-1970s
— Data architects / modelers have used ER tools to
  ensure data consistency and integrity for very
  large enterprises
— Ability to integrate new databases from mergers and
  acquisitions
— A map of where all your data resides
— Ability to handle large & complex data models
— Then came the Internet & social networks

Internet & Social Networks

— The early Internet used the classic LAMP stack: Linux,
  Apache Web Server, MySQL Database, and
  Perl/PHP/Python
— Basic web servers & databases served us well for basic
  web portals
— Internet growth + social networks changed the scale
  of databases / data stores
— Traditional relational databases have difficulty
  handling the scale & the (sub-second) response
  time required for the web
— Emergence of NoSQL data stores

Arrival of Big Data

— Wealth of valuable data to collect:
     − User-entered information
     − History / logs of user interactions
— The data does not always fit nicely into structured data stores
  (relational or NoSQL)
— Need to harvest / analyze the data to compete
— Challenges of capturing, storing, searching, analyzing, and
  visualizing very large and complex data sets
— Large, distributed, analytical platforms (Hadoop)
  emerged

Enterprise Big Data / Hadoop Workflow

[Figure: Hadoop workflow diagram. Labels: Customer Data Source; HQL (Hive SQL), JSON, XML, …etc; Unstructured Data / Files; HDFS; Structured Data (Hive, HBase); Semi-structured Data (JSON, XML); Unstructured Data; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]

Problem of Non-Relational Data Stores

— NoSQL and unstructured data store performance has
  a price:
     − Denormalized data
     − Data consistency & integrity: only “eventual”
       consistency is guaranteed
— Some data (such as user comments) can tolerate
  these drawbacks
— Some data (such as financial, transactional) cannot
— Enterprises' conclusions:
     − NoSQL & Big Data are good for business intelligence data
     − Financial & transactional data still require relational databases
     − Compliance requirements / regulations

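To make the "eventual" consistency trade-off concrete, here is a minimal sketch of a toy in-memory store with two replicas. It is an illustration only: the class and method names (`EventuallyConsistentStore`, `write`, `read`, `sync`) are hypothetical and do not model the replication protocol of any particular NoSQL product.

```python
class EventuallyConsistentStore:
    """Toy store: writes land on replica 0 and reach the others only on sync()."""

    def __init__(self, replica_count=2):
        # Each replica is just a dict holding its own copy of the data.
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to the other replicas

    def write(self, key, value):
        # The write is acknowledged after reaching a single replica.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may be served by any replica, including a stale one.
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Background propagation: after this, all replicas agree.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
stale = store.read("balance", replica_index=1)  # replica 1 has not seen the write yet
store.sync()
fresh = store.read("balance", replica_index=1)  # replicas have converged
print(stale, fresh)
```

A stale read like this may be acceptable for a user comment, but not for an account balance, which is the distinction the slide draws between the two kinds of data.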
The New World of Data Modeling with Relational
and Big Data

— The new enterprise data landscape:
     − Different relational databases
     − Distributed Hadoop clusters with structured, semi-structured,
       and unstructured data that is constantly changing
— Challenges to the data architects / modelers:
     − Identifying potential relationships between different data stores
     − Finding an automated way to track and update the unified view
— Data modeling tools, such as ERwin, need to evolve
  to present a single unified view of ALL enterprise
  data

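One simple way to look for potential relationships between data stores is to compare attribute names across entities. The sketch below assumes each store has already been described as a list of attribute names; the entity names, attributes, and the `candidate_relationships` helper are hypothetical, and name matching is only a first-pass heuristic, not how ERwin does it.

```python
from itertools import combinations


def candidate_relationships(entities):
    """Return (entity_a, entity_b, shared_attribute) for names present in both entities."""
    found = []
    # Compare every pair of entities once, in a stable (sorted) order.
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(entities.items()), 2):
        for attr in sorted(set(attrs_a) & set(attrs_b)):
            found.append((name_a, name_b, attr))
    return found


# Hypothetical entities drawn from a relational database and a Hadoop cluster.
ENTITIES = {
    "crm.customer": ["customer_id", "name", "email"],         # relational table
    "hive.web_clicks": ["customer_id", "url", "clicked_at"],  # Hive table
    "hbase.comments": ["comment_id", "email", "body"],        # HBase column family
}

links = candidate_relationships(ENTITIES)
for entity_a, entity_b, attr in links:
    print(f"{entity_a} <-> {entity_b} via {attr}")
```

Matches found this way are only candidates. A data architect would still need to confirm that the shared name means the same thing in both stores.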
ERwin Tapping into Hadoop

[Figure: the Hadoop workflow diagram with ERwin inputs added. Labels: Data Sources; JSON / XML Headers; HQL; HQL (Hive SQL), JSON, XML, Unstructured Data / Files; HDFS; Structured Data (Hive, HBase); Semi-structured Data (JSON, XML); Unstructured Data; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]

CA Internal Proof of Concept
Big Data of CA Enterprise Products

•   CA Hadoop test framework with 7 Dell 2950's
•   Dump / store logs & data from various CA products into HDFS
•   Transform logs & data into structured or semi-structured data stores
•   Reverse engineer to build a logical model of different CA products
•   Identify potential relationships between data stores

[Figure: proof-of-concept diagram. Labels: APM, Clarity, Nimsoft, WatchMouse, …etc; CQL (Cassandra Query Language), HQL (Hive SQL), mongoDB, JSON, XML; Unified View of All Models; Reverse Engineer JSON / XML Headers; CA Hadoop Test Framework; Semi-structured Data (HDFS / Cassandra FS): JSON, XML; Cassandra / Hive / Hbase / mongoDB; Reverse Engineer CQL / HQL / Mongo Query (JSON)]

What We Have Learned So Far

— Most non-relational data stores will be a simple entity / box in
  ERwin
  − Attributes in each non-relational entity include key indices and columns
  − Supercolumns or nested structures can be expanded in the same entity
    or depicted as a hierarchy
— Metadata is important:
  − Describes the kind of information / data
  − Describes the structure of the columns in a supercolumn
— There are relationships between non-relational data stores and
  relational databases
— So far, we have only investigated reverse engineering of data stores
  into a logical model. Forward engineering of a logical model into
  physical non-relational data stores may be useful
— We are not there yet, but a unified data modeler of relational and
  Big Data is definitely possible

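As a rough illustration of reverse engineering a semi-structured store into a single entity, the sketch below infers attributes from sample JSON records and expands nested structures into the same entity using dotted names. The sample records, the entity name, and the helper functions are assumptions made for this example; they are not taken from the proof of concept or from ERwin.

```python
import json


def infer_attributes(document, prefix=""):
    """Map attribute name -> type name for one JSON object, expanding nested objects in place."""
    attributes = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested structure: expand into the same entity with a dotted prefix.
            attributes.update(infer_attributes(value, prefix=name + "."))
        else:
            attributes[name] = type(value).__name__
    return attributes


def reverse_engineer(entity_name, json_lines):
    """Merge the attributes seen across all sample records into one logical entity."""
    entity = {"name": entity_name, "attributes": {}}
    for line in json_lines:
        for attr, type_name in infer_attributes(json.loads(line)).items():
            # Keep the first type seen; reconciling conflicting types is out of scope here.
            entity["attributes"].setdefault(attr, type_name)
    return entity


# Hypothetical log records, one JSON object per line.
SAMPLE_LOGS = [
    '{"user_id": 42, "action": "login", "device": {"os": "linux", "version": 3}}',
    '{"user_id": 7, "action": "search", "query": "erwin"}',
]

entity = reverse_engineer("user_activity", SAMPLE_LOGS)
for attr, type_name in entity["attributes"].items():
    print(f"{attr}: {type_name}")
```

Flattening with dotted names corresponds to expanding a nested structure in the same entity. Depicting it as a hierarchy would instead mean emitting a child entity for `device`.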
The Future of Data Modeling

— Presented one (but not the only) direction in which data
  modeling can evolve to model both relational
  and non-relational data stores
— The data explosion will continue and accelerate
— Businesses must rely more and more on collected data
  to gather business intelligence to compete
— The role of the data architect and modeler will become more
  important: to analyze Big Data, enterprises must
  first understand what they have!

Thank You – Questions?




 William Luk
 (650)298-3111
 William.luk@ca.com
 http://www.linkedin.com/pub/william-luk/1/818/bb1
Legal notice


Copyright © 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced
herein belong to their respective companies. No unauthorized use, copying or distribution permitted.

THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for
    the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW,
    CA PROVIDES THIS DOCUMENT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING,
    WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
    PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or
    damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits,
    lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the
    possibility of such damages.

Certain information in this presentation may outline CA’s general product direction. This presentation shall not
    serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written
    license agreement or services agreement relating to any CA software product; or (ii) amend any product
    documentation or specifications for any CA software product. The development, release and timing of any
    features or functionality described in this presentation remain at CA’s sole discretion.

Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA
    product release referenced in this presentation, CA may make such release available (i) for sale to new
    licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such
    releases may be made available to current licensees of such product who are current subscribers to CA
    maintenance and support on a when and if-available basis.

A unified data modeler in the world of big data

  • 1. A Unified Data Modeler in the World of Big Data William Luk, CA Technologies Inc 2012 Sr Director, Software Engineering – Data Modeling Collaboratio Session Code: HT01 n By Design
  • 2. Speaker Bio — Senior Director of software development in the Data Management BU, head of ERwin engineering and level 2 support — Experience in databases, data security, and data management — BS & MS in CS
  • 3. A Unified Data Modeler in the World of Big Data: Session Agenda — Where are we & how did we get here? — Overview of the Big Data world — Challenges to enterprises and data architects — Extending data modeling to include Big Data — Q&A
  • 4. Data Modeling Past 30 Years — Entity-Relationship (ER) modeling has served us well since the mid-1970s — Data architects / modelers have used ER tools to ensure data consistency and integrity for very large enterprises — Ability to integrate new databases from mergers and acquisitions — A map of where all your data is — Ability to handle large & complex data models — Then came the Internet & social networks
  • 5. Internet & Social Networks — The early Internet used the classical LAMP stack – Linux, Apache Web Server, MySQL Database, and Perl/PHP/Python — Basic web servers & DBs served us well for basic web portals — Internet growth + social networks changed the scale of databases / data stores — Traditional relational databases have difficulty handling the scale & the required (sub-second) response time for the web — Emergence of NoSQL data stores
  • 6. Arrival of Big Data — Wealth of valuable data to collect: − User-entered information − History / logs of user interactions — This data does not always fit nicely into structured data stores (relational or NoSQL) — Need to harvest / analyze the data to compete — Challenges of capturing, storing, searching, analyzing, and visualizing very large and complex data sets — Large, distributed, analytical platforms (Hadoop) emerged
  • 7. Enterprise Big Data / Hadoop Workflow [Diagram labels: Customer Data Source; HQL (Hive SQL), JSON, XML, …etc; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]
  • 8. Problem of Non-Relational Data Stores — The performance of NoSQL and unstructured data stores has a price: − Denormalized data − Data consistency & integrity – only “eventual” consistency is guaranteed — Some data (such as user comments) can tolerate these drawbacks — Some data (such as financial, transactional) cannot — Enterprise conclusions: − NoSQL & Big Data are good for business intelligence data − Financial & transactional data still require relational databases − Compliance requirements / regulations
  • 9. The New World of Data Modeling with Relational and Big Data — The new enterprise data landscape: − Different relational databases − Distributed Hadoop cluster with structured, semi-structured, and unstructured data, all of which is constantly changing — Challenges to data architects / modelers: − Identify potential relationships between different data stores − Find an automated way to track and update the unified view — Data modeling tools, such as ERwin, need to evolve to present a single unified view of ALL enterprise data
  • 10. ERwin Tapping into Hadoop Data Sources [Diagram labels: JSON / XML Headers; HQL (Hive SQL), JSON, XML; Unstructured Data / Files; HDFS; Structured Data; Semi-structured Data; Unstructured Data; Hive; HBase; JSON; XML; HQL; MapReduce / Analytics (Pig, Cloudera, Datameer, …etc); Hadoop Framework (Clusters)]
  • 11. CA Internal Proof of Concept • CA Hadoop test framework with 7 Dell 2950’s • Dump / store logs & data from various CA products into HDFS • Transform logs & data into structured or semi-structured data stores • Reverse engineer to build a logical model of different CA products • Identify potential relationships between data stores [Diagram labels: Big Data of CA Enterprise Products (APM, Clarity, Nimsoft, WatchMouse, …etc); Unified View of All Models; CQL (Cassandra SQL), HQL (Hive SQL), mongoDB, JSON, XML; Reverse Engineer; JSON / XML Headers; CA Hadoop Test Framework (HDFS / Cassandra FS); Semi-structured Data; JSON; XML; Cassandra / Hive / HBase / mongoDB (JSON); CQL / HQL / Mongo Query]
  • 12. What We Have Learned So Far — Most non-relational data stores will be a simple entity / box in ERwin − Attributes in each non-relational entity include key indices and columns − Supercolumns or nested structures can be expanded in the same entity or depicted as a hierarchy — Metadata is important: − It describes the kind of information / data − It describes the structure of the columns in a supercolumn — There are relationships between non-relational data stores and relational databases — So far, we have only investigated reverse engineering of data stores into a logical model. Forward engineering of a logical model into physical non-relational data stores may be useful — We are not there yet, but a unified data modeler of relational and Big Data is definitely possible
  • 13. The Future of Data Modeling — Presented one (but not the only) direction in which data modeling can evolve to model both relational and non-relational data stores — The data explosion will continue and accelerate — Businesses must rely more and more on collected data to gather business intelligence to compete — The role of the data architect and modeler will become more important – to analyze Big Data, enterprises must first understand what they have!
  • 14. Thank You – Questions? William Luk (650)298-3111 William.luk@ca.com http://www.linkedin.com/pub/william-luk/1/818/bb1
  • 15. Legal notice Copyright © 2012 CA. All rights reserved. All trademarks, trade names, service marks and logos referenced herein belong to their respective companies. No unauthorized use, copying or distribution permitted. THIS PRESENTATION IS FOR YOUR INFORMATIONAL PURPOSES ONLY. CA assumes no responsibility for the accuracy or completeness of the information. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENT “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. In no event will CA be liable for any loss or damage, direct or indirect, in connection with this presentation, including, without limitation, lost profits, lost investment, business interruption, goodwill, or lost data, even if CA is expressly advised of the possibility of such damages. Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future written license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion. Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available (i) for sale to new licensees of such product; and (ii) in the form of a regularly scheduled major product release. Such releases may be made available to current licensees of such product who are current subscribers to CA maintenance and support on a when and if-available basis.
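
Slides 11 and 12 describe reverse engineering semi-structured data into a logical entity, expanding nested structures within that entity, and identifying potential relationships between data stores. The following is a minimal sketch of that idea only, not ERwin's or CA's implementation. The sample records, the function names `flatten_attributes` and `candidate_relationships`, the dotted naming of nested attributes, and the shared-attribute-name heuristic are all assumptions made for illustration.

```python
import json


def flatten_attributes(document, prefix=""):
    """Expand a nested document into one flat entity.

    Returns a dict mapping attribute name -> Python type name.
    Nested structures are expanded in the same entity using
    dotted names (an illustrative convention, not a standard).
    """
    attributes = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the nested structure, keeping the parent name.
            attributes.update(flatten_attributes(value, prefix=f"{name}."))
        else:
            attributes[name] = type(value).__name__
    return attributes


def candidate_relationships(entities):
    """List entity pairs that share at least one attribute name.

    A shared name is only a hint that a relationship may exist;
    a modeler would still need to confirm it.
    """
    found = []
    names = sorted(entities)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            shared = sorted(set(entities[left]) & set(entities[right]))
            if shared:
                found.append((left, right, shared))
    return found


# Hypothetical sample records: a JSON log entry from a non-relational
# store and a row from a relational table.
apm_log = json.loads(
    '{"host_id": "h-17", "metric": {"name": "cpu", "value": 0.93}}'
)
host_row = {"host_id": "h-17", "owner": "ops"}

entities = {
    "apm_log": flatten_attributes(apm_log),
    "host": flatten_attributes(host_row),
}
relationships = candidate_relationships(entities)

for left, right, shared in relationships:
    print(f"{left} <-> {right} via {shared}")
```

Matching on attribute names is the simplest possible heuristic; comparing value distributions or key indices would be a natural next step, but the slides do not say which approach the proof of concept used.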