SlideShare a Scribd company logo
1 of 31
Download to read offline
Data Integration @ Scale                                                                7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                     Michaelbrodie.com




                                                                              Michael L. Brodie
                                                                                     STI Fellow
                                                                             Michaelbrodie.com
                       ! Copyright 2011   STI - INTERNATIONAL www.sti2.org




                              The following is for
                           Fortune 100 organizations




                                                                                                              1
Data Integration @ Scale                               7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                    Michaelbrodie.com




                   Based on 2010 industry data verified by
                     4      analyst organizations
                     10+    Fortune 100 enterprises




                             Did you know…

                           Information Systems?




                                                                             2
Data Integration @ Scale                                     7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                          Michaelbrodie.com




                  Information Systems in a Fortune 100 company
                    ~10 thousand
                    100s per business area (e.g., billing)
                    30% mission critical
                    30% modified significantly annually




                              Did you know…

                                 Databases?




                                                                                   3
Data Integration @ Scale                                           7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                Michaelbrodie.com




                     Logical databases in a Fortune 100 …
                      ~10,000
                      100s     per business area (e.g., billing)
                      ~90% relational
                      100s     new / year




                     Database size …
                      20% large     1-10 TB
                      40% medium 10 GB-1 TB
                      40% small     <10GB




                                                                                         4
Data Integration @ Scale                                 7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                      Michaelbrodie.com




                    Database growth…
                     2X every 18 mo OLTP / transactional
                     2X every 12 mo OLAP / decision support




                   Logical database complexity …
                    Tables/database: 100-200
                    Attributes/table: 50-200
                    Stored procedures/database: ~# tables
                    Web services/database: < # tables but !!
                    Views: ~3X # tables




                                                                               5
Data Integration @ Scale                           7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                Michaelbrodie.com




                             Did you know…

                           Database Design?




                    Database design level …
                     ~100% table (SQL)
                     ~0%     Entity Relationship




                                                                         6
Data Integration @ Scale                               7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                    Michaelbrodie.com




                              Did you know…
                                Relational
                           Database Integration?




                     Data integration accounts for …
                      ~40% software project costs




                                                                             7
Data Integration @ Scale                                          7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                               Michaelbrodie.com




                    Circa 1985
                     Tables: 2+
                     Dominant operation: join
                     Design: schema-based




                    Circa 2010
                     Tables: 10s-1000s
                     Dominant operations
                           !  Extract, Translate, Load (ETL)
                           !  File transfer
                           !  Then … join

                       Design
                           !  Evidence gathering (often manual)
                           !  60-75% schema-less




                                                                                        8
Data Integration @ Scale                             7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                  Michaelbrodie.com




                               Did you know…

                           Information Ecosystems?




                    Information systems that interact to
                    support domain-specific end-to-end
                       processes, e.g., supply chain,
                          billing, marketing, sales,
                    engineering, operations, finance, …




                                                                           9
Data Integration @ Scale                                   7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                        Michaelbrodie.com




                     Information Ecosystems involve …
                       100s-1,000s Information Systems
                       100s-1,000s Databases




                     Conclusion
                      Information systems & databases live in
                       complex, deeply inter-connected worlds




                                                                                10
Data Integration @ Scale                                       7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                            Michaelbrodie.com




                              Did you know…
                           Database Integration
                                @ Scale?




                  Database integration in an Information Ecosystem …
                   100s-1,000s      source databases
                   1,000s-millions function-specific views




                                                                                    11
Data Integration @ Scale                         7/8/11 Michael L. Brodie
2011 STI Semantic Summit                              Michaelbrodie.com




                        Every organization with a
                      database depends critically on
                                   …




                                Data
                                   +
                           Data Integration


                                                                      12
Data Integration @ Scale                      7/8/11 Michael L. Brodie
2011 STI Semantic Summit                           Michaelbrodie.com




                             Boring huh?




                           Then how about …




                                                                   13
Data Integration @ Scale                                   7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                        Michaelbrodie.com




                           Deeply sophisticated analysis
                                 [data integration]
                                     across …




                             28 intelligence databases
                                 connected across
                            17 US intelligence agencies
                                    over 5 years
                                  totally missed …




                                                                                14
Data Integration @ Scale                      7/8/11 Michael L. Brodie
2011 STI Semantic Summit                           Michaelbrodie.com




                           Data Integration

                              @ Scale




                                                                   15
Data Integration @ Scale                                                                      7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                           Michaelbrodie.com




                    What is Data Integration?
                              Combine 2+ data items to form a meaningful result




                                 Source1
                                  …                     Mapping                  Target

                                 Sourcen




                    1980’s Data Integration

                    Account                                     Charge
                   Name      Address   Acct#      …             Acct#   Charge   Date     …
                   Bill      43 Main   123…                     123…    $2.32    2/1/10
                   Sally     12 Peel   489…                     149     $1.17    1/7/10




                          TelcoSum (Charge) where Account.Acct# = Charge.Acct#


                          DI computation                                   Entity Mapping
                                        Monthly Bill
                                           Name       Address   Acct#   Amount
                                           Bill       43 Main   123…    $2.32
                                           Sally      12 Peel   489…    $0.00




                                                                                                                   16
Data Integration @ Scale                                                                                7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                     Michaelbrodie.com




                    What does DI mean?
                      "  Determined       by application “semantics”

                      "  Financial     Close for a Large Enterprise
                           •  What: Combine data from 100s-1,000s of systems
                           •  Meaning: Defined by financial accounting practices, jurisdictional
                              regulations, industry standards, …
                           •  Speed: 1-2 days after end of period
                           •  Solutions: e.g., SAP Financial, PeopleSoft Financial, Hyperion
                           •  Governance: laws and regulations
                           •  Correctness:
                                o  US= Generally Accepted Accounting Principles (US GAAP)
                                o  GB = Generally Accepted Accounting Practice (UK GAAP)
                                o  Etc.



                      "  Data   Integration =
                           •  Entity mapping
                           •  Data integration computation




                    Definitions (cont.)
                      "  Semantically       homogeneous relational data descriptions
                           •  Verified by an authoritative expert
                           •  To describe the same entity (considering set inclusion)
                           •  For which there is one and only one simple, complete entity mapping
                              between the relational schemas for the data descriptions
                           •  The entity mapping can be defined in relational expressions


                      "  Semantically       heterogeneous relational data descriptions
                           •  Verified by an authoritative expert
                           •  To describe the same entity (considering set inclusion)
                           •  For which there is not a complete entity mapping between the relational
                              schemas for the data descriptions
                           •  Entity mappings cannot be defined in relational expressions




                                                                                                                             17
Data Integration @ Scale                                                                       7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                            Michaelbrodie.com




                    Deeper Thoughts
                      "  Schemas:      sufficient but not necessary

                      "  Simple    map: 1:N, … atomicity

                      "  Degree     of semantic heterogeneity



                    Semantic                                                         Semantic
                   homogeneity                                                     heterogeneity


                   Complete                                                                 No
                   mapping                                                                mapping




                    Consequences
                      "  Semantically      homogeneous relational data descriptions
                           •  Data independence + updatable => consistent views
                           •  Provide basis for
                                o  Correct
                                o  Efficient
                           •  Schema level design and integration


                      "  Semantically      heterogeneous relational data descriptions
                           •  No guarantee of: Data independence + updatable views
                           •  Provide no basis for
                                o  Correctness
                                o  Efficiency
                           •  Schema level design and integration not always possible


                      "  Data   modelling and integration best practices
                           •  Semantic homogeneity
                           •  Radical simplification
                           •  Standardization: ontologies, schemas, application solutions, …




                                                                                                                    18
Data Integration @ Scale                                                                                                    7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                         Michaelbrodie.com




                   Semantic Heterogeneity …
                      "  Is   unavoidable @ scale
                           •  Autonomous design
                           •  Ontologically heterogeneous, e.g., Enterprise Customer
                                o  Legal entity-based ontology
                                o  Location-based ontology



                      "  Dominates      semantic homogeneity




                    Information Ecosystem Integration (3D)

                                           Telco Service Provider
                                                                VZW



                                                                Worldcom



                                                                MCI

                                                                BA


                                                                GTE
                                                                                                                  Business Domains


                                                                       Ordering           Marketing           Finance
                                                                  Billing        Repair               Sales
                   Enterprise Customer
                                                       Wells Fargo
                                                                            Wells Fargo
                                                     Wachovia

                                                Bank of America
                                            Merril Lynch              Bank of America




                                                                                                                                                 19
Data Integration @ Scale                                                                                  7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                       Michaelbrodie.com




                    Correctness & Efficiency
                    "  Legally   Binding Requirements
                       •  Law governing communications, financial transactions, privacy, …
                            o  E.g.,: Customer Proprietary Network Information (CPNI)
                       •  Regulation: rules defining operational details
                            o  E.g.,: FCC CPNI rules –customer views require manual verification
                       •  Customer agreements: SLAs for all services
                       •  Telco agreements: SLAs for all production systems


                    "  Mission   Critical: Operational & Information Correctness

                    "  Complex    Views
                       •  Customer Defined Views
                       •  Inconsistent & Conflicting Views
                             o  Business perspectives, manually defined, no ultimate truth or agreement
                             o  Conflicting telephone #: Long-distance vs. local




                    Data Modelling: Enterprise Customer
                    "  Enterprise       Customer data description
                       •  Enterprise customer identifier (ECID)
                       •  Enterprise customer data                           Level of abstraction
                             o  Telco information, e.g., billing, must be mapped (integrated into)
                               Enterprise Customer organizational units


                    "  Enterprise       Customer Ontologies = Conceptualizations
                       •    Legal entity
                       •    Locations
                       •    Telco contracts
                                                                                     Perspective
                       •    ~40 others




                                                                                                                               20
Data Integration @ Scale                                                              7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                   Michaelbrodie.com




                    Enterprise A

                  Legal entity structure (partial)
                  •  ~400 legal entities
                  •  14 levels deep
                  •  20,000 accounts (~50/entity)




                  Industry Standard Legal-entity Ontology
                                                                 Industry Standard Ontology




                                                                 Radically Simplified Schema




                  Legal-entity based ECID instance (DUNS code)




                                                                                                           21
Data Integration @ Scale                                                        7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                             Michaelbrodie.com




                    Location-based Ontology
                                                       Industry Standard Ontology




                                                       Location-based ECID instance

                     Radically Simplified Schema




                    Contract-based Ontology
                                                   Industry Standard Ontology




                                                      Contract-based ECID Instance


                    Radically Simplified Schema




                                                                                                     22
Data Integration @ Scale                                                                           7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                Michaelbrodie.com



                    Entity maps for semantically heterogeneous
                    Enterprise Customer Entities are schema-less

                     Location-based ECID schema                   Legal entity-based ECID schema




                                        Contract-based ECID schema




                            Enterprise Customer: 3 Departments in 2 locations


                                  les
                             Sa

                                 l- C
                              ga
                           Le

                                 l- L
                              ga
                           Le
                                                Ma
                                                     rke
                                                           tin
                                               Ma             g
                                                    rke
                                                          tin
                                                             g
                                                Le
                                                   ga
                                                      l-  L




                                                                                                                        23
Data Integration @ Scale                                                         7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                              Michaelbrodie.com



                           Location-Based Enterprise Customer IDs: 2 locations




                                                     Two Entities



                            Legal entity-based Enterprise Customer IDs:
                            3 organizations with 1, 2 and 3 departments




                                             les
                                        Sa

                                             l-    CMa
                                          ga             rke
                                       Le                      tin
                                                                  g
                                                    Le
                                                       ga
                                                          l-   L




                                                          Four Entities




                                                                                                      24
Data Integration @ Scale                                                                                                                                           7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                                                                Michaelbrodie.com




                               Multiple Versions of Truth
                       1st Amendment                                       Civil
                                                                         Liberties                                                                                   Bigamy
                      Religious Freedom                                                                              LDS Law repealed
                                                                                                                          1990

                                                 Fundamental                     Against
                                                LDS Law 1838                     Women’s
                                                                                                                                        Statutory
                                                                                  rights
                                                                                                                                          rape
                                                                                                        Women’s
                                                                                                         rights
                                                                                       46%                                                             54%
                                                         Govt out of
                                                          bedroom




                     US           Fund.                                                                                                 State                               US
                                                        Gay                 Anti-        US                               LDS                          US
                   Federal         LDS                            ACLU
                                                                          Feminists
                                                                                                         Feminists                       of                               Federal
                                                       Rights                         Citizens                           Church                     Citizens
                    Govt          Church                                                                                                Utah                               Govt




                                                                                      Married
                                             Age: 48                   Male                             Female                    Age: 13

                                                                                             Nov 2002




                              www.sti2.org                                                                                                           5/12/2007 - Vienna




                     Data Integration Observations
                             "  Each              view is unique
                                    •  Entity map
                                    •  Data integration computation


                             "  Views                  often conflict or mutually inconsistent

                             "  Data             integration for semantically heterogeneous relations
                                    •  Schema-less
                                    •  By evidence gathering
                                    •  At instance level


                             "  Views                  are
                                    •  Dominant means of accessing data
                                    •  Lenses into Information Ecosystems




                                                                                                                                                                                        25
Data Integration @ Scale                                                                                7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                     Michaelbrodie.com




                    Codd’s Rules Extended
                      "    0: System must be relational, a     "    7: High-level insert, update, and
                           database, a management system            delete

                      "    1: Information rule (tabular)       "    8: Physical data independence

                      "    2: Guaranteed access rule (keys)    "    9: Logical data independence

                      "    3: Systematic treatment of nulls    "    10: Integrity independence

                      "    4: Active online catalog based on   "  11: Distribution independence
                           the relational model                   extra
                                                               "  12: Nonsubversion rule
                      "    5: Comprehensive data
                           sublanguage rule                    "    13: Relational power rule: The full
                                                                    power of relational technology
                      "    6: View updating rule                    applies to semantically
                                                                    homogeneous relational
                                                                    databases.




                   For Semantically Heterogeneous Databases
                      "  Cannot      be guaranteed
                            •  Relational model
                            •  Codd’s rules
                            •  Relational properties
                                 o  Updateable views
                            •  Relational assumptions
                                 o  Global schema
                                 o  Single version of truth



                      "  No    basis for correctness or efficiency

                      "  Unfortunately,         like the real world




                                                                                                                             26
Data Integration @ Scale                                                                                                                                                                         7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                                                                                              Michaelbrodie.com




                    Shadow Approach
                          "  Database                        Approach: power and limits
                                   •     Data structures
                                   •     Schemas
                                   •     Data models
                                   •     Logic


                          "  Shadow                      Approach
                                   •  Meta-tagging
                                       o  Equivalence (E-tags) for entity mapping after resolution
                                       o  What (W-tags) for “meaning”
                                       o  Projection (P-tags) for ETL over logical data structures




                                                                 (Traditional) Data Integration
                              Billing                         Ordering                               Repair                       Sales                   Marketing                       Service
                               view                            view                                   view                        view                     view                            view

                             Evidence 1                                                              Evidence 3                   Evidence 4                  Evidence 5               Evidence N
                                                             Evidence 2


                                                                                                      Business                    In SQL,
                                                                                                                                  Java, C/C+
                                                                          Ontology                     Rules                      +…

                                              Total Semantic Heterogeneity
                                                        Schema
                                                                           Single Version
                                                                           of Truth ??



                                                                                                 Enterprise portal


                     Ontology 1               Ontology 2                 Ontology 3              Ontology 4


                    Schema                                                                                                                                                                  Metadata
                                              Schema
                                                                    Schema
                         DB                       DB                     DB                                                                                                                          data
                                                                           Single Version of Truth      Single Version of Truth     Single Version of Truth                Single Version of Truth
                    Single Version of Truth    Single Version of Truth




                                                                                                                                                                                                                      27
Data Integration @ Scale                                                                                                                       7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                                            Michaelbrodie.com



                                        Shadow Theory-based Data Integration
                              Billing              Ordering                         Repair            Sales             Marketing          Service
                               view                 view                             view             view               view               view

                          Evidence 1                                                Evidence 3        Evidence 4             Evidence 5   Evidence N
                                                   Evidence 2


                                                                                     Business              Need a new language
                                                                                                           to implement rules by
                                                          Ontologies                  Rules                tags
                                                     Explicitly represented w/
                                                     Multiple Version of Truth.
                                                                                        Schema

                                                                                  Enterprise portal


                      Ontology 1        Ontology 2            Ontology 3          Ontology 4


                     Schema                                                                                                                 Metadata
                                       Schema
                                                          Schema
                        DB                 DB                 DB                                                                                 data
                                                                                                 Shadows
                                                               Shadows             Shadows                         Shadows                   Shadows
                    Shadows              Shadows




                              Additional slides




                                                                                                                                                                    28
Data Integration @ Scale                                                                                                7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                     Michaelbrodie.com



                    Data Integration Pragmatics
                      "    Codd’s Rules violated                       "    Views conflict
                            •  Un-natural fit                                •  For good reason!
                            •  Primary keys                                         o  Sales vs. marketing vs. repair

                                  o  Problems: technical, managerial         •  Maintain conflicts or lose value
                                  o  Autonomous design
                                  o  Semantic heterogeneity
                                                                       "    Best practices rare
                            •  Few views updatable
                                                                             •  Semantic homogeneity
                                                                             •  Radical simplification
                      "    Schemas largely unavailable                       •  Standardization
                            •  10% not RDBMSs
                            •  30% legal
                                                                       "    Data Integration
                            •  60% autonomy                                  •    Table-level (Not ER-level)
                                                                             •    Instance-based
                      "    Federation often prohibited                       •    Seldom schema-based
                            •  Legal: privacy, CPNI, …                       •    Manual
                                                                             •    Evidence gathering

                      "    Single Source of Truth
                            •  Makes sense in specific cases
                            •  Usually not!




                      Current Data Integration Technologies
                      "    Relational DBMS                             "  Limitations
                                                                             •  Schema-based
                      "    Extract, Transform, & Load (ETL)
                                                                             •  Federation issues
                      "    Federation aka Enterprise                         •  Limited (no!) support for
                           information integration (EII)                             o  Schema-less integration
                                                                                     o  Instance-level integration
                      "    DI Platforms: 50+, e.g.,                                  o  Evidence gathering
                            "    IBM InfoSphere Information Server
                                                                                     o  Conflicting views
                            "    Informatica Platform
                            "    Microsoft SQL Server Integration                    o  Semantic heterogeneity
                                 Services                                    •  50+ ad hoc tools
                            "    Oracle Data Integrator, …
                                                                             •  Expensive to migrate
                            "    SAP Data Integrator/Federator
                                                                                Information Ecosystem onto
                                                                                data integration platform
                      "    Instance-level integration
                            •  XML-based integration
                            •  Custom tools: Verizon
                            •  Dun & Bradstreet ECID maps




                                                                                                                                             29
Data Integration @ Scale                                                                                                                                        7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                                                             Michaelbrodie.com



                             Emerging Data Integration Technologies
                        "    DBMS Architecture                                                    "    Data Discovery
                               •  In-Database Analytics                                           "    Data Quality Services




                                                                            s   )




                                                                                                                                                             s
                               •  Database Appliances




                                                                        tic
                                                                                                       Instance-level integration




                                                                                                                                                       tic
                                                                                                  " 




                                                                      an
                             Distributed DBMS Architecture




                                                                                                                                                     an
                        "                                                                                •  XML-based integration
                                    Change Data Capture (CDC)




                                                                    m
                               • 




                                                                                                                                                 em
                                    Data Access
                                                                                                         •  Custom tools




                                                               se
                               • 
                                                                                                         •  Dun & Bradstreet ECID maps




                                                                                                                                            s
                               •    Data Replication



                                                                o




                                                                                                                                         l;
                               •    Distributed Data Caching
                                                                                                       Schema-less Tools
                                                              (n
                                                                                                  " 




                                                                                                                                      ua
                        "    Service Orientation                                                         •  Evidence gathering: Web, …
                                                       ce




                                                                                                                                    an
                               •  Data Services Platforms                                                •  Entity / Identity Resolution
                                                     an




                                                                                                                                 m
                               •  Information-As-A-Service                                               •  Social Networking
                                               rm




                                                                                                                               –
                               •  Enterprise Service Buses (ESBs)                                               o  Collaboration




                                                                                                                            g
                                                                                                                o  Web 2.0
                                            rfo




                        "    Master Data Management (MDM)




                                                                                                                          in
                                                                                                                o  Social Search




                                                                                                                       lv
                                        Pe




                                                                                                         •  Data & Process Mining




                                                                                                                     So
                                                                                                         •  Linguistics / Ontologies / Taxonomies
                                     –




                                                                                                                o    Derivation & Analysis




                                                                                                           m
                                 n




                                                                                                         •  Semantic Technologies
                              tio




                                                                                                        le
                                                                                                         •  Content Analytics


                                                                                                      ob
                         ta




                                                                                                  "    Shadow Theory
                       pu




                                                                                                   Pr
                                                                                                         •  Plato’s Cave
                    m




                                                                                                         •  Monadic 2nd Order Logic
                  Co




                                     Heterogeneous meanings of a telephone number
                                                                          North America Numbering Plan
                                                                           (NANP) telephone number
                                                                                 (NPA) NXX-XXXX
                                                                             NPA - Number Plan Area
                                                                             NXX - Central office code
                                                                              XXXX – telephone line #

                     Telephone attribute in customer                                                                              Domestic long distance services, charges
                              database.                                                                                                     determined by NPA.




                  Telephone entity = a physical phone.
                                                                                   ?
                                                                                                                                    International long distance services,
                                                                                                                                   charges determined by country code.




                  Owner identified uniquely with additional
                  attribute NPA+NXX+XXX+CustomerID             Ported to cell or internet phone         Routed to / from another phone number (e.g. 1 800 toll free)




                                                                                                                                                                                     30
Data Integration @ Scale                                                                                            7/8/11 Michael L. Brodie
2011 STI Semantic Summit                                                                                                 Michaelbrodie.com




                  Limitations of Current Solutions
                    "    Data Model Principles Don’t Apply       "    Tools Don’t Meet Requirements
                          •  Unique IDs (keys) are limiting            •  No single model of data
                          •  Modelling Choices: entities,              •  Inconsistent / conflicting views
                             properties, or relationships
                                                                 "    Meta-data and Multi-database Scale
                    "    Schemas Are Seldom Available                  •  Information ecosystem (vs. database)
                          •  10% not RDBMSs                               management
                                                                       •  Millions of views
                          •  30% legal
                                                                       •  10,000s of cross database dependencies
                          •  60% owners maintain autonomy
                               o  Biz: impact operations
                                                                 "    Migrating Information Ecosystems
                               o  Tech: impact schedule & cost
                                                                      to Vendor Solutions Is Prohibitive

                    "    Schema-based Integration Challenges
                                                                 "    Pragmatics
                          •  Inconsistent data models                  •  Bootstrapping in a legacy environment
                          •  View Conflicts
                          •  Cascaded Views                      "    Conceptual Modelling Support
                                                                       •  Different models required for different
                    "    Federation Seldom Allowed (CPNI)                 workers
                                                                       •  No round-trip engineering

                    "    No Single Source of Truth




                                                                                                                                         31

More Related Content

Viewers also liked (6)

STI Summit 2011 - Linked Data & Ontologies
STI Summit 2011 - Linked Data & OntologiesSTI Summit 2011 - Linked Data & Ontologies
STI Summit 2011 - Linked Data & Ontologies
 
STI2 Board Meeting 2011 - Financials
STI2 Board Meeting 2011 - FinancialsSTI2 Board Meeting 2011 - Financials
STI2 Board Meeting 2011 - Financials
 
STI2 Organisation 2012
STI2 Organisation 2012STI2 Organisation 2012
STI2 Organisation 2012
 
STI Summit 2011 - Diversity
STI Summit 2011 - DiversitySTI Summit 2011 - Diversity
STI Summit 2011 - Diversity
 
STI Summit 2011 - Shortipedia
STI Summit 2011 - ShortipediaSTI Summit 2011 - Shortipedia
STI Summit 2011 - Shortipedia
 
STI International Marketing Presentation
STI International Marketing PresentationSTI International Marketing Presentation
STI International Marketing Presentation
 

Similar to STI Summit 2011 - di@scale

Opening Keynote: Putting IBM Watson to Work
Opening Keynote: Putting IBM Watson to WorkOpening Keynote: Putting IBM Watson to Work
Opening Keynote: Putting IBM Watson to WorkInnoTech
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntelAPAC
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...DATAVERSITY
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key playersCM Research
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentStrategy 2 Market, Inc,
 
Customer summit - big data (final)
Customer summit  - big data (final)Customer summit  - big data (final)
Customer summit - big data (final)Anand Deshpande
 
Intel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntelAPAC
 
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...Rana ZEINE, MD, PhD, MBA
 
Ontotext Overview Winter 2012
Ontotext Overview Winter 2012Ontotext Overview Winter 2012
Ontotext Overview Winter 2012Matthew Petrillo
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesVishy Poosala
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloudtdwiindia
 
Dave Kellogg at MarkLogic 2010 Government Summit
Dave Kellogg at MarkLogic 2010 Government SummitDave Kellogg at MarkLogic 2010 Government Summit
Dave Kellogg at MarkLogic 2010 Government SummitDave Kellogg
 
DataCyte - The Future of Data Storage & Retrieval
DataCyte - The Future of Data Storage & RetrievalDataCyte - The Future of Data Storage & Retrieval
DataCyte - The Future of Data Storage & RetrievalDaniel Opland
 

Similar to STI Summit 2011 - di@scale (20)

Opening Keynote: Putting IBM Watson to Work
Opening Keynote: Putting IBM Watson to WorkOpening Keynote: Putting IBM Watson to Work
Opening Keynote: Putting IBM Watson to Work
 
Intel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick KnupfferIntel Cloud summit: Big Data by Nick Knupffer
Intel Cloud summit: Big Data by Nick Knupffer
 
STI Summit 2011 - Conclusion
STI Summit 2011 - ConclusionSTI Summit 2011 - Conclusion
STI Summit 2011 - Conclusion
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data Using InfoSphere BigInsights...
 
Big Data: Industry trends and key players
Big Data: Industry trends and key playersBig Data: Industry trends and key players
Big Data: Industry trends and key players
 
Big Data: A Big Trap for Product Development
Big Data: A Big Trap for Product DevelopmentBig Data: A Big Trap for Product Development
Big Data: A Big Trap for Product Development
 
Customer summit - big data (final)
Customer summit  - big data (final)Customer summit  - big data (final)
Customer summit - big data (final)
 
Intel Cloud Summit: Big Data
Intel Cloud Summit: Big DataIntel Cloud Summit: Big Data
Intel Cloud Summit: Big Data
 
MFW12: Dirk deRoos (IBM)
MFW12: Dirk deRoos (IBM)MFW12: Dirk deRoos (IBM)
MFW12: Dirk deRoos (IBM)
 
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...
Zeine 2011 LinkedIn Use of Information Technology for Global Professional Net...
 
Ontotext Overview Winter 2012
Ontotext Overview Winter 2012Ontotext Overview Winter 2012
Ontotext Overview Winter 2012
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
NoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, OpportunitiesNoSQL & Big Data Analytics: History, Hype, Opportunities
NoSQL & Big Data Analytics: History, Hype, Opportunities
 
Convergence SOA & BI Presentation June 2010
Convergence SOA & BI Presentation June 2010Convergence SOA & BI Presentation June 2010
Convergence SOA & BI Presentation June 2010
 
STI Summit 2011 - Global data integration and global data mining
STI Summit 2011 - Global data integration and global data miningSTI Summit 2011 - Global data integration and global data mining
STI Summit 2011 - Global data integration and global data mining
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloud
 
Dave Kellogg at MarkLogic 2010 Government Summit
Dave Kellogg at MarkLogic 2010 Government SummitDave Kellogg at MarkLogic 2010 Government Summit
Dave Kellogg at MarkLogic 2010 Government Summit
 
DataCyte - The Future of Data Storage & Retrieval
DataCyte - The Future of Data Storage & RetrievalDataCyte - The Future of Data Storage & Retrieval
DataCyte - The Future of Data Storage & Retrieval
 

More from Semantic Technology Institute International

More from Semantic Technology Institute International (20)

Summit2013 sw in russian universities
Summit2013   sw in russian universitiesSummit2013   sw in russian universities
Summit2013 sw in russian universities
 
Summit2013 semantic web in russia
Summit2013   semantic web in russiaSummit2013   semantic web in russia
Summit2013 semantic web in russia
 
Summit2013 john domingue - introduction
Summit2013   john domingue - introductionSummit2013   john domingue - introduction
Summit2013 john domingue - introduction
 
Summit2013 john domingue - horizon2020
Summit2013   john domingue - horizon2020Summit2013   john domingue - horizon2020
Summit2013 john domingue - horizon2020
 
Summit2013 ho-jin choi - summit2013
Summit2013   ho-jin choi - summit2013Summit2013   ho-jin choi - summit2013
Summit2013 ho-jin choi - summit2013
 
Summit2013 georg gottlob and tim furche - diadem
Summit2013   georg gottlob and tim furche - diademSummit2013   georg gottlob and tim furche - diadem
Summit2013 georg gottlob and tim furche - diadem
 
Summit2013 eventos onto quad
Summit2013   eventos onto quadSummit2013   eventos onto quad
Summit2013 eventos onto quad
 
Summit2013 choi - wise kb-introd
Summit2013   choi - wise kb-introdSummit2013   choi - wise kb-introd
Summit2013 choi - wise kb-introd
 
Summit2013 choi - kaist-cs-intro
Summit2013   choi - kaist-cs-introSummit2013   choi - kaist-cs-intro
Summit2013 choi - kaist-cs-intro
 
STI Summit 2011 - Dynamic web
STI Summit 2011 - Dynamic webSTI Summit 2011 - Dynamic web
STI Summit 2011 - Dynamic web
 
STI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-smSTI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-sm
 
STI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streamsSTI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streams
 
STI Summit 2011 - Linked services
STI Summit 2011 - Linked servicesSTI Summit 2011 - Linked services
STI Summit 2011 - Linked services
 
STI Summit 2011 - A personal look at the future of Semantic Technologies
STI Summit 2011 - A personal look at the future of Semantic TechnologiesSTI Summit 2011 - A personal look at the future of Semantic Technologies
STI Summit 2011 - A personal look at the future of Semantic Technologies
 
STI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked dataSTI Summit 2011 - Visual analytics and linked data
STI Summit 2011 - Visual analytics and linked data
 
STI Summit 2011 - LS4 LS Khaos
STI Summit 2011 - LS4 LS KhaosSTI Summit 2011 - LS4 LS Khaos
STI Summit 2011 - LS4 LS Khaos
 
STI Summit 2011 - Making linked data work
STI Summit 2011 - Making linked data workSTI Summit 2011 - Making linked data work
STI Summit 2011 - Making linked data work
 
STI Summit 2011 - Beyond privacy
STI Summit 2011 - Beyond privacySTI Summit 2011 - Beyond privacy
STI Summit 2011 - Beyond privacy
 
STI Summit 2011 - Social semantics
STI Summit 2011 - Social semanticsSTI Summit 2011 - Social semantics
STI Summit 2011 - Social semantics
 
STI Summit 2011 - Monetizing the Semantic Web
STI Summit 2011 - Monetizing the Semantic WebSTI Summit 2011 - Monetizing the Semantic Web
STI Summit 2011 - Monetizing the Semantic Web
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

STI Summit 2011 - di@scale

  • 1. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Michael L. Brodie STI Fellow Michaelbrodie.com ! Copyright 2011 STI - INTERNATIONAL www.sti2.org The following is for Fortune 100 organizations 1
  • 2. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Based on 2010 industry data verified by 4 analyst organizations 10+ Fortune 100 enterprises Did you know… Information Systems? 2
  • 3. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Information Systems in a Fortune 100 company ~10 thousand 100s per business area (e.g., billing) 30% mission critical 30% modified significantly annually Did you know… Databases? 3
  • 4. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Logical databases in a Fortune 100 … ~10,000 100s per business area (e.g., billing) ~90% relational 100s new / year Database size … 20% large 1-10 TB 40% medium 10 GB-1 TB 40% small <10GB 4
  • 5. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Database growth… 2X every 18 mo OLTP / transactional 2X every 12 mo OLAP / decision support Logical database complexity … Tables/database: 100-200 Attributes/table: 50-200 Stored procedures/database: ~# tables Web services/database: < # tables but !! Views: ~3X # tables 5
  • 6. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Did you know… Database Design? Database design level … ~100% table (SQL) ~0% Entity Relationship 6
  • 7. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Did you know… Relational Database Integration? Data integration accounts for … ~40% software project costs 7
  • 8. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Circa 1985 Tables: 2+ Dominant operation: join Design: schema-based Circa 2010 Tables: 10s-1000s Dominant operations !  Extract, Translate, Load (ETL) !  File transfer !  Then … join Design !  Evidence gathering (often manual) !  60-75% schema-less 8
  • 9. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Did you know… Information Ecosystems? Information systems that interact to support domain-specific end-to-end processes, e.g., supply chain, billing, marketing, sales, engineering, operations, finance, … 9
  • 10. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Information Ecosystems involve … 100s-1,000s Information Systems 100s-1,000s Databases Conclusion Information systems & databases live in complex, deeply inter-connected worlds 10
  • 11. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Did you know… Database Integration @ Scale? Database integration in an Information Ecosystem … 100s-1,000s source databases 1,000s-millions function-specific views 11
  • 12. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Every organization with a database depends critically on … Data + Data Integration 12
  • 13. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Boring huh? Then how about … 13
  • 14. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Deeply sophisticated analysis [data integration] across … 28 intelligence databases connected across 17 US intelligence agencies over 5 years totally missed … 14
  • 15. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Data Integration @ Scale 15
  • 16. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com What is Data Integration? Combine 2+ data items to form a meaningful result Source1 … Mapping Target Sourcen 1980’s Data Integration Account Charge Name Address Acct# … Acct# Charge Date … Bill 43 Main 123… 123… $2.32 2/1/10 Sally 12 Peel 489… 149 $1.17 1/7/10 TelcoSum (Charge) where Account.Acct# = Charge.Acct# DI computation Entity Mapping Monthly Bill Name Address Acct# Amount Bill 43 Main 123… $2.32 Sally 12 Peel 489… $0.00 16
  • 17. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com What does DI mean? "  Determined by application “semantics” "  Financial Close for a Large Enterprise •  What: Combine data from 100s-1,000s of systems •  Meaning: Defined by financial accounting practices, jurisdictional regulations, industry standards, … •  Speed: 1-2 days after end of period •  Solutions: e.g., SAP Financial, PeopleSoft Financial, Hyperion •  Governance: laws and regulations •  Correctness: o  US= Generally Accepted Accounting Principles (US GAAP) o  GB = Generally Accepted Accounting Practice (UK GAAP) o  Etc. "  Data Integration = •  Entity mapping •  Data integration computation Definitions (cont.) "  Semantically homogeneous relational data descriptions •  Verified by an authoritative expert •  To describe the same entity (considering set inclusion) •  For which there is one and only one simple, complete entity mapping between the relational schemas for the data descriptions •  The entity mapping can be defined in relational expressions "  Semantically heterogeneous relational data descriptions •  Verified by an authoritative expert •  To describe the same entity (considering set inclusion) •  For which there is not a complete entity mapping between the relational schemas for the data descriptions •  Entity mappings cannot be defined in relational expressions 17
  • 18. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Deeper Thoughts "  Schemas: sufficient but not necessary "  Simple map: 1:N, … atomicity "  Degree of semantic heterogeneity Semantic Semantic homogeneity heterogeneity Complete No mapping mapping Consequences "  Semantically homogeneous relational data descriptions •  Data independence + updatable => consistent views •  Provide basis for o  Correct o  Efficient •  Schema level design and integration "  Semantically heterogeneous relational data descriptions •  No guarantee of: Data independence + updatable views •  Provide no basis for o  Correctness o  Efficiency •  Schema level design and integration not always possible "  Data modelling and integration best practices •  Semantic homogeneity •  Radical simplification •  Standardization: ontologies, schemas, application solutions, … 18
  • 19. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Semantic Heterogeneity … "  Is unavoidable @ scale •  Autonomous design •  Ontologically heterogeneous, e.g., Enterprise Customer o  Legal entity-based ontology o  Location-based ontology "  Dominates semantic homogeneity Information Ecosystem Integration (3D) Telco Service Provider VZW Worldcom MCI BA GTE Business Domains Ordering Marketing Finance Billing Repair Sales Enterprise Customer Wells Fargo Wells Fargo Wachovia Bank of America Merril Lynch Bank of America 19
  • 20. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Correctness & Efficiency "  Legally Binding Requirements •  Law governing communications, financial transactions, privacy, … o  E.g.,: Customer Proprietary Network Information (CPNI) •  Regulation: rules defining operational details o  E.g.,: FCC CPNI rules –customer views require manual verification •  Customer agreements: SLAs for all services •  Telco agreements: SLAs for all production systems "  Mission Critical: Operational & Information Correctness "  Complex Views •  Customer Defined Views •  Inconsistent & Conflicting Views o  Business perspectives, manually defined, no ultimate truth or agreement o  Conflicting telephone #: Long-distance vs. local Data Modelling: Enterprise Customer "  Enterprise Customer data description •  Enterprise customer identifier (ECID) •  Enterprise customer data Level of abstraction o  Telco information, e.g., billing, must be mapped (integrated into) Enterprise Customer organizational units "  Enterprise Customer Ontologies = Conceptualizations •  Legal entity •  Locations •  Telco contracts Perspective •  ~40 others 20
  • 21. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Enterprise A Legal entity structure (partial) •  ~400 legal entities •  14 levels deep •  20,000 accounts (~50/entity) Industry Standard Legal-entity Ontology Industry Standard Ontology Radically Simplified Schema Legal-entity based ECID instance (DUNS code) 21
  • 22. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Location-based Ontology Industry Standard Ontology Location-based ECID instance Radically Simplified Schema Contract-based Ontology Industry Standard Ontology Contract-based ECID Instance Radically Simplified Schema 22
  • 23. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Entity maps for semantically heterogeneous Enterprise Customer Entities are schema-less Location-based ECID schema Legal entity-based ECID schema Contract-based ECID schema Enterprise Customer: 3 Departments in 2 locations les Sa l- C ga Le l- L ga Le Ma rke tin Ma g rke tin g Le ga l- L 23
  • 24. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Location-Based Enterprise Customer IDs: 2 locations Two Entities Legal entity-based Enterprise Customer IDs: 3 organizations with 1, 2 and 3 departments les Sa l- CMa ga rke Le tin g Le ga l- L Four Entities 24
  • 25. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Multiple Versions of Truth 1st Amendment Civil Liberties Bigamy Religious Freedom LDS Law repealed 1990 Fundamental Against LDS Law 1838 Women’s Statutory rights rape Women’s rights 46% 54% Govt out of bedroom US Fund. State US Gay Anti- US LDS US Federal LDS ACLU Feminists Feminists of Federal Rights Citizens Church Citizens Govt Church Utah Govt Married Age: 48 Male Female Age: 13 Nov 2002 www.sti2.org 5/12/2007 - Vienna Data Integration Observations "  Each view is unique •  Entity map •  Data integration computation "  Views often conflict or mutually inconsistent "  Data integration for semantically heterogeneous relations •  Schema-less •  By evidence gathering •  At instance level "  Views are •  Dominant means of accessing data •  Lenses into Information Ecosystems 25
  • 26. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Codd’s Rules Extended "  0: System must be relational, a "  7: High-level insert, update, and database, a management system delete "  1: Information rule (tabular) "  8: Physical data independence "  2: Guaranteed access rule (keys) "  9: Logical data independence "  3: Systematic treatment of nulls "  10: Integrity independence "  4: Active online catalog based on "  11: Distribution independence the relational model extra "  12: Nonsubversion rule "  5: Comprehensive data sublanguage rule "  13: Relational power rule: The full power of relational technology "  6: View updating rule applies to semantically homogeneous relational databases. For Semantically Heterogeneous Databases "  Cannot be guaranteed •  Relational model •  Codd’s rules •  Relational properties o  Updateable views •  Relational assumptions o  Global schema o  Single version of truth "  No basis for correctness or efficiency "  Unfortunately, like the real world 26
  • 27. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Shadow Approach "  Database Approach: power and limits •  Data structures •  Schemas •  Data models •  Logic "  Shadow Approach •  Meta-tagging o  Equivalence (E-tags) for entity mapping after resolution o  What (W-tags) for “meaning” o  Projection (P-tags) for ETL over logical data structures (Traditional) Data Integration Billing Ordering Repair Sales Marketing Service view view view view view view Evidence 1 Evidence 3 Evidence 4 Evidence 5 Evidence N Evidence 2 Business In SQL, Java, C/C+ Ontology Rules +… Total Semantic Heterogeneity Schema Single Version of Truth ?? Enterprise portal Ontology 1 Ontology 2 Ontology 3 Ontology 4 Schema Metadata Schema Schema DB DB DB data Single Version of Truth Single Version of Truth Single Version of Truth Single Version of Truth Single Version of Truth Single Version of Truth 27
  • 28. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Shadow Theory-based Data Integration Billing Ordering Repair Sales Marketing Service view view view view view view Evidence 1 Evidence 3 Evidence 4 Evidence 5 Evidence N Evidence 2 Business Need a new language to implement rules by Ontologies Rules tags Explicitly represented w/ Multiple Version of Truth. Schema Enterprise portal Ontology 1 Ontology 2 Ontology 3 Ontology 4 Schema Metadata Schema Schema DB DB DB data Shadows Shadows Shadows Shadows Shadows Shadows Shadows Additional slides 28
  • 29. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Data Integration Pragmatics "  Codd’s Rules violated "  Views conflict •  Un-natural fit •  For good reason! •  Primary keys o  Sales vs. marketing vs. repair o  Problems: technical, managerial •  Maintain conflicts or lose value o  Autonomous design o  Semantic heterogeneity "  Best practices rare •  Few views updatable •  Semantic homogeneity •  Radical simplification "  Schemas largely unavailable •  Standardization •  10% not RDBMSs •  30% legal "  Data Integration •  60% autonomy •  Table-level (Not ER-level) •  Instance-based "  Federation often prohibited •  Seldom schema-based •  Legal: privacy, CPNI, … •  Manual •  Evidence gathering "  Single Source of Truth •  Makes sense in specific cases •  Usually not! Current Data Integration Technologies "  Relational DBMS "  Limitations •  Schema-based "  Extract, Transform, & Load (ETL) •  Federation issues "  Federation aka Enterprise •  Limited (no!) support for information integration (EII) o  Schema-less integration o  Instance-level integration "  DI Platforms: 50+, e.g., o  Evidence gathering "  IBM InfoSphere Information Server o  Conflicting views "  Informatica Platform "  Microsoft SQL Server Integration o  Semantic heterogeneity Services •  50+ ad hoc tools "  Oracle Data Integrator, … •  Expensive to migrate "  SAP Data Integrator/Federator Information Ecosystem onto data integration platform "  Instance-level integration •  XML-based integration •  Custom tools: Verizon •  Dun & Bradstreet ECID maps 29
  • 30. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Emerging Data Integration Technologies "  DBMS Architecture "  Data Discovery •  In-Database Analytics "  Data Quality Services s ) s •  Database Appliances tic Instance-level integration tic "  an Distributed DBMS Architecture an "  •  XML-based integration Change Data Capture (CDC) m •  em Data Access •  Custom tools se •  •  Dun & Bradstreet ECID maps s •  Data Replication o l; •  Distributed Data Caching Schema-less Tools (n "  ua "  Service Orientation •  Evidence gathering: Web, … ce an •  Data Services Platforms •  Entity / Identity Resolution an m •  Information-As-A-Service •  Social Networking rm – •  Enterprise Service Buses (ESBs) o  Collaboration g o  Web 2.0 rfo "  Master Data Management (MDM) in o  Social Search lv Pe •  Data & Process Mining So •  Linguistics / Ontologies / Taxonomies – o  Derivation & Analysis m n •  Semantic Technologies tio le •  Content Analytics ob ta "  Shadow Theory pu Pr •  Plato’s Cave m •  Monadic 2nd Order Logic Co Heterogeneous meanings of a telephone number North America Numbering Plan (NANP) telephone number (NPA) NXX-XXXX NPA - Number Plan Area NXX - Central office code XXXX – telephone line # Telephone attribute in customer Domestic long distance services, charges database. determined by NPA. Telephone entity = a physical phone. ? International long distance services, charges determined by country code. Owner identified uniquely with additional attribute NPA+NXX+XXX+CustomerID Ported to cell or internet phone Routed to / from another phone number (e.g. 1 800 toll free) 30
  • 31. Data Integration @ Scale 7/8/11 Michael L. Brodie 2011 STI Semantic Summit Michaelbrodie.com Limitations of Current Solutions "  Data Model Principles Don’t Apply "  Tools Don’t Meet Requirements •  Unique IDs (keys) are limiting •  No single model of data •  Modelling Choices: entities, •  Inconsistent / conflicting views properties, or relationships "  Meta-data and Multi-database Scale "  Schemas Are Seldom Available •  Information ecosystem (vs. database) •  10% not RDBMSs management •  Millions of views •  30% legal •  10,000s of cross database dependencies •  60% owners maintain autonomy o  Biz: impact operations "  Migrating Information Ecosystems o  Tech: impact schedule & cost to Vendor Solutions Is Prohibitive "  Schema-based Integration Challenges "  Pragmatics •  Inconsistent data models •  Bootstrapping in a legacy environment •  View Conflicts •  Cascaded Views "  Conceptual Modelling Support •  Different models required for different "  Federation Seldom Allowed (CPNI) workers •  No round-trip engineering "  No Single Source of Truth 31