The Rationale for Semantic Technologies

Michael K. Bergman
July 2012

Outline
• Nature of the World
• Knowledge Representation, Not Transactions
• The New Open World Paradigm
• Integrating All Forms of Information
• Connections Create Graphs
• Network Analysis is the New Algebra
• Information and Interaction is Distributed
• The Web is the Perfect Medium
• Leveraging – Not Replacing – Existing IT Assets
• Democratizing the Knowledge Function
• Seven Pillars of the Semantic Enterprise
• Summary of Semantic Technology Benefits

Some Caveats
• Semantic technologies are NOT:
    - Cloud computing
    - Big data
    - Necessarily open data
    - “One ring to rule them all”
    - A replacement for current IT systems
• These ideas are mostly orthogonal to semantics

Nature of the World
• Messy
• Complicated
• Interconnected
• Changing
• Interdependent
• Uncertain
• Diverse

Nature of Knowledge
• Knowledge is never complete
• Knowledge is found in structured, semi-structured and unstructured forms
• Knowledge can be found anywhere
• Knowledge structure evolves with the incorporation of more information
• Knowledge is contextual
• Knowledge should be coherent
• Knowledge is about its users defining its structure and use

Knowledge ≡ Nature of the World

Knowledge Representation, Not Transactions
• KR functions:
    - Search
    - Business intelligence
    - Competitive intelligence
    - Planning
    - Data federation
    - Data warehousing
    - Knowledge management
    - Enterprise information integration
    - Master data management
• Traditional IT has been transaction-oriented
    - e.g., “Seats on a plane”

Current Approaches Have Failed
• Relational databases:
    - Structured data only
    - Inflexible, fragile
    - Constant re-architecture
• Business intelligence:
    - Slow, inflexible
    - Structured data only
    - IT-constrained, not user-driven
• Extract, Transform, Load (ETL):
    - Structured data only
    - Inflexible, fragile
• High $$$, incomplete, not adaptable

A 30-yr Quest to Integrate Content
• Content and data federation has proven intractable for 30 years, ever since IT systems were first adopted, because of:
    - Structured + semi-structured + unstructured content
    - Data “silos” and unconnected systems
    - Incompatible protocols and hardware
    - 85% of content not in databases
    - Semantic heterogeneities
    - No universal data model

The New Open World Paradigm
• The opposite of the closed-world logic used for transactions
• The open world assumption (OWA) means:
    - The lack of a given assertion does not imply that it is true or false; it is simply not known
    - A lack of knowledge does not imply falsity
    - Everything is permitted until it is prohibited
    - Schema can be extended incrementally without re-architecting prior schema (“extensible”)
    - Information at various levels of incompleteness can be combined
• The right logic for KR problems

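The contrast between the two assumptions can be sketched in a few lines of Python. This is a minimal illustration only: the facts, the `DENIED` set, and both function names are hypothetical and come from no particular semantic toolkit. An unasserted statement is false under the closed world but merely unknown under the open world.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical knowledge base (illustrative facts only).
ASSERTED: Set[Triple] = {("Jane", "worksFor", "Acme")}
DENIED: Set[Triple] = {("Jane", "worksFor", "Globex")}  # explicitly stated to be false


def closed_world(fact: Triple) -> bool:
    """Closed world: anything that is not asserted is treated as false."""
    return fact in ASSERTED


def open_world(fact: Triple) -> Optional[bool]:
    """Open world: True if asserted, False only if explicitly denied, else None (unknown)."""
    if fact in ASSERTED:
        return True
    if fact in DENIED:
        return False
    return None


query = ("Jane", "livesIn", "Iowa")
print(closed_world(query))  # False: absence is read as falsity
print(open_world(query))    # None: absence means "not known"
```

Adding `("Jane", "livesIn", "Iowa")` to `ASSERTED` later requires no change to the existing facts or functions, which is the incremental, extensible behavior described on the slide.
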
Integrating All Forms of Information
• Uses a “canonical” data model (RDF)
• RDF is a universal solvent for all information:
    - Unstructured data – text, images
    - Semi-structured data – markup, metadata
    - Structured data – databases, tables
• “Soft” (social, opinion) + “hard” (facts) information
• RDF can represent anything from simple assertions (“Jane runs fast”) to complex vocabularies and languages
• Generic tools can be driven by the RDF data model

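The idea of one canonical model for all three forms of content can be sketched as follows. This is a simplified stand-in, not real RDF: it uses plain Python tuples rather than URIs or an RDF library, and the sample data, the naive sentence splitter, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical inputs, one per form of content.
structured_row = {"id": "jane", "employer": "Acme"}  # e.g., a database row
semi_structured = {"jane": {"title": "Engineer"}}    # e.g., page metadata
unstructured = "Jane runs fast"                      # free text


def triples_from_row(row: Dict[str, str]) -> List[Triple]:
    subject = row["id"]
    return [(subject, key, str(value)) for key, value in row.items() if key != "id"]


def triples_from_metadata(meta: Dict[str, Dict[str, str]]) -> List[Triple]:
    return [(s, p, str(o)) for s, props in meta.items() for p, o in props.items()]


def triples_from_sentence(sentence: str) -> List[Triple]:
    # Naive placeholder for real text extraction: assumes "Subject verb rest".
    subject, predicate, obj = sentence.split(" ", 2)
    return [(subject.lower(), predicate, obj)]


def match(store: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Generic pattern query: None acts as a wildcard for that position."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


store = (triples_from_row(structured_row)
         + triples_from_metadata(semi_structured)
         + triples_from_sentence(unstructured))

print(match(store, s="jane"))
```

Once everything is reduced to triples, the single `match` function works on all of it regardless of origin, which is the sense in which generic tools can be driven by the data model.
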
Integrated Data and Tools using RDF




Connections Create Graphs
• Things and concepts create nodes
• Relationships between things create connections (“edges”)
• Adding things leads to more connections
• More connections lead to more structure
• Coherent structure leads to more knowledge and understanding
• The natural structure of knowledge domains is a graph

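A small sketch of how nodes and edges accumulate as statements are added. The `Graph` class and the example relations are hypothetical; a production system would more likely use a triple store or a graph library.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple


class Graph:
    """Directed, labeled graph built up one statement at a time."""

    def __init__(self) -> None:
        # node -> set of (relation, target) pairs leaving that node
        self.edges: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges[source].add((relation, target))
        self.edges.setdefault(target, set())  # make sure the target is a node too

    def node_count(self) -> int:
        return len(self.edges)

    def edge_count(self) -> int:
        return sum(len(out) for out in self.edges.values())


g = Graph()
g.add("Jane", "worksFor", "Acme")
g.add("Acme", "locatedIn", "Iowa")
print(g.node_count(), g.edge_count())  # 3 2

# A new statement about existing things adds a connection, not a new node.
g.add("Jane", "livesIn", "Iowa")
print(g.node_count(), g.edge_count())  # 3 3
```
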
Graphs Grow Naturally with Knowledge




Benefits of Graphs (ontologies)
• Coherent navigation
• Flexible entry points
• Inferencing
• Reasoning
• Connections to related information
• Ability to represent any form of information
• Concept matching to integrate external content
• A framework for disambiguation
• A common vocabulary to drive content “tagging”

Network Analysis is the New Algebra
• Network analysis provides new tools for gauging:
    - Influence
    - Relatedness
    - Proximity
    - Centrality
    - Inference
    - Shortest paths
    - Diffusion
• Graphs can represent any structure
• Many structures can only be represented by graphs

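Two of the measures listed above, shortest paths and centrality, can be sketched with the standard library alone. The example graph is hypothetical, the path search is a plain breadth-first search over an unweighted, undirected graph, and centrality here means simple degree centrality (a node's neighbor count divided by the number of other nodes), which is only one of several common definitions.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical undirected graph as an adjacency mapping.
GRAPH: Dict[str, Set[str]] = {
    "Jane": {"Acme", "Iowa"},
    "Acme": {"Jane", "Iowa", "Bob"},
    "Iowa": {"Jane", "Acme"},
    "Bob": {"Acme", "Globex"},
    "Globex": {"Bob"},
}


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):  # sorted for a repeatable result
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


print(shortest_path(GRAPH, "Jane", "Globex"))  # ['Jane', 'Acme', 'Bob', 'Globex']
centrality = degree_centrality(GRAPH)
print(max(centrality, key=centrality.get))     # Acme
```
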
Information and Interaction is Distributed
• Knowledge is everywhere
• People and stakeholders are everywhere
• External information needs to be integrated with internal information
• A uniform access protocol/framework is desirable to:
    - Preserve existing information assets
    - Reflect the diversity of data formats

The Web is the Perfect Medium
• All information may be accessed via the Web
• All information may be given Web identifiers (URIs)
• All Web tools are available for use and integration
• All Web information may be integrated
• Web-oriented architectures (WOA) have proven:
    - Scalability
    - Robustness
    - Substitutability
• Most Web technologies are open source

A Distributed Web-oriented Architecture




Leveraging – Not Replacing – Existing IT Assets
• Existing IT assets represent:
    - Massive sunk costs
    - Legacy knowledge and expertise
    - Stakeholder consensus
    - Yet, still stovepiped
• Semantic technologies are an interoperability layer over existing IT assets
• Preserve prior investments while enabling interoperability

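One way to picture an interoperability layer is a read-only view that exposes rows of an existing relational table as subject-predicate-object statements, leaving the source system untouched. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `employees` table, its columns, and the `rows_as_triples` helper are all hypothetical, and the table name is assumed to come from trusted configuration rather than user input.

```python
import sqlite3
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]


def rows_as_triples(conn: sqlite3.Connection, table: str, key_column: str) -> Iterator[Triple]:
    """Read every row of an existing table and yield it as triples (read-only)."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')  # table name must be trusted
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{table}/{record[key_column]}"
        for column, value in record.items():
            # NULLs are skipped: a missing value is unknown, not asserted.
            if column != key_column and value is not None:
                yield (subject, column, str(value))


# Stand-in for a legacy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, employer TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Jane', 'Acme')")
conn.execute("INSERT INTO employees VALUES (2, 'Bob', NULL)")

for triple in rows_as_triples(conn, "employees", "id"):
    print(triple)
```

The existing table keeps serving its transactional role; the layer only adds a second, graph-shaped way of reading it.
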
Democratizing the Knowledge Function
• Move from bespoke software to knowledge graphs
• Knowledge graphs can be constructed and modified by:
    - Subject matter experts
    - Employees
    - Partners
    - Stakeholders
    - General public
• Graph-driven applications can be made generic by function, visualization
• Graph-driven applications democratize KR

Seven Pillars of the Semantic Enterprise




Summary of Semantic Technology Benefits
• Can deploy incrementally
    - lower risks
    - lower costs
• Excellent integration approach
• No need to re-do schema because of changed circumstances
• Leverages existing information assets
• Well-suited for knowledge applications
• Can accommodate multiple viewpoints, stakeholders
• Leadership visibility to the Forum
