Ontology2 Platform
Paul Houle, Founder Ontology2
Bill Freeman, President KMSolutions
(774) 301-1301
OUR PLATFORM
For organizations handling complex, heterogeneous big data from
a large number of sources: structured, unstructured, and
semistructured.
We rapidly (in terms of both computer time and configuration time)
combine, curate, and index your data, in batch and in real time.
Based on our experience with Freebase (the basis for the Google
Knowledge Graph), we combine Hadoop technology with SQL and
NoSQL databases on next-generation cloud technology.
Focus: quality, usability, cross-domain integration and inference,
standards-driven interoperability, open-source components
Current State as we understand it
Technical: Need for extreme agility
• High-quality, curated data is important
• Limited by MySQL speed/scalability (and slow schema changes because of row store)
• Difficulty of handling taxonomy/ontology/schema changes
• Dealing with data loss and broken inter-concept links caused by changes
• Difficulty of linking entities between silos; inability to infer accurate, high-quality relationships
between collections
• Need for clean, normalized data for input to machine learning algorithms
• Need ability to manage spatial and temporal data
• To keep up with competition: changes must be easy to make and fast to implement
• Need for data typing beyond SQL (currency, length, time interval, etc.) to support inference and
user interfaces
• Infrastructure built ad hoc is difficult to document, maintain, and expand
Business Challenges
• To be discussed
Benefits from cloud-native Infovore™ platform
Index construction does not interfere
with user-facing real-time services
Development, Test and Staging do
not interfere with production
Batch jobs do not interfere with
interactive services
Next Generation Cloud
Hardware Virtualization
• Near bare-metal performance
SSD Drives
• Incredible speed
• Predictable response time
Hybrid Cloud
• Take advantage of competition between cloud providers
• Use existing on-premises capacity; control physical security; flexible options
Files
Databases
Hadoop Mappers
Hadoop Reducers
Hadoop Powered Index Construction
We deliver the exact data
required by your index
builders, partitioned, sorted
and filtered for maximum
efficiency.
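The partition, sort, and filter step described above can be illustrated with a small in-memory sketch. This is not the Infovore implementation: the function names, the CRC32-based partitioner, and the `keep` predicate are assumptions made for illustration, and a real Hadoop job would do the same work in mappers, the shuffle, and reducers.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key: str, num_partitions: int) -> int:
    """Assign a key to a partition with a stable hash.

    CRC32 is used because Python's built-in hash() is salted per process.
    (Hypothetical partitioner, chosen only for this sketch.)
    """
    return crc32(key.encode("utf-8")) % num_partitions


def prepare_for_index_builders(records, num_partitions, keep):
    """Filter (key, value) records, route them to partitions, sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in records:
        if keep(key, value):  # filter: drop records the index does not need
            partitions[partition_for(key, num_partitions)].append((key, value))
    # sort: each index builder receives its partition ordered by key
    return {p: sorted(rows, key=lambda row: row[0]) for p, rows in partitions.items()}
```

Each index builder would then read exactly one partition, already ordered by key, and never see records that the filter rejected.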
Index Construction in Hybrid Cloud
New Index Construction Never Conflicts With Production
time
Old index (multiple copies for throughput & availability)
Source
data
Test
Clone
New Index
Terminate and
recover
resources
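The build, test, and swap sequence on this slide can be sketched in a few lines. The `IndexService` class, `build_index`, and the `smoke_test` callback are hypothetical names introduced for illustration; the slide does not specify how the actual cut-over is performed.

```python
def build_index(source_records):
    """Build a fresh index from source data without touching the live one."""
    return {key: value for key, value in source_records}


class IndexService:
    """Serves queries from the current index and allows a tested replacement."""

    def __init__(self, index):
        self._live = index

    def query(self, key):
        return self._live.get(key)

    def promote(self, candidate, smoke_test):
        """Switch to the candidate index only if it passes the test."""
        if not smoke_test(candidate):
            return False  # old index keeps serving; the candidate is discarded
        self._live = candidate  # the old index can now be terminated
        return True
```

Because the new index is built and tested separately, a failed build costs only the temporary resources used to produce it.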
Batch Index plus Real-Time Index
Effortless and efficient scalability
Message
Queue
Bulk Data time stamped
master data
small real-time index
large bulk
index
merger
RESULTS
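One way to read the merger in this diagram is as a lookup that prefers the most recently time-stamped entry across the small real-time index and the large bulk index. The sketch below assumes each index maps a key to a `(timestamp, value)` pair; that layout and the function names are assumptions, not details taken from the slide.

```python
def merged_lookup(key, realtime_index, bulk_index):
    """Return the freshest value for key across both indexes, or None if absent."""
    candidates = [index[key] for index in (realtime_index, bulk_index) if key in index]
    if not candidates:
        return None
    _, value = max(candidates, key=lambda entry: entry[0])  # newest timestamp wins
    return value


def fold_into_bulk(realtime_index, bulk_index):
    """Produce the next bulk index by absorbing newer real-time entries."""
    merged = dict(bulk_index)
    for key, (timestamp, value) in realtime_index.items():
        if key not in merged or timestamp >= merged[key][0]:
            merged[key] = (timestamp, value)
    return merged
```

Folding the real-time entries into the next bulk build keeps the real-time index small, which is what lets it stay fast.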
New approach to data management
A FRAMEWORK FOR DATA QUALITY
Multiple sources of instance data
Facts
classifications
Reference data…
Examples
Test Data
Training Data
Requirements
Quality metrics
WE DELIVER FAST CYCLE TIME
HYBRID CLOUD: No waiting for hardware
PARALLEL DATA PROCESSING: Handle large data sets quickly
DEVOPS AUTOMATION: Little system administration overhead
EFFICIENT DATA REPRESENTATION: Rapid turnaround, low hardware cost
COMPETITIVE
ADVANTAGE
MINIMIZE WASTED CYCLES
automation eliminates errors
MINIMIZE TIME AROUND CYCLE
Ontology2 Spatial Hierarchy
Freebase data enriched for Language+Contextual Performance
Global coverage
30+ languages
250 countries
36,000 regions
1.5M names
400,000 cities & towns
8M names
Large alternative name bank + hierarchical constraint =
• Resolution of jurisdictions in international business listings
• Resolution of place names in free text
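A toy example shows how an alternative name bank combined with a hierarchical constraint can disambiguate a place name. The `PLACES` table below is invented for illustration and is not drawn from the Ontology2 spatial hierarchy; the identifiers and function names are likewise hypothetical.

```python
# Invented sample data: each place has a parent and a set of alternative names.
PLACES = {
    "paris_fr": {"parent": "france", "names": {"paris"}},
    "paris_tx": {"parent": "texas", "names": {"paris"}},
    "france": {"parent": None, "names": {"france", "frankreich"}},
    "texas": {"parent": "usa", "names": {"texas", "tx"}},
    "usa": {"parent": None, "names": {"usa", "united states"}},
}


def candidates(name):
    """All places whose name bank contains the given name (case-insensitive)."""
    normalized = name.strip().lower()
    return {pid for pid, place in PLACES.items() if normalized in place["names"]}


def ancestors(pid):
    """Every place above pid in the hierarchy."""
    found = set()
    parent = PLACES[pid]["parent"]
    while parent is not None:
        found.add(parent)
        parent = PLACES[parent]["parent"]
    return found


def resolve(name, context_name):
    """Keep only candidates that sit beneath some place matching the context."""
    context = candidates(context_name)
    return sorted(pid for pid in candidates(name) if ancestors(pid) & context)
```

With this data, `resolve("Paris", "TX")` keeps only the Texas entry, while `resolve("Paris", "Frankreich")` keeps only the French one, even though the context is given in another language.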
Extensive Graph-Based Schema
META-MODEL SYSTEMATICALLY DESCRIBES PROCESSES AND THINGS
RDFS
types + properties
XML SCHEMA
Data types
EXTENDED
Data types
DECLARATIVE MAPPINGS
CSV RDBMS XML …
DECLARATIVE
HINTS
formatting
editing
…
LINGUISTIC +
CONTEXTUAL
Knowledge
Representation
SOLVES ISSUES, SEE
SLIDE 3 (CURRENT STATE)!
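The "extended data types" in this schema, such as the currency, length, and time interval types named under Current State, can be pictured as values that carry their unit with them. The classes below are a minimal sketch under that assumption; the names `Money`, `Length`, and `_TO_METERS` are hypothetical and do not describe the platform's actual type system.

```python
from dataclasses import dataclass
from datetime import timedelta
from decimal import Decimal

# Exact conversion factors to meters for the units used in this sketch.
_TO_METERS = {"m": Decimal("1"), "km": Decimal("1000"), "ft": Decimal("0.3048")}


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code such as "USD"

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)


@dataclass(frozen=True)
class Length:
    meters: Decimal  # stored in one canonical unit so values compare directly

    @classmethod
    def from_unit(cls, value: str, unit: str) -> "Length":
        return cls(Decimal(value) * _TO_METERS[unit])


# Time intervals are already covered by the standard library.
one_day = timedelta(hours=24)
```

Typed values like these let an inference step or a user interface reject a comparison between dollars and euros, or format a length in the reader's preferred unit, without guessing from a bare number.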
Compiled
representation
databases
COMMON TEXT
FORMATS
CSV, XML, JSON, RDF
FAST BINARY
FORMATS
THRIFT, AVRO
PROTOCOL BUFFERS
RAW DATA
Event-driven real-time pipeline
applications
MERGED
PRODUCTION
INDEX
batch pipeline
MODEL-DRIVEN ARCHITECTURE
HANDLING CONTENT AND DATA WITH CONTEXTUAL UNDERSTANDING
SUMMARY
For organizations handling complex, heterogeneous big data from
a large number of sources: structured, unstructured, and
semistructured.
We rapidly (in terms of both computer time and configuration time)
combine, curate, and index your data, in batch and in real time.
Based on our experience with Freebase (the basis for the Google
Knowledge Graph), we combine Hadoop technology with SQL and
NoSQL databases on next-generation cloud technology.
Focus: quality, usability, cross-domain integration and inference,
standards-driven interoperability, open-source components
Bill Freeman, President KMSolutions
william.freeman3@outlook.com (774) 301-1301
