Exploration of large and complex data
estates to gain an accurate understanding
of the data structures and data quality
Zen, and the art of Datanauting
Carl Bray
Product Manager, Ontology Systems
Matt Clark
Design Authority, BSkyB
Datanauting
Boldly going where no data integrator has gone before…
15 years of transaction data
10 million+ customers
900 engineers making changes
30 TB of data
20+ Applications
Q) How do you start to understand this data estate?
The company
• UK subsidiary of a global media organisation
• Provides fixed line telephone, Internet and television entertainment services to UK residents
• 10 million+ customers, trading for 15 years
Business drivers:
• Driven by marketing innovation
• Extend and upsell to customer base
• React to competitive threats
• Technical infrastructure impacting commercial agility
The motivation behind the project
Background and Business Drivers
Objective
• Significantly reduce the time to capture new business strategies in IT systems
Significant change in IT delivery
• Embrace Agile delivery of new functionality
• Develop new payment and sales systems
• Access and extend existing data
• Multiple SCRUM teams using test-driven development
• Phased delivery
Short-term technical drivers
• Quickly understand the structure, nature and consistency of the existing data
Longer term technical drivers
• Introduce a service-based semantic agent to access software services
Fundamentally changing the way IT functionality is delivered
A new IT Strategy
Subject matter experts (SMEs)
• Understanding the data means interfacing with SMEs
• Multiple SCRUM teams need access to SMEs
• Knowledge is in Silos and not co-located with SCRUM teams
• SMEs may not know the answers
Bottleneck / Choke point
• SCRUM teams need quick answers to data / process questions
• SME bandwidth stifles SCRUM agility
• Introduces a single project bottleneck/choke point
Overwhelming the SMEs
• Free and unfettered access to the SMEs would create chaos
• Need to filter questions to the SMEs
Challenges
Many technical challenges stood in their way
[Diagram: SCRUM teams and SMEs alongside the CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content and Product systems]
Many systems with complex interdependencies
• CRM
• Billing
• Reference Data
• Debt processing
• Order handling
• Trouble ticketing systems
• Subscriber card management systems
• Content access entitlements
• Product catalogue
Fragmentation
• Business entities fragmented
• “Customer” properties in many systems
The Scope and Scale of the Problem
Payments and sales system involving 20+ systems and legacy data
Data estate problems
• Data quality isn’t consistent
• Data fragmentation is high
• Understanding the data is complex
• How are business entities stored in different applications and data sources?
• What impact should processes have on the data (flags, statuses, etc.)?
• When data is duplicated, which data sources should take preference?
• Scale of data
• 30+ TB of historic trading data
• 3 Vs - The Variety and Volume of data are very high
The Data
30TB of transactional data over 15 years of system changes
Non-semantic alternatives
• Train more SMEs
• Work around SME’s other priorities
• Educational workshops
• Take time to document systems
Data-profiling alternatives
• Reverse engineering schemas
• ETL Tooling
• Didn’t want to create yet another data warehouse
Chose a datanauting approach
• Supports their commitment to Agile development
• Allows SCRUM teams to explore and ask questions of the data
without overloading SMEs
Alternatives
Alternative approaches to solving the problem were considered
What we do, and why we’re different
• Ontology leverages graph and semantic search technologies to address enterprise data issues
• We address complex data integration problems
• Data Acquisition
• Data Correlation
• Data Migration
• We produce fully fledged operational applications that use semantic search in
• Telecommunications
• Media
• Financial services
• The Ontology Difference
• Inherently agile – no schema
• Datanauting: data-first, structure later
• Just enough modelling
• Structured and unstructured data
How we approached the problem
The Ontology Approach
Exploration of data sources…
The Ontology Approach - Datanauting
Identify sources
Connect to sources
•Index source
Search for entities
•Refactor entities
•Create URI pattern matching
•Map entities to RDF
Search for linked
entities
•Add references
Search for equivalent
entities
•Create matching URIs
•Map entities to RDF
• DBs
• SPARQL Endpoints
• Structured files
• MS Excel, CSV, XML, RDF
• CISCO and other device configurations
• Proprietary formats
• Unstructured files
• MS Word, PDFs, etc.
The Ontology Approach - Datanauting
Identify sources
• Setup the connection
• Index sources
• Add search facets
• Tokenise compound values e.g.
• Service names are concatenated “Service-LON/01”
• Product names use “CamelCase”
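The tokenising step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's indexer: the function name `tokenise` and the choice to lowercase tokens are assumptions made for the example.

```python
import re

def tokenise(value: str) -> list[str]:
    """Split a compound value into lowercase search tokens (illustrative only)."""
    tokens = []
    # First split on separator punctuation such as "-" and "/".
    for part in re.split(r"[^A-Za-z0-9]+", value):
        if not part:
            continue
        # Then split each part on CamelCase and letter/digit boundaries:
        # acronym runs, capitalised or lowercase words, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [t.lower() for t in tokens]

print(tokenise("Service-LON/01"))  # ['service', 'lon', '01']
print(tokenise("CamelCase"))       # ['camel', 'case']
```

Indexing the individual tokens means a search for "LON" or "camel" can find the compound values that contain them.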
The Ontology Approach - Datanauting
Connect to sources
• Search for business entities
• Refactor “denormalised” data
• Choose a URI pattern to represent instances
• Set a type for the entity
• Map properties to owl:DatatypeProperty
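As a rough sketch of the entity-mapping steps above, the Python below turns one source row into RDF written as N-Triples. The namespace `http://example.org/`, the URI pattern, the column names and the helper names are all hypothetical, chosen only to make the example concrete. In a full model each property URI would also be declared as an owl:DatatypeProperty.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the source
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def entity_uri(entity_type, key):
    """URI pattern <BASE><type>/<key>; the key is percent-encoded."""
    return f"{BASE}{entity_type.lower()}/{quote(str(key), safe='')}"

def literal(value):
    """Escape a value as a plain N-Triples string literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'

def row_to_ntriples(entity_type, key_column, row):
    """Emit one type triple plus one literal-valued triple per non-empty column."""
    subject = f"<{entity_uri(entity_type, row[key_column])}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}ontology/{entity_type}> ."]
    for column, value in row.items():
        if column == key_column or value is None:
            continue  # the key lives in the URI; empty values produce no triple
        lines.append(f"{subject} <{BASE}ontology/{column}> {literal(value)} .")
    return lines

row = {"customer_id": "C-1001", "surname": "Smith", "status": None}
for line in row_to_ntriples("Customer", "customer_id", row):
    print(line)
```

Keeping the URI pattern in one function matters later: the same pattern can be reused when the same entity turns up in another source.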
The Ontology Approach - Datanauting
Search for entities
• Search for entities that should be linked
• Add references (owl:ObjectProperty) between entities that are to
be linked
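One way to picture the linking step is a pass over one entity's rows that emits a reference triple wherever a column holds the key of another entity. The sketch below assumes a hypothetical `placedBy` property (which would be declared as an owl:ObjectProperty), hypothetical order and customer keys, and an `http://example.org/` namespace. It also collects rows whose reference cannot be resolved, since those are often the first data-quality findings.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the source

def entity_uri(entity_type, key):
    """Same illustrative URI pattern for every entity type."""
    return f"{BASE}{entity_type.lower()}/{quote(str(key), safe='')}"

def link_rows(rows, source_type, source_key, ref_column,
              target_type, prop, known_target_keys):
    """Return (links, dangling): reference triples and unresolved source keys."""
    links, dangling = [], []
    for row in rows:
        ref = row.get(ref_column)
        if ref in known_target_keys:
            links.append((entity_uri(source_type, row[source_key]),
                          f"{BASE}ontology/{prop}",
                          entity_uri(target_type, ref)))
        else:
            dangling.append(row[source_key])  # reference points at nothing known
    return links, dangling

orders = [
    {"order_id": "O-1", "customer_id": "C-1001"},
    {"order_id": "O-2", "customer_id": "C-9999"},
]
links, dangling = link_rows(orders, "Order", "order_id", "customer_id",
                            "Customer", "placedBy", {"C-1001"})
print(links)
print(dangling)
```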
The Ontology Approach - Datanauting
Search for linked entities
• Search for semantically equivalent entities in other data sources
• Search based on property names
• Search based on strict value matching/weighting
• Search based on sub-string matching/weighting
• Reuse the URI pattern
• Create references
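The matching criteria above can be combined into a single weighted score. The sketch below is an assumption about how such a score might look, not the product's algorithm: the property names, the weights and the `substring_factor` of 0.5 are invented for illustration.

```python
def match_score(a, b, weights, substring_factor=0.5):
    """Weighted similarity in [0, 1] over property names present in both records."""
    total = 0.0
    score = 0.0
    for prop, weight in weights.items():
        if prop not in a or prop not in b:
            continue  # only compare properties both sources actually hold
        total += weight
        left = str(a[prop]).strip().lower()
        right = str(b[prop]).strip().lower()
        if left == right:
            score += weight  # strict value match earns the full weight
        elif left and right and (left in right or right in left):
            score += weight * substring_factor  # sub-string match earns part of it
    return score / total if total else 0.0

crm_record = {"postcode": "AB1 2CD", "surname": "Smith"}
billing_record = {"postcode": "ab1 2cd", "surname": "Smith-Jones"}
weights = {"postcode": 2.0, "surname": 1.0}
print(match_score(crm_record, billing_record, weights))
```

Pairs scoring above a chosen threshold would then be given the same URI, reusing the pattern already defined for that entity type, so that their properties and references merge in the graph.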
The Ontology Approach - Datanauting
Search for equivalent entities
High-level solution to the problems the organisation faced
• Removed the SME bottleneck - a key enabler for the Agile / SCRUM approach
• Creates a searchable domain model, breaking the data into discrete “chunks”
• Ontology allows the SCRUM teams to understand the legacy data through ad-hoc queries
• Can understand how business concepts are mapped across multiple contradictory data repositories
• The quality and suitability of data can more easily be assessed
• Provides a definitive view of the commercial position for a given subscriber or set of subscribers
• Backlog and sprint priorities are based on a complete understanding of the complexity of the task
• Provide data to facilitate mock ups and test harnesses
Ontology provides SCRUM members with insight into the data
Project Results
Project Results
SCRUM teams gain insight into data
[Diagram: SCRUM teams and SMEs alongside the CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content and Product systems]
Project Results
Product Architecture
[Architecture diagram. Labels shown: Ontology 4 Modeller; Ontology 4 Runtime; Modeller; Web UI; Ontology Intelligent 360; Ontology Integrity Manager; Query API; Semantic Graph Store; Semantic Processing Core; Universal Search Core; RTIA; Authentication and Notification; External Event Sources; LDAP Server (optional); Mail Server (optional); Fully Modelled Data Sources (CSV, RDBMS, XML, JDBC, XLS); Other Data Sources (DOC, PDF, XLS, MAIL, XML); End Users (Browser Access); HTTPS]
Variety
• Ability to access data in a variety of formats
• Avoid integration to live systems
• Possible to work from database dumps, which avoids politics
• Embracing change – inherently agile
Volume
• Ontology techniques for managing data scale
• Partial index of data
• Partial modelling
• Semantic search with SQL query to live systems
[Diagram: Velocity, Variety, Volume]
Project Results
Dealing with two large Vees
Why Ontology?
• Agile response through inherently agile technology
• Datanauting provides agile response to SCRUM teams
• SME time can now be used for valuable queries
Technical advantages
• No Schema, No Integration, No Big Bang, No Search Restrictions, No Upfront Risk
Benefits delivered
• Speed – Greatly accelerated the analysis phase of the project
• Risk – Project is not viable without an understanding of the data
Zen, and the art of Datanauting
Advantages of the Ontology approach to Data Integration
Learn More
To learn more about Ontology
Systems, or to access more detailed
information about our products and
services, please either:
Call +44 20 7239 4949
Visit ontology.com
Email info@ontology.com
Subject to change. All rights reserved. © 2013
No part of this document may be reproduced in any
form or by any means for any purpose without our
written permission. All other trademarks appearing
in this document are acknowledged as the trademarks
of their respective owners.
Ontology-Partners Limited trading as Ontology
Systems
Ontology Systems
Phoenix Yard,
65 Kings Cross Road,
London WC1X 9LW,
UNITED KINGDOM
Registered in England No. 5794201.
Registered Office.
Dalton House,
60 Windsor Avenue,
London SW19 2RR
UNITED KINGDOM

More Related Content

What's hot

The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsLinked Enterprise Date Services
 
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...semanticsconference
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?DATAVERSITY
 
Four Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateFour Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateEdgar Alejandro Villegas
 
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloCore banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloAlexander Petrov
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
Gartner Business Intelligence & Analytics Summit Brochure
Gartner Business Intelligence & Analytics Summit BrochureGartner Business Intelligence & Analytics Summit Brochure
Gartner Business Intelligence & Analytics Summit BrochureNadia Smith
 
Building enterprise advance analytics platform
Building enterprise advance analytics platformBuilding enterprise advance analytics platform
Building enterprise advance analytics platformHaoran Du
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics OverviewKevin Dingle
 
Big data and you
Big data and you Big data and you
Big data and you IBM
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleVasu S
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Global IT Outsourcing case study
Global IT Outsourcing case studyGlobal IT Outsourcing case study
Global IT Outsourcing case studyNandita Nityanandam
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021Dendej Sawarnkatat
 

What's hot (20)

The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Data modeling
Data modelingData modeling
Data modeling
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web Applications
 
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?Analytics, Business Intelligence, and Data Science - What's the Progression?
Analytics, Business Intelligence, and Data Science - What's the Progression?
 
Four Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by ActuateFour Pillars of Business Analytics by Actuate
Four Pillars of Business Analytics by Actuate
 
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov OsloCore banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
Core banking Closure bank day OSWA meetup 2018-Alexander Petrov Oslo
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
Gartner Business Intelligence & Analytics Summit Brochure
Gartner Business Intelligence & Analytics Summit BrochureGartner Business Intelligence & Analytics Summit Brochure
Gartner Business Intelligence & Analytics Summit Brochure
 
Building enterprise advance analytics platform
Building enterprise advance analytics platformBuilding enterprise advance analytics platform
Building enterprise advance analytics platform
 
NLB Analytics Overview
NLB Analytics OverviewNLB Analytics Overview
NLB Analytics Overview
 
Big data and you
Big data and you Big data and you
Big data and you
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic Technologies
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
 
Adding Hadoop to Your Analytics Mix?
Adding Hadoop to Your Analytics Mix?Adding Hadoop to Your Analytics Mix?
Adding Hadoop to Your Analytics Mix?
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Global IT Outsourcing case study
Global IT Outsourcing case studyGlobal IT Outsourcing case study
Global IT Outsourcing case study
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
 

Similar to Zen and the Art of Datanauting

Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Concept Searching, Inc
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseJesus Rodriguez
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBigDataExpo
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesDATAVERSITY
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationDATAVERSITY
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonDATAVERSITY
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatiaSatish Bhatia
 
84% of Migration Projects Fail – Getting it Right in SharePoint Webinar
84% of Migration Projects Fail – Getting it Right in SharePoint Webinar84% of Migration Projects Fail – Getting it Right in SharePoint Webinar
84% of Migration Projects Fail – Getting it Right in SharePoint WebinarConcept Searching, Inc
 
Best Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesBest Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesEmbarcadero Technologies
 
Drive Smarter Decisions with Big Data Using Complex Event Processing
Drive Smarter Decisions with Big Data Using Complex Event ProcessingDrive Smarter Decisions with Big Data Using Complex Event Processing
Drive Smarter Decisions with Big Data Using Complex Event ProcessingPerficient, Inc.
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data assetBala Iyer
 
Big data
Big dataBig data
Big dataRiya
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...Neo4j
 
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Precisely
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationDenodo
 

Similar to Zen and the Art of Datanauting (20)

Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
Eliminating End User Tagging – Minimizing Organizational Risk and Improving B...
 
Democratizing Data Science in the Enterprise
Democratizing Data Science in the EnterpriseDemocratizing Data Science in the Enterprise
Democratizing Data Science in the Enterprise
 
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is EssentialBig Data Expo 2015 - Barnsten Why Data Modelling is Essential
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and Comparison
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatia
 
84% of Migration Projects Fail – Getting it Right in SharePoint Webinar
84% of Migration Projects Fail – Getting it Right in SharePoint Webinar84% of Migration Projects Fail – Getting it Right in SharePoint Webinar
84% of Migration Projects Fail – Getting it Right in SharePoint Webinar
 
Best Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management ObjectivesBest Practices for Meeting State Data Management Objectives
Best Practices for Meeting State Data Management Objectives
 
Drive Smarter Decisions with Big Data Using Complex Event Processing
Drive Smarter Decisions with Big Data Using Complex Event ProcessingDrive Smarter Decisions with Big Data Using Complex Event Processing
Drive Smarter Decisions with Big Data Using Complex Event Processing
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
Big data
Big dataBig data
Big data
 
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
An Agile & Adaptive Approach to Addressing Financial Services Regulations and...
 
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for...
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data Virtualization
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Zen and the Art of Datanauting

  • 1. Zen, and the art of Datanauting: Exploration of large and complex data estates to gain an accurate understanding of the data structures and data quality. Carl Bray, Product Manager, Ontology Systems; Matt Clark, Design Authority, BSkyB
  • 2. Datanauting: Boldly going where no data integrator has gone before…
  • 3. 15 years of transaction data; 10 million+ customers; 900 engineers making changes; 30 TB of data; 20+ applications. Q) How do you start to understand this data estate?
  • 4. Background and Business Drivers: The motivation behind the project
      The company
      • UK subsidiary of a global media organisation
      • Provides fixed line telephone, Internet and television entertainment services to UK residents
      • 10 million+ customers, trading for 15 years
      Business drivers
      • Driven by marketing innovation
      • Extend and upsell to customer base
      • React to competitive threats
      • Technical infrastructure impacting commercial agility
  • 5. A new IT Strategy: Fundamentally changing the way IT functionality is delivered
      Objective
      • Significantly reduce the time to capture new business strategies in IT systems
      Significant change in IT delivery
      • Embrace Agile delivery of new functionality
      • Develop new payment and sales systems
      • Access and extend existing data
      • Multiple SCRUM teams using test-driven development
      • Phased delivery
      Short-term technical drivers
      • Quickly understand the structure, nature and consistency of the existing data
      Longer-term technical drivers
      • Introduce a service-based semantic agent to access software services
  • 6. Challenges: Many technical challenges stood in their way
      Subject matter experts (SMEs)
      • Understanding the data means interfacing with SMEs
      • Multiple SCRUM teams need access to SMEs
      • Knowledge is in silos and not co-located with SCRUM teams
      • SMEs may not know the answers
      Bottleneck / Choke point
      • SCRUM teams need quick answers to data / process questions
      • SME bandwidth stifles SCRUM agility
      • Introduces a single project bottleneck/choke point
      Overwhelming the SMEs
      • Free and unfettered access to the SMEs would create chaos
      • Need to filter questions to the SMEs
      [Diagram labels: CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content, Product; SCRUM teams; SMEs]
  • 7. The Scope and Scale of the Problem: Payments and sales system involving 20+ systems and legacy data
      Many systems with complex interdependencies
      • CRM
      • Billing
      • Reference Data
      • Debt processing
      • Order handling
      • Trouble ticketing systems
      • Subscriber card management systems
      • Content access entitlements
      • Product catalogue
      Fragmentation
      • Business entities fragmented
      • “Customer” properties in many systems
  • 8. The Data: 30TB of transactional data over 15 years of system changes
      Data estate problems
      • Data quality isn’t consistent
      • Data fragmentation is high
      • Understanding the data is complex
        • How are business entities stored in different applications and data sources?
        • What impact should processes have on the data (flags, statuses, etc.)?
        • When data is duplicated, which data sources should take preference?
      • Scale of data
        • 30+ TB of historic trading data
        • Of the 3 Vs, the Variety and Volume of data are very high
  • 9. Alternatives: Alternative approaches to solving the problem were considered
      Non-semantic alternatives
      • Train more SMEs
      • Work around SMEs’ other priorities
      • Educational workshops
      • Take time to document systems
      Data-profiling alternatives
      • Reverse engineering schemas
      • ETL tooling
      • Didn’t want to create yet another data warehouse
      Chose a datanauting approach
      • Supports their commitment to Agile development
      • Allows SCRUM teams to explore and ask questions of the data without overloading SMEs
  • 10. The Ontology Approach: How we approached the problem
      What we do, and why we’re different
      • Ontology leverages graph and semantic search technologies to address enterprise data issues
      • We address complex data integration problems
        • Data Acquisition
        • Data Correlation
        • Data Migration
      • We produce fully fledged operational applications that use semantic search in
        • Telecommunications
        • Media
        • Financial services
      The Ontology Difference
      • Inherently agile – no schema
      • Datanauting: data-first, structure later
      • Just enough modelling
      • Structured and unstructured data
  • 11. The Ontology Approach - Datanauting: Exploration of data sources…
      1. Identify sources
      2. Connect to sources
        • Index source
      3. Search for entities
        • Refactor entities
        • Create URI pattern matching
        • Map entities to RDF
      4. Search for linked entities
        • Add references
      5. Search for equivalent entities
        • Create matching URIs
        • Map entities to RDF
  • 12. The Ontology Approach - Datanauting: Identify sources
      • DBs
      • SPARQL endpoints
      • Structured files
        • MS Excel, CSV, XML, RDF
        • Cisco and other device configurations
        • Proprietary formats
      • Unstructured files
        • MS Word, PDFs, etc.
  • 13. The Ontology Approach - Datanauting: Connect to sources
      • Set up the connection
      • Index sources
      • Add search facets
      • Tokenise compound values, e.g.
        • Service names are concatenated: “Service-LON/01”
        • Product names use “CamelCase”
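The tokenising step on slide 13 could look something like the sketch below. It is not Ontology's implementation. It is a minimal illustration that splits on separator characters and then on camel-case boundaries, so that a value such as “Service-LON/01” becomes searchable by each of its parts. The function name `tokenise` and the lower-casing of tokens are assumptions.

```python
import re


def tokenise(value: str) -> list[str]:
    """Split a compound value into lower-case search tokens.

    Assumed rules (not from the slides): any non-alphanumeric character is a
    separator, and camel-case words are split at their capital letters.
    """
    tokens = []
    # First split on separators such as '-', '/', '_' or spaces.
    for part in re.split(r"[^A-Za-z0-9]+", value):
        # Then split each part into acronyms, capitalised words and digit runs.
        tokens.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
        )
    return [token.lower() for token in tokens]


if __name__ == "__main__":
    print(tokenise("Service-LON/01"))
    print(tokenise("CamelCase"))
```

Indexing the individual tokens alongside the original value means a search for “LON” can find the service record even though the source system only stores the concatenated name.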
  • 14. The Ontology Approach - Datanauting: Search for entities
      • Search for business entities
      • Refactor “denormalised” data
      • Choose a URI pattern to represent instances
      • Set a type for the entity
      • Map properties to owl:DatatypeProperty
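As a rough illustration of slide 14, the sketch below turns one source row into N-Triples lines: it builds an instance URI from a fixed pattern, sets a type, and emits each remaining column as a literal-valued (datatype) property. The namespace `http://example.org/`, the field names and the helper names are all hypothetical, and column names are assumed to be URI-safe. The slides do not show how the product performs this mapping.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def entity_uri(entity_type, key):
    """Build an instance URI from a fixed pattern: BASE/<type>/<key>."""
    return f"{BASE}{entity_type.lower()}/{quote(str(key), safe='')}"


def row_to_ntriples(row, id_field, entity_type):
    """Map one source row to N-Triples lines (one type triple, then literals)."""
    subject = entity_uri(entity_type, row[id_field])
    lines = [f"<{subject}> <{RDF_TYPE}> <{BASE}{entity_type}> ."]
    for name, value in row.items():
        if name == id_field or value is None:
            continue  # the key is already in the URI; skip missing values
        # Escape the characters that N-Triples string literals require.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'<{subject}> <{BASE}{name}> "{literal}" .')
    return lines


if __name__ == "__main__":
    row = {"cust_id": "C 42", "postcode": "AB1 2CD", "title": None}
    for line in row_to_ntriples(row, "cust_id", "Customer"):
        print(line)
```

Keeping the URI pattern in one function matters for the later steps: any other source that can produce the same key will produce the same URI.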
  • 15. The Ontology Approach - Datanauting: Search for linked entities
      • Search for entities that should be linked
      • Add references (owl:ObjectProperty) between entities that are to be linked
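One way to picture the linking step on slide 15: where a record in one source carries the key of an entity in another, emit a reference between the two URIs. The sketch below assumes hypothetical `account_id` and `cust_id` fields, a made-up `hasCustomer` property and the `http://example.org/` namespace. It also collects accounts whose customer cannot be found, since dangling references are themselves a data-quality finding.

```python
BASE = "http://example.org/"         # hypothetical namespace
HAS_CUSTOMER = BASE + "hasCustomer"  # hypothetical object property


def link_accounts_to_customers(accounts, customer_ids):
    """Return (links, orphans) for account rows that reference customers.

    links   -- (account URI, property URI, customer URI) triples
    orphans -- account URIs whose cust_id matches no known customer
    """
    links, orphans = [], []
    for account in accounts:
        account_uri = f"{BASE}account/{account['account_id']}"
        cust_id = account.get("cust_id")
        if cust_id in customer_ids:
            links.append((account_uri, HAS_CUSTOMER, f"{BASE}customer/{cust_id}"))
        else:
            orphans.append(account_uri)
    return links, orphans


if __name__ == "__main__":
    accounts = [
        {"account_id": "A1", "cust_id": "C42"},
        {"account_id": "A2", "cust_id": "C99"},
    ]
    links, orphans = link_accounts_to_customers(accounts, {"C42"})
    print(links)
    print(orphans)
```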
  • 16. The Ontology Approach - Datanauting: Search for equivalent entities
      • Search for semantically equivalent entities in other data sources
        • Search based on property names
        • Search based on strict value matching/weighting
        • Search based on sub-string matching/weighting
      • Reuse the URI pattern
      • Create references
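The strict and sub-string matching with weighting mentioned on slide 16 can be sketched as a simple scoring function. This is an illustrative assumption, not the product's algorithm: each property has a weight, an exact match (after normalising case and whitespace) earns the full weight, and a sub-string match earns a fraction of it. The property names, weights and `substring_factor` are hypothetical.

```python
def normalise(value):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(str(value).lower().split()) if value is not None else ""


def match_score(a, b, weights, substring_factor=0.5):
    """Score how likely records a and b describe the same entity (0.0 to 1.0)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for prop, weight in weights.items():
        va, vb = normalise(a.get(prop)), normalise(b.get(prop))
        if not va or not vb:
            continue  # a missing value contributes nothing
        if va == vb:
            score += weight  # strict value match
        elif va in vb or vb in va:
            score += weight * substring_factor  # sub-string match
    return score / total


if __name__ == "__main__":
    crm = {"name": "John Smith", "postcode": "AB1 2CD"}
    billing = {"name": "Mr John Smith", "postcode": "ab1 2cd"}
    print(match_score(crm, billing, {"name": 1.0, "postcode": 3.0}))
```

Pairs scoring above a chosen threshold would then be given the same URI (or an explicit reference), which is where reusing the URI pattern from the earlier step pays off.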
  • 17. Project Results: Ontology provides SCRUM members with insight into the data
      High-level solution to the problems the organisation faced
      • Removed the SME bottleneck - a key enabler for the Agile / SCRUM approach
      • Creates a searchable domain model, breaking the data into discrete “chunks”
      • Ontology allows the SCRUM teams to understand the legacy data through ad-hoc queries
      • Can understand how business concepts are mapped across multiple contradictory data repositories
      • The quality and suitability of data can more easily be assessed
      • Provides a definitive view of the commercial position for a given subscriber or set of subscribers
      • Backlog and sprint priorities are based on a complete understanding of the complexity of the task
      • Provides data to facilitate mock-ups and test harnesses
  • 18. Project Results: SCRUM teams gain insight into data
      [Diagram labels: CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content, Product; SCRUM teams; SMEs]
  • 19. Project Results: Product Architecture
      [Architecture diagram; recoverable labels:]
      • Ontology 4 Modeller: Modeller
      • Ontology 4 Runtime: Web UI, Ontology Intelligent 360, Ontology Integrity Manager, Query API, Semantic Graph Store, Semantic Processing Core, Universal Search Core, Authentication and Notification, RTIA
      • Fully Modelled Data Sources: CSV, RDBMS, XML, JDBC, XLS
      • Other Data Sources: DOC, PDF, XLS, MAIL, XML
      • External Event Sources
      • LDAP Server (optional), Mail Server (optional)
      • End Users (Browser Access) over HTTPS
  • 20. Project Results: Dealing with two large Vees
      Variety
      • Ability to access data in a variety of formats
      • Avoid integration to live systems
      • Possible to work from database dumps, which avoids politics
      • Embracing change – inherently agile
      Volume
      • Ontology techniques for managing data scale
      • Partial index of data
      • Partial modelling
      • Semantic search with SQL query to live systems
      [Diagram: Velocity, Variety, Volume]
  • 21. Zen, and the art of Datanauting: Advantages of the Ontology approach to Data Integration
      Why Ontology?
      • Agile response through inherently agile technology
      • Datanauting provides agile response to SCRUM teams
      • SME time can now be used for valuable queries
      Technical advantages
      • No Schema, No Integration, No Big Bang, No Search Restrictions, No Upfront Risk
      Benefits delivered
      • Speed – Greatly accelerated the analysis phase of the project
      • Risk – Project is not viable without an understanding of the data
  • 22. Learn More
      To learn more about Ontology Systems, or to access more detailed information about our products and services, please either:
      • Call +44 20 7239 4949
      • Visit ontology.com
      • Email info@ontology.com
      Subject to change. All rights reserved. © 2013. No part of this document may be reproduced in any form or by any means for any purpose without our written permission. All other trademarks appearing in this document are acknowledged as the trademarks of their respective owners.
      Ontology-Partners Limited trading as Ontology Systems. Ontology Systems, Phoenix Yard, 65 Kings Cross Road, London WC1X 9LW, UNITED KINGDOM. Registered in England No. 5794201. Registered Office: Dalton House, 60 Windsor Avenue, London SW19 2RR, UNITED KINGDOM