Rapid Data Integration
and Curation
Delivering Business Value in the First 24 Hours

SPEAKER:
Thomas Kelly, Practice Director
Semantic Technology Center of Excellence
Enterprise Information Management
Cognizant Technology Solutions, Inc.
| ©2013, Cognizant
Agenda

1. DELIVERING BUSINESS VALUE
2. BARRIERS TO RAPID DATA INTEGRATION
3. RAPID DATA INTEGRATION AND CURATION METHOD
We are at an Inflection Point at which Value is Created or
Destroyed

Source: The Motley Fool
Delivering Information Faster Produces Direct, Measurable
Business Value
What Difference Does One Day Make?

A blockbuster drug generates $3M+ in
revenue per day; a one-day delay in
completing clinical trials can generate up
to $500K in additional costs
Banking

A moderate-sized brokerage firm can
generate up to $1M in financial services
revenue per day

Barriers to Rapid Data Integration
Rework is expensive –
must “get it right” from
the start

Fit with the existing
data; avoid data silos

Reconciling differences
(data formats, coding,
identifiers, etc.)

Managing data quality
(accuracy, precision,
context)

Knowledge acquisition
takes time; new insights
come from
experimentation

Overcoming process
inertia
Evolutionary Method to Data Integration and Curation
Responsive

Data
Approach

• As new information flows into the
enterprise, people and processes are
dynamic in nature
• Questions that arise during this phase
are “what to do” and “how to make
the best sense of the new data
source.” Rapid integration tools
aid quick prototyping and the building
of solutions of value

Rapid Integration and Curation Method

• The data is profiled and explored for
value and quality issues.
• A rapid pruning exercise is
undertaken by prototyping and
integrating with in-house data to
evaluate whether the data is fit for purpose.
The outcome helps formulate an effective
approach for the later phases.

Information Management Approach

Time

Managed
• As we progress, issues with the new
data are identified and managed.
The main focus is on establishing
data quality and adhering to
enterprise standards and
frameworks while building optimal
integration approaches
• The integration process is
evolutionary as further discoveries
are made for optimal design

Evolutionary
• Progressive build based on the new
data.
• Building awareness of the new
platform and fine tuning the
capabilities around the data source
are primary activities

Proactive
• Data management evolves to a more
refined state. A feedback loop is built
to enable proactive decisions around
data organization and access.
• Data integration is efficient and
stable. Compliance and security
are verifiable.
• Integrated with the enterprise
information management framework

Predictable
• The services built around the new
data sources are now managed.
• The focus is on evolution of business
processes, based on managed models

Tactical

Progressive

Managed

First 1-5 Days

First 1-3 Months

After 3 months
Leverage Insights and Expertise, Rapidly and Sustainably
Identify and leverage
existing, relevant data
assets and expertise

Ingest new data
sources (light
integration and
curation)

Reuse Expertise

Analyze
Monitor and measure
use and benefits
achieved; identify next
set of priorities

Realize Benefits
Extend

Create and extend data
relationships,
leveraging insights from
previous study cycles

Govern
Elevate proven data,
relationships, and expertise
to organization-wide
definitions

Refine
Capture insights from new data
analysis cycles, refining
relationships to support new
analytics
Can You Help Me With Some Data?

Rapid Data Integration and Curation Method

1. Define Preliminary Objectives
2. Profile the New Data
3. Generate Initial Ontology for the New Data
4. Generate Initial Ontology for the Existing Data (if necessary)
5. Integrate Entities over Common URIs
6. Create URI Links
7. Add Initial Data Quality Filters
8. Analyze Data and Generate Feedback
1. Define Preliminary Objectives

1. Discuss Functional and Timing Objectives, and Priorities
2. Clarify Immediate, Short-Term, and Long-Term Business Value (SMART *)
   a. Cost Reduction/Avoidance
   b. Meet Critical Customer Need
3. Is This the Right Solution?
4. Set Expectations
   a. Evolutionary Process
   b. Initial Results Quickly
   c. Frequent, Active Participation
   d. Feedback Critical to Making Refinements
5. Brainstorm Deliverables that Produce Business Benefits; Define a Few Sample Queries
6. Ask for Commitment to Benefits Realization
7. Start the Clock!

* SMART -- Specific, Measurable, Attainable, Realistic, and Traceable
2. Profile the New Data

Light Profiling, focusing on
Understanding Key Data Elements
Needed to Meet the First
Deliverable

Identify Initial Data Filtering
Candidates

Capture Insights about Key Data
Relationships
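
Light profiling of this kind can be sketched in a few lines of Python. This is an illustration only, not the tooling used in the talk: the sample rows, the column names, and the 0.9 fill-rate threshold are all assumptions made for the example.

```python
from collections import Counter


def profile(rows, key_columns):
    """Return fill rate, distinct count, and top values for each key column."""
    total = len(rows)
    report = {}
    for col in key_columns:
        # Treat missing keys, None, and blank strings as "not filled".
        values = [str(row.get(col) or "").strip() for row in rows]
        filled = [v for v in values if v]
        report[col] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
            "top_values": Counter(filled).most_common(3),
        }
    return report


def filter_candidates(report, min_fill_rate=0.9):
    """Columns below the (assumed) fill-rate threshold are filtering candidates."""
    return sorted(
        col for col, stats in report.items() if stats["fill_rate"] < min_fill_rate
    )


# Hypothetical sample of the new data source.
sample_rows = [
    {"customer_id": "C1", "zip_code": "07666", "segment": "Retail"},
    {"customer_id": "C2", "zip_code": "", "segment": "Retail"},
    {"customer_id": "C3", "zip_code": "10001", "segment": None},
    {"customer_id": "C4", "zip_code": "10001", "segment": "Wholesale"},
]

report = profile(sample_rows, ["customer_id", "zip_code", "segment"])
print(filter_candidates(report))
```

Only the key columns needed for the first deliverable are profiled, which keeps the exercise light; a fuller profile can follow in later iterations.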

3. Generate Initial Ontology for the New Data

Reverse-engineer Ontology from
New Data

Load New Data into the RDF Store
(or Create Link to the Data)

Create Business-relevant Synonyms
for High-Importance Attributes

Refinements will be made in
Future Iterations
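
As a rough sketch of what reverse-engineering an initial ontology from tabular data could look like, the Python below emits Turtle text with one class per data set and one datatype property per column. The namespace `http://example.org/new-data#`, the sample rows, the datatype heuristic, and the use of `skos:altLabel` for business-relevant synonyms are assumptions for illustration; the talk does not name a specific generator.

```python
STANDARD_PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def _parses_as(text, cast):
    text = str(text).strip()
    if len(text) > 1 and text[0] == "0" and text[1] != ".":
        return False  # keep codes such as "07666" as strings, not numbers
    try:
        cast(text)
        return True
    except ValueError:
        return False


def infer_range(values):
    """Guess an XSD datatype from sample values (a deliberately rough heuristic)."""
    non_empty = [v for v in values if v not in (None, "")]
    if non_empty and all(_parses_as(v, int) for v in non_empty):
        return "xsd:integer"
    if non_empty and all(_parses_as(v, float) for v in non_empty):
        return "xsd:decimal"
    return "xsd:string"


def generate_ontology(class_name, rows, synonyms=None,
                      base="http://example.org/new-data#"):
    """Return Turtle text; column names are assumed to be valid local names."""
    synonyms = synonyms or {}
    lines = [f"@prefix new: <{base}> ."]
    lines += [f"@prefix {p}: <{iri}> ." for p, iri in STANDARD_PREFIXES.items()]
    lines += ["", f"new:{class_name} a owl:Class ."]
    columns = list(dict.fromkeys(col for row in rows for col in row))
    for col in columns:
        datatype = infer_range([row.get(col) for row in rows])
        lines += ["", f"new:{col} a owl:DatatypeProperty ;",
                  f"    rdfs:domain new:{class_name} ;"]
        if col in synonyms:
            lines.append(f"    rdfs:range {datatype} ;")
            lines.append(f'    skos:altLabel "{synonyms[col]}" .')
        else:
            lines.append(f"    rdfs:range {datatype} .")
    return "\n".join(lines) + "\n"


# Hypothetical rows from the new data source.
rows = [
    {"customer_id": "C1", "zip_code": "07666", "orders": "3"},
    {"customer_id": "C2", "zip_code": "10001", "orders": "12"},
]
turtle = generate_ontology("Customer", rows, synonyms={"zip_code": "Postal Code"})
print(turtle)
```

The generated ontology is intentionally shallow. It gives the first iteration something to query against, and the synonyms, datatypes, and class structure are expected to be refined later.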

4. Generate Initial Ontology for the Existing Data (if necessary)

Map Selected Entities and Critical
Attributes for Existing Data Source(s)
to the Source-specific Ontology

Existing Data

New Data

Add Reference to the Source-specific
Ontology to the New Data Ontology

Refinements will be made in
Future Iterations

The New Data Ontology manages
integration with Existing Data until the
ontology is sufficiently mature to be
promoted into an enterprise ontology
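
One way to picture this step is the Python sketch below: a small mapping covers only selected entities and critical attributes of an existing table, and a single triple records the reference from the new data ontology to the source-specific ontology. Reading "Add Reference" as `owl:imports` is an assumption, as are the example IRIs, the `CUSTOMER` table, and its column names.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"

# Hypothetical ontology IRIs.
SOURCE_ONTOLOGY = "http://example.org/existing/crm"
NEW_DATA_ONTOLOGY = "http://example.org/new-data"

# Map only the selected entity and its critical attributes.
CUSTOMER_MAPPING = {
    "table": "CUSTOMER",
    "class": SOURCE_ONTOLOGY + "#Customer",
    "key": "CUST_ID",
    "attributes": {
        "CUST_NAME": SOURCE_ONTOLOGY + "#name",
        "ZIP": SOURCE_ONTOLOGY + "#zipCode",
    },
}


def row_to_triples(row, mapping):
    """Turn one relational row into triples, ignoring unmapped columns."""
    subject = f"{SOURCE_ONTOLOGY}#customer-{row[mapping['key']]}"
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for column, prop in mapping["attributes"].items():
        value = row.get(column)
        if value is not None:
            triples.append((subject, prop, value))
    return triples


def ontology_reference(new_ontology, source_ontology):
    """Triple stating that the new data ontology references the source ontology."""
    return (new_ontology, OWL_IMPORTS, source_ontology)


existing_row = {"CUST_ID": "42", "CUST_NAME": "Acme", "ZIP": "07666", "FAX": "n/a"}
customer_triples = row_to_triples(existing_row, CUSTOMER_MAPPING)
reference = ontology_reference(NEW_DATA_ONTOLOGY, SOURCE_ONTOLOGY)
```

The `FAX` column is deliberately left unmapped, mirroring the slide's focus on selected entities and critical attributes rather than a full schema conversion.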
5. Integrate Entities over Common URIs

Different URIs, Separately
Maintained

Focus on Key Entities

Equivalence Functions Logically
Integrate the Federated Data

Reduces Query Complexity and
Can Improve Query Performance
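
The idea of logically integrating separately maintained URIs can be sketched as follows. The example assumes the equivalences are available as `owl:sameAs`-style pairs and uses a union-find structure to pick one canonical URI per entity; the talk does not specify how its equivalence functions are implemented, and the URIs and property names are hypothetical.

```python
def build_canonical_map(same_as_pairs):
    """Group equivalent URIs and map each one to a single canonical URI."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smaller URI as the canonical one.
            keep, drop = sorted((root_left, root_right))
            parent[drop] = keep
    return {uri: find(uri) for uri in list(parent)}


def unified_view(triples, canonical):
    """Rewrite subjects and objects at read time; the sources stay untouched."""
    for subject, predicate, obj in triples:
        yield (canonical.get(subject, subject), predicate, canonical.get(obj, obj))


# Hypothetical URIs for the same customer in two separately maintained sources.
CRM = "http://example.org/crm/cust/42"
ERP = "http://example.org/erp/customer/0042"

federated_triples = [
    (CRM, "name", "Acme Corp"),
    (ERP, "creditLimit", "50000"),
]
canonical = build_canonical_map([(ERP, CRM)])
unified = list(unified_view(federated_triples, canonical))
```

A query against `unified` needs only one subject to retrieve facts from both sources, which is where the reduction in query complexity comes from.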

6. Create URI Links
Before: Customer (cust:ZipCode) JOIN Geography (geo:ZipCode)
After: Customer (cust:ZipCodeURI) LINK Geography

The Data has Common Values that
can be used in Join Operations, but
Doesn’t have Links

Links Reduce Query Complexity
and Can Improve Query
Performance

Focus on Key Queries, Identify
Complex or Time-Sensitive Joins

Add Linking URI Attribute to
Dependent Entity

Amend Selected Queries to
Leverage the New Link
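
The difference between joining on a shared value and following a link can be shown with a small Python sketch. The `cust:` and `geo:` names follow the slide; the dictionaries, the `geo:State` attribute, and the sample values are assumptions, and plain dictionaries stand in for an RDF store.

```python
def state_by_join(customer_uri, customers, geography):
    """JOIN: compare zip code values across every geography entity."""
    zip_code = customers[customer_uri]["cust:ZipCode"]
    for attrs in geography.values():
        if attrs["geo:ZipCode"] == zip_code:
            return attrs["geo:State"]
    return None


def add_zip_links(customers, geography):
    """Add the linking URI attribute to the dependent (Customer) entity."""
    index = {attrs["geo:ZipCode"]: uri for uri, attrs in geography.items()}
    for attrs in customers.values():
        target = index.get(attrs["cust:ZipCode"])
        if target is not None:
            attrs["cust:ZipCodeURI"] = target


def state_by_link(customer_uri, customers, geography):
    """LINK: follow the URI directly, with no value comparison."""
    geo_uri = customers[customer_uri].get("cust:ZipCodeURI")
    return geography[geo_uri]["geo:State"] if geo_uri else None


# Hypothetical data.
customers = {
    "cust:C1": {"cust:Name": "Acme", "cust:ZipCode": "07666"},
    "cust:C2": {"cust:Name": "Globex", "cust:ZipCode": "10001"},
}
geography = {
    "geo:Z07666": {"geo:ZipCode": "07666", "geo:State": "NJ"},
    "geo:Z10001": {"geo:ZipCode": "10001", "geo:State": "NY"},
}

add_zip_links(customers, geography)
print(state_by_join("cust:C1", customers, geography))
print(state_by_link("cust:C1", customers, geography))
```

The link is computed once, and the amended query then replaces a value comparison with a direct lookup. The original `cust:ZipCode` value stays in place, so queries that have not been amended keep working.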

7. Add Initial Data Quality Filters and Transformations

Traditional Data Warehouse: Data Source A, Data Source B, Data Source C, ETL, Data Warehouse
Callouts: Data Quality Happens Here; Data Quality Happens Here; And Here

Existing Data, New Data

JIT Data Quality Management,
Everywhere that it is Needed

Data Filtering and Transformation
Rules are Encoded in the Ontology

Focus is on Critical Data
Quality Rules

Rule Updates are Automatically
in Effect, without Reloading All
of the Data
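
A minimal sketch of just-in-time data quality, assuming the rules can be represented as filter and transformation functions applied when data is read. In the approach described on this slide the rules are encoded in the ontology; the Python dictionary below is only a stand-in for that, and the records and rules are hypothetical.

```python
# Raw records are loaded once and never rewritten.
raw_store = [
    {"customer": "C1", "zip_code": "07666", "state": "nj"},
    {"customer": "C2", "zip_code": "", "state": "ny"},
    {"customer": "C3", "zip_code": "1000", "state": "NY"},
]

# Start with only the critical rules.
rules = {
    "filters": [lambda record: record["zip_code"] != ""],
    "transforms": [lambda record: {**record, "state": record["state"].upper()}],
}


def query(store, active_rules):
    """Apply the current filters and transformations at read time."""
    for record in store:
        if all(keep(record) for keep in active_rules["filters"]):
            for transform in active_rules["transforms"]:
                record = transform(record)  # builds a new dict; the store is untouched
            yield record


first_pass = list(query(raw_store, rules))

# A rule update takes effect on the next query, with no reload of the data.
rules["filters"].append(lambda record: len(record["zip_code"]) == 5)
second_pass = list(query(raw_store, rules))

print(len(first_pass), len(second_pass))
```

Because the rules run at query time, the stored data keeps its original values and a corrected or added rule applies to all of it immediately.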
8. Analyze Data and Generate Feedback

Demonstrate Visualization using
Sample Queries

Walk Through Available Data
Sets and Data Organization

Experiment with Data Access
and New Visualizations

Provide Next Steps
Recommendations to Refine the
Data Integration and Curation

Architectural Foundation for Rapid Data Integration and Curation

SPARQL-based Visualization
Relational-to-RDF Mapping
Data Profiling

Ontology Editor

Automated Ontology Generation

RDF Store
Data Import
RDF Store
Capabilities That We Have Introduced

Rapid Response to New Data
Onboarding Needs

Process for Evolutionary Data
Integration and Curation

Flexible Design that is
Responsive to Business Changes

Foundation for Refinement and
Expansion of Ontology Models from
Fit-for-Purpose to Department, to
Business Unit, to Enterprise

Questions?

Thank you!

Speaker
Thomas (Tom) Kelly
Practice Director, Enterprise Information Management, Cognizant

Thomas Kelly is a Director in Cognizant’s Enterprise
Information Management (EIM) Practice and heads its
Semantic Technology Center of Excellence, a technology
specialty of Cognizant Business Consulting (CBC). He has
20-plus years of technology consulting experience in
leading data warehousing, business intelligence and big
data projects, focused primarily on the life sciences and
healthcare industries. Tom can be reached at
Thomas.Kelly@cognizant.com.


 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Rapid data integration and curation

  • 1. Rapid Data Integration and Curation Delivering Business Value in the First 24 Hours SPEAKER: Thomas Kelly, Practice Director Semantic Technology Center of Excellence Enterprise Information Management Cognizant Technology Solutions, Inc. | ©2013, Cognizant
  • 2. Agenda: 1. Delivering Business Value; 2. Barriers to Rapid Data Integration; 3. Rapid Data Integration and Curation Method
  • 3. We are at an Inflection Point at which Value is Created or Destroyed (Source: The Motley Fool)
  • 4. Delivering Information Faster Produces Direct, Measurable Business Value. What Difference Does One Day Make? A blockbuster drug generates $3M+ in revenue per day; a one-day delay in completing clinical trials can generate up to $500K in additional costs. Banking: A moderate-sized brokerage firm can generate up to $1M in financial services revenue per day.
  • 5. Barriers to Rapid Data Integration: Rework is expensive – must “get it right” from the start. Fit with the existing data; avoid data silos. Reconciling differences (data formats, coding, identifiers, etc.). Managing data quality (accuracy, precision, context). Knowledge acquisition takes time; new insights come from experimentation. Overcoming process inertia.
  • 6. Evolutionary Method to Data Integration and Curation. Responsive Data Approach • As new information flows into the enterprise, people and processes are dynamic in nature. • Questions arising during this phase are “what to do” and “how to make the best sense of the new data source”. Rapid integration tools aid in quick prototyping and building solutions of value. Rapid Integration and Curation Method • The data is profiled and explored for value and quality issues. • A rapid pruning exercise is undertaken by prototyping and integrating with in-house data to evaluate whether the data is fit for purpose. It helps formulate an effective approach for further phases. Information Management Approach Time Managed • As we progress, issues with the new data are identified and managed. The main focus is on establishing data quality and adhering to enterprise standards and frameworks while building optimal integration approaches. • The integration process is evolutionary as further discoveries are made for optimal design. Evolutionary • Progressive build based on the new data. • Building awareness of the new platform and fine-tuning the capabilities around the data source are primary activities. Proactive • Data management evolves to a more refined state. A feedback loop is built to enable proactive decisions around data organization and access. • Data integration is efficient and stable. Verifiable compliance and security. • Integrated with the enterprise information management framework. Predictable • The services built around the new data sources are now managed. • The focus is on evolution of business processes, based on managed models. Tactical: First 1-5 Days. Progressive: First 1-3 Months. Managed: After 3 Months.
  • 7. Leverage Insights and Expertise, Rapidly and Sustainably. Reuse Expertise: Identify and leverage existing, relevant data assets and expertise. Analyze: Ingest new data sources (light integration and curation). Realize Benefits: Monitor and measure use and benefits achieved; identify next set of priorities. Extend: Create and extend data relationships, leveraging insights from previous study cycles. Govern: Elevate proven data, relationships, and expertise to organization-wide definition. Refine: Capture insights from new data analysis cycles, refining relationships to support new analytics.
  • 8. Can You Help Me With Some Data?
  • 9. Rapid Data Integration and Curation Method: 1. Define Preliminary Objectives; 2. Profile the New Data; 3. Generate Initial Ontology for the New Data; 4. Generate Initial Ontology for the Existing Data (if necessary); 5. Integrate Entities over Common URIs; 6. Create URI Links; 7. Add Initial Data Quality Filters; 8. Analyze Data and Generate Feedback
  • 10. 1. Define Preliminary Objectives: 1. Discuss Functional and Timing Objectives, and Priorities; 2. Clarify Immediate, Short-Term, and Long-Term Business Value (SMART*): a. Cost Reduction/Avoidance, b. Meet Critical Customer Need; 3. Is This the Right Solution?; 4. Set Expectations: a. Evolutionary Process, b. Initial Results Quickly, c. Frequent, Active Participation, d. Feedback Critical to Making Refinements; 5. Brainstorm Deliverables that Produce Business Benefits; Define a Few Sample Queries; 6. Ask for Commitment to Benefits Realization; 7. Start the Clock! (* SMART: Specific, Measurable, Attainable, Realistic, and Traceable)
  • 11. 2. Profile the New Data: Light Profiling, focusing on Understanding Key Data Elements Needed to Meet the First Deliverable. Identify Initial Data Filtering Candidates. Capture Insights about Key Data Relationships.
  • 12. 3. Generate Initial Ontology for the New Data: Reverse-engineer Ontology from New Data. Load New Data into the RDF Store (or Create Link to the Data). Create Business-relevant Synonyms for High-Importance Attributes. Refinements will be made in Future Iterations.
  • 13. 4. Generate Initial Ontology for the Existing Data (if necessary): Map Selected Entities and Critical Attributes for Existing Data Source(s) to the Source-specific Ontology. Add Reference to the Source-specific Ontology to the New Data Ontology. Refinements will be made in Future Iterations. New Data Ontology manages integration with Existing Data until the ontology is sufficiently mature to be promoted into an enterprise ontology. [Diagram labels: Existing Data, New Data]
  • 14. 5. Integrate Entities over Common URIs: Different URIs, Separately Maintained. Focus on Key Entities. Equivalence Functions Logically Integrate the Federated Data. Reduces Query Complexity and Can Improve Query Performance.
  • 15. 6. Create URI Links: The Data has Common Values that can be used in Join Operations, but Doesn’t have Links. Links Reduce Query Complexity and Can Improve Query Performance. Focus on Key Queries, Identify Complex or Time-Sensitive Joins. Add Linking URI Attribute to Dependent Entity. Amend Selected Queries to Leverage the New Link. [Diagram labels: Customer and Geography, first related by cust:ZipCode JOIN geo:ZipCode, then by cust:ZipCodeURI LINK]
  • 16. 7. Add Initial Data Quality Filters and Transformations: JIT Data Quality Management, Everywhere that it is Needed. Data Filtering and Transformation Rules are Encoded in the Ontology. Focus is on Critical Data Quality Rules. Rule Updates are Automatically in Effect, without Reloading All of the Data. [Diagram labels: Traditional Data Warehouse; Data Source A, Data Source B, Data Source C; ETL; Data Warehouse; Existing Data; New Data; “Data Quality Happens Here”, “Data Quality Happens Here”, “And Here”]
  • 17. 8. Analyze Data and Generate Feedback: Demonstrate Visualization using Sample Queries. Walk Through Available Data Sets and Data Organization. Experiment with Data Access and New Visualizations. Provide Next Steps Recommendations to Refine the Data Integration and Curation.
  • 18. Architectural Foundation for Rapid Data Integration and Curation: SPARQL-based Visualization; Relational-to-RDF Mapping; Data Profiling; Ontology Editor; Automated Ontology Generation; RDF Store Data Import; RDF Store
  • 19. Capabilities That We Have Introduced: Rapid Response to New Data Onboarding Needs. Process for Evolutionary Data Integration and Curation. Flexible Design that is Responsive to Business Changes. Foundation for Refinement and Expansion of Ontology Models from Fit-for-Purpose to Department, to Business Unit, to Enterprise.
  • 21. Thank you!
  • 22. Speaker: Thomas (Tom) Kelly, Practice Director, Enterprise Information Management, Cognizant. Thomas Kelly is a Director in Cognizant’s Enterprise Information Management (EIM) Practice and heads its Semantic Technology Center of Excellence, a technology specialty of Cognizant Business Consulting (CBC). He has 20-plus years of technology consulting experience in leading data warehousing, business intelligence and big data projects, focused primarily on the life sciences and healthcare industries. Tom can be reached at Thomas.Kelly@cognizant.com.
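
Step 6 of the method (slide 15, Create URI Links) contrasts joining Customer and Geography on a shared zip code value with following a linking URI attribute. The Python sketch below is one way to picture that difference; it is not the tooling used in the presentation. It uses a plain in-memory triple list rather than an RDF store, and all entity URIs, zip codes, the `geo:State` property, and the helper function names are hypothetical. Only `cust:ZipCode`, `geo:ZipCode`, and `cust:ZipCodeURI` come from the slide.

```python
# Hypothetical triples (subject, predicate, object); values are made up.
TRIPLES = [
    ("cust:1", "cust:ZipCode", "07666"),
    ("cust:2", "cust:ZipCode", "10001"),
    ("geo:Z07666", "geo:ZipCode", "07666"),
    ("geo:Z07666", "geo:State", "NJ"),
    ("geo:Z10001", "geo:ZipCode", "10001"),
    ("geo:Z10001", "geo:State", "NY"),
]


def build_index(triples):
    """Index triples as {subject: {predicate: object}}, one value per predicate."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, {})[predicate] = obj
    return index


def state_by_join(index, customer):
    """Value join: scan all entities for a geography with the same zip literal."""
    zip_code = index[customer]["cust:ZipCode"]
    for props in index.values():
        if props.get("geo:ZipCode") == zip_code:
            return props["geo:State"]
    return None


def add_zip_links(index):
    """Add a cust:ZipCodeURI attribute to each customer (the dependent entity)."""
    zip_to_geo = {
        props["geo:ZipCode"]: subject
        for subject, props in index.items()
        if "geo:ZipCode" in props
    }
    for props in index.values():
        target = zip_to_geo.get(props.get("cust:ZipCode"))
        if target is not None:
            props["cust:ZipCodeURI"] = target


def state_by_link(index, customer):
    """Link traversal: follow cust:ZipCodeURI straight to the geography entity."""
    geo_uri = index[customer].get("cust:ZipCodeURI")
    return index[geo_uri]["geo:State"] if geo_uri is not None else None


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(state_by_join(idx, "cust:1"))   # join on the shared zip value
    add_zip_links(idx)                    # one-time link creation
    print(state_by_link(idx, "cust:1"))   # direct lookup through the link
```

The join version has to search for a matching value on every query, while the linked version pays that cost once in `add_zip_links` and then resolves each query with two dictionary lookups, which is the sense in which links can reduce query complexity.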