Driving Behavioral Change for
Information Management through
Data-Driven Green Strategy
A Case Study
Urmi Majumder and Fernando Aguilar Islas
EDW 2024
Topics Covered
⬢ What is a Green Information Management (IM) Strategy, and why should you have one?
⬢ How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication?
⬢ How can an organization use insights into its data to influence employee behavior for IM?
⬢ How can you reap additional benefits from content reduction that go beyond Green IM?
Urmi Majumder
Principal Data Architecture Consultant
⬢ 15+ years of experience in enterprise system architecture, design, implementation, and operations
⬢ Leads the development of technical solutions in support of a wide variety of knowledge and data management solutions
⬢ Principal architect in knowledge graphs, enterprise AI, and scalable data management systems
⬢ Ph.D. in Computer Science, Duke University
Fernando Aguilar Islas
Data Science Consultant
⬢ 9+ years of experience serving as a data scientist for graph-powered machine learning and AI-based solutions
⬢ Implemented several knowledge graph-based enterprise data catalogs
⬢ Experience leading and supporting the integration and implementation of 20+ data projects
⬢ MS, Applied Statistics, Penn State University
ENTERPRISE KNOWLEDGE
Green Information Management (IM) Strategy
Green Information Management:
Putting the “Green” in Information Management (IM)
What is it? Why should enterprises have a green IM strategy?
Green Information Management (gIM) is a strategic approach to optimizing information-related processes within an organization and minimizing their environmental impact.
● Sustainability: Reduce resource consumption and waste associated with IM practices.
● Cost Efficiency: Reduce energy consumption through streamlined processes and optimized infrastructure.
● Compliance: Address regulatory requirements to demonstrate adherence to green standards.
● Corporate Responsibility: Commit to environmental stewardship.
Case Study: The Challenge
A supply-chain giant is committed to the organizational goal of becoming a Net-Zero Emissions Business. The organization realized that it has a huge digital carbon footprint due to the proliferation of duplicate content (over 226K documents occupying ~1 PB of space) across content management systems and collaboration software such as SharePoint and Microsoft Teams, which unintentionally build silos because of a lack of visibility and awareness.
The Solution: AI-Powered Digital Carbon Footprint Calculator
Original State
● A rules-based non-record deletion application periodically deleted forgotten, non-sensitive documents.
● The algorithm could only delete documents that had not been modified for at least 3 years and were marked as non-records. But a lack of sharing culture meant most documents were unnecessarily marked as sensitive.
The Need
● Aggregate tens of primary sources that have slightly different metadata and access levels yet contain duplicate or near-duplicate content, in order to build a content similarity and resultant carbon footprint dashboard.
● Augment the rules-based approach, which relies solely on metadata, with AI that relies on content similarity to identify duplicate and near-duplicate content.
Solution
● Implemented the data pipelines (and matching algorithms) to connect data siloed in different systems.
● Automated duplicate content identification, giving the organization the ability to drill down into duplicate content findings across data sources and improve QA.
● Built a BI dashboard to provide a clear view into content duplication and its connection to CO2 emissions.
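The original metadata-only deletion rule can be written as a single predicate, which also makes it easy to see why over-labeling content as sensitive blocks deletion. This is a minimal sketch, not the client's application: the `Document` fields are hypothetical, and "at least 3 years" is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed retention window: "at least 3 years", approximated as 3 * 365 days.
MIN_AGE = timedelta(days=3 * 365)

@dataclass
class Document:
    # Hypothetical metadata fields; real source systems use different names.
    doc_id: str
    last_modified: datetime
    is_record: bool
    is_sensitive: bool

def eligible_for_deletion(doc: Document, now: datetime) -> bool:
    """Metadata-only rule: stale, non-record, non-sensitive documents qualify."""
    is_stale = now - doc.last_modified >= MIN_AGE
    return is_stale and not doc.is_record and not doc.is_sensitive

if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    docs = [
        Document("a", datetime(2020, 1, 1), is_record=False, is_sensitive=False),
        Document("b", datetime(2020, 1, 1), is_record=False, is_sensitive=True),
        Document("c", datetime(2023, 6, 1), is_record=False, is_sensitive=False),
    ]
    print([d.doc_id for d in docs if eligible_for_deletion(d, now)])  # prints ['a']
```

Document `b` is just as stale as `a`, but its (possibly unnecessary) sensitivity label alone keeps it out of scope, which is the gap that content-based similarity is meant to address.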
The AI Connection: How to Use AI for Green IM
Overall Solution: Phased Approach
Phase I: Proof of Concept
● Refine use case, prioritize requirements, and define KPIs
● Conduct Exploratory Data Analysis
● Develop Matching Algorithms
● Implement Content Deduplication Data Pipeline
● Implement CO2 Emissions BI Dashboard
● Track KPIs
Phase II: Productionization
● Scale Data Pipeline
● Enhance BI Dashboard to make it actionable
● Integrate Pipeline with Content Management System and Collaboration Software
● Develop broader Green IM strategy
End-to-End Process Overview
Metadata Ingestion
● Data Source Integration
● Metadata Extraction
● Content Extraction
● Content Vectorization using AI
Content Deduplication
● Rule-Based Metadata Similarity Analysis
● Stochastic Content Similarity Analysis
● Combining Metadata and Content Findings to identify duplicates and near-duplicates
CO2 Emissions Viewer
● Duplicate Storage Impact Analysis
● Resultant CO2 Emissions Calculation
● BI Dashboard for summary statistics and drill down by key metadata fields
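The "Combining Metadata and Content Findings" step can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the project's implementation: the metadata fields (`file_name`, `size_bytes`), the filename-normalization rule, the 5% size tolerance, and the 0.98/0.90 similarity thresholds are all hypothetical, and the content-similarity score is assumed to come from the upstream vector comparison.

```python
import re
from dataclasses import dataclass

@dataclass
class Meta:
    # Hypothetical metadata fields; real source systems expose different names.
    file_name: str
    size_bytes: int

# Assumed copy/version markers at the end of a name: " (1)", " - Copy", "_v2", "_final".
_SUFFIX = re.compile(r"(\s*\(\d+\)|[\s_-]*(copy|v\d+|final))+$")

def normalize_name(name: str) -> str:
    """Lowercase the name, drop the extension, and strip copy/version suffixes."""
    stem = name.lower().rsplit(".", 1)[0]
    return _SUFFIX.sub("", stem).strip()

def metadata_match(a: Meta, b: Meta, size_tolerance: float = 0.05) -> bool:
    """Rule-based check: same normalized name and sizes within a relative tolerance."""
    if normalize_name(a.file_name) != normalize_name(b.file_name):
        return False
    larger = max(a.size_bytes, b.size_bytes, 1)  # guard against division by zero
    return abs(a.size_bytes - b.size_bytes) / larger <= size_tolerance

def classify_pair(a: Meta, b: Meta, content_similarity: float,
                  dup_threshold: float = 0.98, near_threshold: float = 0.90) -> str:
    """Combine the metadata rule with a content-similarity score in [0, 1]."""
    if content_similarity >= dup_threshold:
        return "duplicate"
    if content_similarity >= near_threshold:
        return "near-duplicate"
    if metadata_match(a, b):
        # Metadata suggests a copy but the content disagrees: route to human QA.
        return "review"
    return "distinct"
```

For example, `Q3 Report.docx` and `Q3 Report (1).docx` normalize to the same stem, so a low content score yields `"review"` rather than `"distinct"`. Letting content similarity decide first, with metadata only flagging conflicts, is one design choice among several.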
Content Deduplication Pipeline
Content Deduplication Process
Data Ingestion
● Source system identification
● Establishment of data crawlers that meet system-specific access requirements
Metadata & Content Extraction
● Metadata extraction from either the source system or a supporting metadata store
● Content extraction based on file type
Metadata Enrichment
● Use of reference data/taxonomy management system
● Content vectorization via use of Generative AI
Matching Algorithm Execution
● Rules-based duplicate inferencing on content metadata
● Stochastic duplicate inferencing on content vectors
Duplicate Analysis Output
● Combination of rule-based and stochastic analysis to identify duplicates
● Resultant Storage Impact
● Resultant CO2 Emissions
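The vector-based step of "Matching Algorithm Execution" compares content vectors. The deck does not specify the exact algorithm, so the sketch below swaps in a common, simple technique: pairwise cosine similarity with a hypothetical 0.95 threshold, then union-find to merge matching pairs into duplicate groups. The toy vectors stand in for embeddings from a transformer model. Pairwise comparison is O(n²), so at hundreds of thousands of documents an approximate nearest-neighbor index would be a more plausible choice.

```python
import math
from itertools import combinations

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def group_near_duplicates(vectors: dict[str, list[float]],
                          threshold: float = 0.95) -> list[set[str]]:
    """Group document IDs whose vectors match, transitively, above the threshold."""
    parent = {doc_id: doc_id for doc_id in vectors}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for doc_id in vectors:
        groups.setdefault(find(doc_id), set()).add(doc_id)
    # Singletons have no duplicates, so only multi-member groups are returned.
    return [g for g in groups.values() if len(g) > 1]

if __name__ == "__main__":
    toy = {"a": [1.0, 0.0, 0.0], "b": [0.99, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
    print(group_near_duplicates(toy))
```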
Content Deduplication Pipeline
Conceptual Architecture
Content Deduplication Pipeline
Green Application Development Considerations
Minimize Data Movement
⬢ Reduce energy consumption from duplicate content storage and data transfer
⬢ No copies: extract content from the original file for in-memory processing
Use Transformer Models for Vectorization
⬢ Fewer than 100M parameters (e.g., DistilBERT)
⬢ Use less memory and storage space due to smaller model size
Run Analysis Pipelines in Cloud Infrastructure
⬢ Take advantage of resource efficiency at scale
⬢ Use under-utilized regions (e.g., Azure Norway East) or regions powered by 100% renewable energy (e.g., AWS US East 2)
Training OpenAI’s GPT-3.5 required about 1K GPUs running in parallel for weeks at a time.
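The "no copies" principle can be illustrated with incremental processing: the source file is read in fixed-size chunks and each chunk is consumed immediately, so no intermediate copy is written to disk. This is a minimal sketch, assuming a SHA-256 fingerprint (which detects only exact duplicates) as the consumer and an assumed 1 MiB chunk size; in the described pipeline the consumer would instead be text extraction and vectorization.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an assumed value to tune per source system

def fingerprint_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary file-like object incrementally, one chunk in memory at a time.

    Nothing is written to disk, so no intermediate copy of the document is created.
    """
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # io.BytesIO stands in for a stream from the source system, e.g. open(path, "rb").
    print(fingerprint_stream(io.BytesIO(b"quarterly report text"), chunk_size=4))
```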
Benefits of the AI-Powered Digital Carbon Footprint Calculator
● Reconcile a high volume of distinct content items into a significantly lower number of unique content items across siloed systems.
● Give users a clear view into content duplication and its connection to CO2 emissions through a meaningful dashboard.
● Establish a plug-and-play architecture for extracting content from many file types and vectorizing it using multiple Generative AI models, to best align the content similarity pipeline with organizational needs.
The Power of Data: Drive Social Change Through Data
Size of the Opportunity
An estimated* 50% of corporate data is duplicated across the organization.
Real World Example
⬢ An email server contains 100 instances of the same 1 MB file attachment sent to 100 people.
⬢ Without content deduplication, if all 100 people back up their mailboxes, the backups would consume 100 MB of storage.
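Content-hash deduplication exploits exactly this arithmetic: identical bytes produce identical digests, so only one physical copy needs to be stored. A minimal sketch, assuming SHA-256 digests as content keys and decimal megabytes (1 MB = 1,000,000 bytes):

```python
import hashlib

def storage_footprint(blobs: list[bytes]) -> tuple[int, int]:
    """Return (bytes stored without dedup, bytes stored with content-hash dedup)."""
    naive = sum(len(blob) for blob in blobs)
    unique: dict[str, int] = {}
    for blob in blobs:
        # Only the first blob with a given digest is counted as stored.
        unique.setdefault(hashlib.sha256(blob).hexdigest(), len(blob))
    return naive, sum(unique.values())

if __name__ == "__main__":
    attachment = b"\x00" * 1_000_000  # a stand-in 1 MB attachment
    mailboxes = [attachment] * 100     # the same file backed up from 100 mailboxes
    naive, deduped = storage_footprint(mailboxes)
    print(naive // 1_000_000, "MB vs", deduped // 1_000_000, "MB")  # prints: 100 MB vs 1 MB
```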
In the supply-chain organization, ~226K documents occupied ~1 PB of storage, resulting in 228 tonnes of CO2 emissions.
A 15% content reduction through duplicate identification corresponds to 34 tonnes of CO2, equivalent to 20 flights from JFK to LHR.
* https://www.xillio.com/blog/recognize-duplicate-folder-structures-with-xillio-insights
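The case-study figures can be checked with proportional arithmetic. The sketch assumes, as a simplification, that emissions scale linearly with stored volume, and it derives its factor from the slide itself (228 tonnes for ~1 PB); the deck does not state the time period or methodology behind that figure.

```python
# Emissions factor implied by the case study: ~1 PB of storage -> 228 t CO2.
# Linear scaling with stored volume is a simplifying assumption.
TONNES_CO2_PER_PB = 228.0

def avoided_emissions(total_pb: float, reduction_fraction: float,
                      tonnes_per_pb: float = TONNES_CO2_PER_PB) -> float:
    """Tonnes of CO2 attributed to the share of storage removed by deduplication."""
    if not 0.0 <= reduction_fraction <= 1.0:
        raise ValueError("reduction_fraction must be between 0 and 1")
    return total_pb * reduction_fraction * tonnes_per_pb

if __name__ == "__main__":
    # 15% content reduction on ~1 PB, as in the case study
    print(round(avoided_emissions(1.0, 0.15), 1))  # prints 34.2
```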
Why is content duplication so prevalent in the enterprise?
NON-DELIBERATE action on the part of the user
● Users forget a document exists and recreate it
● Users cannot find what they are looking for and create it from scratch
● Users save email attachments, sometimes the same file multiple times
● Users download files from the intranet, sometimes the same file multiple times
DELIBERATE action on the part of the user
● Maintain a backup copy
● Copy a file for easier transfer/distribution
● Use separate files for different document versions
Defensible Deletion
● Redundant, obsolete, and trivial data that users hold on to "just in case"
● A non-record deletion policy can save storage space by deleting documents that are not marked as records and have not been modified for a predefined period
Barriers to Automated Content Removal
● Content incorrectly marked as records due to a lack of proper compliance training
● Content marked with higher sensitivity labels because of a knowledge-hoarding culture
● Content duplicated to apply different access permissions due to limited cross-departmental collaboration
Automated Content Removal
DATA-DRIVEN USER BEHAVIOR CHANGE: Goals
“Educate and empower to influence positive behavior change.”
Educate
● Facilitate self-directed and social learning opportunities for green information management
Empower
● Facilitate evidence-based decisions by offering easy access to a personal CO2 emissions viewer
● Propel users into action by equipping them with the right interactive tool to act on the findings in the flow of work
● Provide the data needed to identify personal emissions trends and a way to track progress over time
Pilot CO2 Emissions Viewer: Demo Time!
DATA-DRIVEN USER BEHAVIOR CHANGE: Recommended Actions
“Educate and empower to influence positive behavior change.”
Educate
● Educate users to use links instead of attachments for file sharing
● Educate users on componentized content management
Empower
● Provide accurate data
○ Establish KPIs measuring the accuracy of the duplicate detection pipeline
● Frame the data in the context of the bigger picture
○ Enable visualization of the immediate CO2 emission reduction resulting from deduplication
○ Enable visualization of the impact of content reduction over time
○ Enable visualization of a personal digital footprint counter for unique content over time
● Create a “don’t make me think” experience with push-of-a-button actions available in the end-user application
○ Enable system triggers to remove content through the application interface
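One way to make "KPIs measuring the accuracy of the duplicate detection pipeline" concrete is precision, recall, and F1 over a hand-labeled sample of document pairs. A minimal sketch, assuming pairs are unordered document-ID tuples and that reviewers have labeled which pairs are true duplicates; the deck does not say which KPIs were actually used.

```python
def _normalize(pairs: set[tuple[str, str]]) -> set[tuple[str, ...]]:
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {tuple(sorted(p)) for p in pairs}

def detection_kpis(predicted: set[tuple[str, str]],
                   labeled: set[tuple[str, str]]) -> dict[str, float]:
    """Precision, recall, and F1 of predicted duplicate pairs against reviewer labels."""
    pred, truth = _normalize(predicted), _normalize(labeled)
    tp = len(pred & truth)  # flagged pairs that reviewers confirmed as duplicates
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

When results drive push-of-a-button removal, precision is arguably the KPI to guard most closely, since a false positive means deleting content someone still needs.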
The Bigger Impact: Beyond Green IM
Content Deduplication Use Cases
01 Mergers and Acquisitions
● Content deduplication streamlines the integration of data from merged entities, ensuring a more efficient and sustainable data consolidation process
● Lower infrastructure costs and minimize environmental impact through efficient content management practices
02 Cloud Data Migration
● Minimize the volume of data to be transferred, optimizing network bandwidth and reducing associated costs and energy consumption
● Lower operational costs and environmental impact
03 Content Auditing and Analysis
● Identify redundant or obsolete data
● Surface similar content with different associated metadata
04 Legal and Regulatory Compliance
● Reduce exposure to copyright, trademark, or other intellectual property rights violations
● Decrease the risk of privacy breaches, since duplicate copies may contain sensitive information that unauthorized parties can access
05 Generative AI (LLMs)
● Increase efficiency in retrieval-augmented generation (RAG) applications by removing noise and bias
● Decrease costs associated with vectorizing content
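For the Generative AI use case, the cost saving comes from vectorizing each distinct passage only once. A minimal sketch, assuming a hypothetical `embed` callable supplied by the caller and a deliberately simple normalization (case folding and whitespace collapsing) that catches only trivially different copies; near-duplicates would still need similarity matching.

```python
import hashlib
from typing import Callable

def _key(text: str) -> str:
    """Digest of case-folded, whitespace-collapsed text."""
    normalized = " ".join(text.casefold().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def embed_unique(chunks: list[str],
                 embed: Callable[[str], list[float]]) -> dict[str, list[float]]:
    """Call the (hypothetical) embed function once per distinct chunk.

    Returns vectors keyed by content digest; len(result) equals the number of
    embed calls made, so repeated chunks cost nothing extra.
    """
    vectors: dict[str, list[float]] = {}
    for chunk in chunks:
        key = _key(chunk)
        if key not in vectors:
            vectors[key] = embed(chunk)  # the costly step, skipped for repeats
    return vectors

if __name__ == "__main__":
    def toy_embed(text: str) -> list[float]:
        # Hypothetical stand-in for a real embedding model.
        return [float(len(text))]

    chunks = ["Quarterly  Report", "quarterly report", "Travel policy"]
    print(len(embed_unique(chunks, toy_embed)))  # prints 2
```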
Questions?
Thank you for listening.
We are happy to take any
questions at this time.
Urmi Majumder
umajumder@enterprise-knowledge.com
www.linkedin.com/in/urmim/
Fernando Aguilar Islas
fislas@enterprise-knowledge.com
www.linkedin.com/in/feraguilaris/

Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 

Recently uploaded (20)

Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Revolutionizing SAP® Processes with Automation and Artificial Intelligence
Revolutionizing SAP® Processes with Automation and Artificial IntelligenceRevolutionizing SAP® Processes with Automation and Artificial Intelligence
Revolutionizing SAP® Processes with Automation and Artificial Intelligence
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 

Driving Behavioral Change for Information Management through Data-Driven Green Strategy (EDW 2024)

  • 1. Driving Behavioral Change for Information Management through Data-Driven Green Strategy: A Case Study
    Urmi Majumder and Fernando Aguilar Islas, EDW 2024
  • 2. Topics Covered
    ⬢ What is a Green Information Management (IM) Strategy, and why should you have one?
    ⬢ How can Artificial Intelligence (AI) and Machine Learning (ML) support your Green IM Strategy through content deduplication?
    ⬢ How can an organization use insights into its data to influence employee IM behavior?
    ⬢ How can you reap additional benefits from content reduction that go beyond Green IM?
  • 3. Urmi Majumder, Principal Data Architecture Consultant
    ⬢ 15+ years of experience in enterprise system architecture, design, implementation, and operations
    ⬢ Leads the development of technical solutions supporting a wide variety of knowledge and data management initiatives
    ⬢ Principal architect in knowledge graphs, enterprise AI, and scalable data management systems
    ⬢ Ph.D. in Computer Science, Duke University
    Fernando Aguilar Islas, Data Science Consultant
    ⬢ 9+ years of experience as a data scientist for graph-powered machine learning and AI-based solutions
    ⬢ Implemented several knowledge graph-based enterprise data catalogs
    ⬢ Experience leading and supporting the integration and implementation of 20+ data projects
    ⬢ MS, Applied Statistics, Penn State University
    ENTERPRISE KNOWLEDGE
  • 5. Green Information Management: Putting the “Green” in Information Management (IM)
    What is it?
    Green Information Management (gIM) is a strategic approach focused on optimizing an organization's information-related processes and minimizing their environmental impact.
    Why should enterprises have a green IM strategy?
    Sustainability: Reduce resource consumption and waste associated with IM practices.
    Cost Efficiency: Reduce energy consumption through streamlined processes and optimized infrastructure.
    Compliance: Address regulatory requirements and demonstrate adherence to green standards.
    Corporate Responsibility: Commit to environmental stewardship.
  • 6. Case Study: The Challenge
    A supply-chain giant is committed to the organizational goal of becoming a Net-Zero Emissions Business. The organization realized that it had a large digital carbon footprint caused by the proliferation of duplicate content (over 226K documents occupying ~1 PB of space). The duplication grew out of content management systems and collaboration software such as SharePoint and Microsoft Teams, which unintentionally create silos through a lack of visibility and awareness.
  • 7. The Solution: AI-Powered Digital Carbon Footprint Calculator
    Original State
    ● A rules-based non-record deletion application periodically deleted forgotten, non-sensitive documents.
    ● The algorithm could only delete documents that had not been modified for at least 3 years and were marked as non-records. However, the lack of a sharing culture meant most documents were unnecessarily marked as sensitive.
    The Need
    ● Aggregate content from tens of primary sources that carry slightly different metadata and access levels yet hold duplicate or near-duplicate content, in order to build a dashboard of content similarity and the resulting carbon footprint.
    ● Augment the rules-based approach, which relies solely on metadata, with AI that relies on content similarity to identify duplicate and near-duplicate content.
    Solution
    ● Implemented the data pipelines and matching algorithms to connect data siloed in different systems.
    ● Automated duplicate content identification, giving the organization the ability to drill down into duplicate-content findings across data sources and improve QA.
    ● Built a BI dashboard that provides a clear view of content duplication and its connection to CO2 emissions.
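    The original rules-based deletion logic can be sketched in a few lines. This is a minimal illustration, not the organization's actual application: it assumes a hypothetical `Document` record with `last_modified`, `is_record`, and `is_sensitive` fields and approximates "at least 3 years" as 1,095 days. Document `b` is blocked only by its sensitivity label, which is exactly the limitation the slide describes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    """Hypothetical metadata record; field names are illustrative assumptions."""
    doc_id: str
    last_modified: datetime
    is_record: bool
    is_sensitive: bool


# Assumption: "at least 3 years" approximated as 3 * 365 days.
MIN_AGE = timedelta(days=3 * 365)


def eligible_for_deletion(doc: Document, now: datetime) -> bool:
    """Metadata-only rule: stale, not a record, and not labeled sensitive."""
    is_stale = now - doc.last_modified >= MIN_AGE
    return is_stale and not doc.is_record and not doc.is_sensitive


def select_for_deletion(docs: list[Document], now: datetime) -> list[str]:
    """Return the IDs of documents the rule would delete."""
    return [d.doc_id for d in docs if eligible_for_deletion(d, now)]


if __name__ == "__main__":
    now = datetime(2024, 5, 1)
    docs = [
        Document("a", datetime(2020, 1, 1), is_record=False, is_sensitive=False),
        Document("b", datetime(2020, 1, 1), is_record=False, is_sensitive=True),   # over-labeled
        Document("c", datetime(2023, 6, 1), is_record=False, is_sensitive=False),  # too recent
        Document("d", datetime(2019, 1, 1), is_record=True, is_sensitive=False),   # a record
    ]
    print(select_for_deletion(docs, now))  # ['a']
```

    Because the rule never looks at content, two identical files with different labels are treated differently, which is why the solution adds content-similarity matching on top.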
  • 8. The AI Connection: How to Use AI for Green IM
  • 9. Overall Solution: Phased Approach
    Phase I: Proof of Concept
    ● Refine use case, prioritize requirements, and define KPIs
    ● Conduct Exploratory Data Analysis
    ● Develop Matching Algorithms
    ● Implement Content Deduplication Data Pipeline
    ● Implement CO2 Emissions BI Dashboard
    ● Track KPIs
    Phase II: Productionization
    ● Scale Data Pipeline
    ● Enhance BI Dashboard to make it actionable
    ● Integrate Pipeline with Content Management System and Collaboration Software
    ● Develop broader Green IM strategy
  • 10. The AI Connection: End-to-End Process Overview
    Metadata Ingestion
    ● Data Source Integration
    ● Metadata Extraction
    ● Content Extraction
    ● Content Vectorization using AI
    Content Deduplication
    ● Rule-Based Metadata Similarity Analysis
    ● Stochastic Content Similarity Analysis
    ● Combining Metadata and Content Findings to identify duplicates and near-duplicates
    CO2 Emissions Viewer
    ● Duplicate Storage Impact Analysis
    ● Resultant CO2 Emissions Calculation
    ● BI Dashboard for summary statistics and drill-down by key metadata fields
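    Once duplicate groups are known, the two viewer steps (duplicate storage impact and the resulting CO2 calculation) reduce to simple arithmetic. The sketch below is illustrative only. It assumes the matching step emits hypothetical `(group_id, size)` pairs, keeps the largest copy in each group, and converts storage to emissions linearly. The 228 kg/TB factor is simply the ratio implied by the case study's 228 tonnes for ~1 PB; the slides state neither the emission factor actually used nor its time basis.

```python
from collections import defaultdict


def redundant_bytes(files: list[tuple[str, int]]) -> int:
    """files: (duplicate_group_id, size_in_bytes) pairs from the matching step.

    Assumption: one copy per group is kept (the largest), and every other
    copy in the group counts as redundant storage.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for group_id, size in files:
        groups[group_id].append(size)
    return sum(sum(sizes) - max(sizes) for sizes in groups.values())


def co2_kg(num_bytes: int, kg_co2_per_tb: float) -> float:
    """Linear storage-to-emissions conversion (decimal TB = 10**12 bytes)."""
    return num_bytes / 10**12 * kg_co2_per_tb


if __name__ == "__main__":
    GB = 10**9
    files = [
        ("g1", 2 * GB), ("g1", 2 * GB), ("g1", 2 * GB),  # three identical copies
        ("g2", 5 * GB),                                   # unique file, nothing redundant
        ("g3", 1 * GB), ("g3", 3 * GB),                   # near-duplicates of different sizes
    ]
    waste = redundant_bytes(files)
    print(waste)  # 5000000000
    # Hypothetical factor: 228 kg/TB, derived from the case study's totals for illustration.
    print(round(co2_kg(waste, kg_co2_per_tb=228), 2))  # 1.14
```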
  • 11. The AI Connection: Content Deduplication Pipeline (Content Deduplication Process)
    Data Ingestion
    ● Source system identification
    ● Establishment of data crawlers that meet system-specific access requirements
    Metadata & Content Extraction
    ● Metadata extraction from either the source system or a supporting metadata store
    ● Content extraction based on file type
    Metadata Enrichment
    ● Use of a reference data/taxonomy management system
    ● Content vectorization via Generative AI
    Matching Algorithm Execution
    ● Rules-based duplicate inferencing on content metadata
    ● Stochastic duplicate inferencing on content vectors
    Duplicate Analysis Output
    ● Combination of rules-based and stochastic analysis to identify duplicates
    ● Resultant storage impact
    ● Resultant CO2 emissions
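    One way to combine the two matching steps is to treat content similarity as the primary signal and a metadata match as supporting evidence. The sketch below is a minimal illustration under stated assumptions. The toy 3-dimensional vectors stand in for embeddings from a transformer model. The metadata rule (same normalized title and extension), the thresholds, and the `metadata_relief` parameter are all hypothetical and are not the authors' algorithm.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two content vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def metadata_match(a: dict, b: dict) -> bool:
    """Hypothetical rule: same normalized title and same file extension."""
    return (a["title"].strip().lower() == b["title"].strip().lower()
            and a["ext"] == b["ext"])


def classify_pairs(docs, dup_threshold=0.98, near_threshold=0.90, metadata_relief=0.10):
    """Label document pairs by content similarity; a metadata match lowers
    the near-duplicate cutoff by `metadata_relief`."""
    labels = {}
    for a, b in combinations(docs, 2):
        sim = cosine(a["vector"], b["vector"])
        near_cutoff = near_threshold - (metadata_relief if metadata_match(a, b) else 0.0)
        if sim >= dup_threshold:
            labels[(a["id"], b["id"])] = "duplicate"
        elif sim >= near_cutoff:
            labels[(a["id"], b["id"])] = "near-duplicate"
    return labels


if __name__ == "__main__":
    docs = [
        {"id": "d1", "title": "Q3 Report", "ext": "docx", "vector": [1.0, 0.0, 0.0]},
        {"id": "d2", "title": "q3 report ", "ext": "docx", "vector": [0.99, 0.1, 0.0]},
        {"id": "d3", "title": "Q3 Report", "ext": "docx", "vector": [0.85, 0.5, 0.0]},
        {"id": "d4", "title": "Budget", "ext": "xlsx", "vector": [0.0, 0.0, 1.0]},
    ]
    print(classify_pairs(docs))
    # {('d1', 'd2'): 'duplicate', ('d1', 'd3'): 'near-duplicate', ('d2', 'd3'): 'near-duplicate'}
```

    Here d1 and d3 have a cosine similarity of about 0.86. They are flagged only because their metadata also matches, which illustrates the "combination of rules-based and stochastic analysis" step.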
  • 12. Content Deduplication Pipeline: Conceptual Architecture [architecture diagram]
  • 13. Content Deduplication Pipeline: Green Application Development Considerations
    Minimize Data Movement
    ⬢ Reduce energy consumption from duplicate content storage and data transfer
    ⬢ No copies: extract content from the original file for in-memory processing
    Use Transformer Models for Vectorization
    ⬢ Small models with fewer than 100M parameters (e.g., DistilBERT)
    ⬢ Use less memory and storage space due to smaller model size
    Run Analysis Pipelines in Cloud Infrastructure
    ⬢ Take advantage of resource efficiency at scale
    ⬢ Use under-utilized regions (e.g., Azure Norway East) or regions powered by 100% renewable energy (e.g., AWS US East 2)
    For comparison: training OpenAI’s GPT-3.5 requires 1K GPUs running in parallel for weeks at a time.
  • 14. Benefits of the AI-Powered Digital Carbon Footprint Calculator
    ● Reconcile a high volume of distinct content items to a significantly lower number of unique content items across siloed systems
    ● Give users a clear view of content duplication and its connection to CO2 emissions through a meaningful dashboard
    ● Establish a plug-and-play architecture for extracting content from many file types and vectorizing it with multiple Generative AI models, so the content similarity pipeline can be best aligned with organizational needs
  • 15. The Power of Data: Drive Social Change Through Data
  • 16. Size of the Opportunity
    An estimated* 50% of corporate data is duplicated across the organization.
    Real-World Example
    ⬢ An email server contains 100 instances of the same 1 MB file attachment sent to 100 people.
    ⬢ Without content deduplication, if all 100 people back up their mailboxes, the backups consume 100 MB of storage.
    In the supply-chain organization, ~226K documents occupied ~1 PB of storage, resulting in 228 tonnes of CO2 emissions. A 15% content reduction through duplicate identification corresponds to 34 tonnes of CO2, equivalent to 20 flights from JFK to LHR.
    * https://www.xillio.com/blog/recognize-duplicate-folder-structures-with-xillio-insights
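    The email example is the classic case for exact (byte-identical) deduplication. The case-study figures are also consistent with a linear reading: 15% of 228 tonnes is about 34 tonnes. The sketch below makes two illustrative assumptions: single-instance storage keyed by a SHA-256 content hash, and emissions that scale linearly with stored content. Hash-based matching only catches identical files, which is why the pipeline also compares content vectors to find near-duplicates.

```python
import hashlib


def storage_bytes(blobs: list[bytes]) -> tuple[int, int]:
    """Return (bytes stored without deduplication, bytes stored with
    single-instance storage keyed by SHA-256 content hash)."""
    naive = sum(len(b) for b in blobs)
    unique: dict[str, int] = {}
    for blob in blobs:
        # Identical content yields the same digest, so it is stored only once.
        unique.setdefault(hashlib.sha256(blob).hexdigest(), len(blob))
    return naive, sum(unique.values())


def reduction_tonnes(baseline_tonnes: float, reduction_pct: float) -> float:
    """Assumes emissions scale linearly with the amount of stored content."""
    return baseline_tonnes * reduction_pct / 100


if __name__ == "__main__":
    attachment = b"\x00" * 1_000_000   # stand-in for a 1 MB attachment
    mailboxes = [attachment] * 100      # the same file saved in 100 mailboxes
    print(storage_bytes(mailboxes))     # (100000000, 1000000)
    print(reduction_tonnes(228, 15))    # 34.2
```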
  • 17. Why is content duplication so prevalent in the enterprise?
    NON-DELIBERATE action on the part of the user
    ● Users forget a document exists and recreate it
    ● Users cannot find what they are looking for and create it from scratch
    ● Users save email attachments, sometimes the same file multiple times
    ● Users download files from the intranet, sometimes the same file multiple times
    DELIBERATE action on the part of the user
    ● Maintain a backup copy
    ● Copy a file for easier transfer/distribution
    ● Use separate files for different document versions
  • 18. Automated Content Removal
    Defensible Deletion
    ● Redundant, obsolete, and trivial data is held on to by users just in case
    ● A non-record deletion policy can save storage space by deleting documents that are not marked as records and have not been modified for a predefined period
    Barriers to Automated Content Removal
    ● Content incorrectly marked as records due to a lack of proper compliance training
    ● Content marked with higher sensitivity labels because of a knowledge-hoarding culture
    ● Content duplicated to associate different access permissions due to limited cross-departmental collaboration
  • 19. DATA-DRIVEN USER BEHAVIOR CHANGE: Goals
    “Educate and empower to influence positive behavior change.”
    Educate
    ● Facilitate self-directed and social learning opportunities for green information management
    Empower
    ● Facilitate evidence-based decisions by offering easy access to a personal CO2 emissions viewer
    ● Propel users into action by equipping them with the right interactive tool to act on the findings in the flow of work
    ● Provide the data needed to identify personal emissions trends and a way to track progress over time
  • 20. Pilot CO2 Emissions Viewer: Demo Time!
  • 21. DATA-DRIVEN USER BEHAVIOR CHANGE: Recommended Actions
    “Educate and empower to influence positive behavior change.”
    Educate
    ● Educate users to share files via links instead of attachments
    ● Educate users on componentized content management
    Empower
    ● Provide accurate data
      ○ Establish KPIs measuring the accuracy of the duplicate detection pipeline
    ● Frame the data in the context of the bigger picture
      ○ Enable visualization of the immediate CO2 emission reduction resulting from deduplication
      ○ Enable visualization of the impact of content reduction over time
      ○ Enable visualization of a personal digital footprint counter for unique content over time
    ● Create a “don’t make me think” experience with push-of-a-button actions available in the end-user application
      ○ Enable system triggers to remove content through the application interface
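    The slide recommends KPIs that measure the accuracy of the duplicate detection pipeline. Precision and recall of flagged pairs, measured against a human-reviewed sample, are one common choice. The sketch below is an assumption, since the deck does not name the KPIs actually used. Pairs are treated as unordered, so ("d2", "d1") and ("d1", "d2") count as the same match.

```python
def _normalize(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {tuple(sorted(p)) for p in pairs}


def precision_recall(predicted, actual):
    """Precision and recall of flagged duplicate pairs against a reviewed sample."""
    pred, gold = _normalize(predicted), _normalize(actual)
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = {("d1", "d2"), ("d1", "d3"), ("d2", "d3"), ("d3", "d4")}
    reviewed = {("d2", "d1"), ("d1", "d3"), ("d2", "d3"), ("d5", "d6"), ("d7", "d8")}
    print(precision_recall(predicted, reviewed))  # (0.75, 0.6)
```

    Tracking both numbers over time supports the "provide accurate data" goal. High precision protects users from deleting content that is not actually duplicated, and high recall shows how much duplication remains unseen.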
  • 23. Content Deduplication Use Cases
    01 Mergers and Acquisitions
    ● Content deduplication streamlines the integration of data from merged entities, ensuring a more efficient and sustainable data consolidation process
    ● Lowers infrastructure costs and minimizes environmental impact through efficient content management practices
    02 Cloud Data Migration
    ● Minimizes the volume of data to be transferred, optimizing network bandwidth and reducing associated costs and energy consumption
    ● Lowers operational costs and environmental impact
    03 Content Auditing and Analysis
    ● Identify redundant or obsolete data
    ● Surface similar content with different associated metadata
    04 Legal and Regulatory Compliance
    ● Reduced exposure to copyright, trademark, or other intellectual property rights violations
    ● Decreased risk of privacy breaches, since duplicate copies may contain sensitive information that can be accessed by unauthorized parties
    05 Generative AI (LLMs)
    ● Increase efficiency in RAG applications by removing noise and bias
    ● Decrease costs associated with vectorizing content
  • 24. Questions?
    Thank you for listening. We are happy to take any questions at this time.
    Urmi Majumder: umajumder@enterprise-knowledge.com, www.linkedin.com/in/urmim/
    Fernando Aguilar Islas: fislas@enterprise-knowledge.com, www.linkedin.com/in/feraguilaris/