SlideShare a Scribd company logo
1 of 3
Download to read offline
Correlation Technology Solutions
Compared to
“Massive Semantic Infrastructure” Solutions
When enterprise or government face difficult, critical problems – intractable problems
that simply must be dealt with – the solutions that are constructed in response to these
problems are often “non-optimal”. Typically, the solutions are expensive, and stunningly
complex. Also typically, these solutions do not perform very well – despite their expense
and complexity. These attributes – expense, complexity, poor outcomes – are especially
prevalent in those cases where computer software is the basis for such solutions. These
“non-optimal” software solutions can be found in many of the 1200 vertical market
sectors for enterprise (identified by NAICS[2007]), and in every sphere of government
operations. Research from Make Sence Florida, Inc. has shown that non-optimal
software solutions are likely to use one or more of three approaches:
“Massive Semantic Infrastructure Solutions” – Systems that require large natural
language databases, ontologies, taxonomies, and concept repositories, and utilize tagging,
threading, entity recognition, and other similar corpus analysis techniques in preparation
for answering user queries.
“Subjective Statistical Model Solutions” – Systems that rely upon statistical models
influenced by subjective human judgments in establishing base or conditional
probabilities of events or outcomes – particularly those which purport to capture *all*
possible events in a complex real-world domain. Such systems typically utilize Bayesian
statistical techniques and include Neural Networks.
“Brute Force Computing Solutions” – Systems which achieve results from the power of
modern day computers to perform a relatively simple process at high speed against large
volumes of data. Keyword searches are a typical example.
The purpose of this document is limited to the examination of how “Massive Semantic
Infrastructure Solutions” differ from Correlation Technology Solutions. A large well
known enterprise software company which is referred to below as “Company A” and that
company’s primary product, which we call “M-Technology” is the example used in this
discussion.
Company A is in fact our "poster-child" for what we call "non-optimal, massive semantic
infrastructure solutions". We like to begin with the practical issues, because the practical
aspects of a Company A solution illustrate perfectly why Company A's "M-Technology"
compares so poorly to Correlation Technology.

1
We often like to recount this true story. At a NYC Search Engine Expo at which we
presented in 2008, a senior staff member of a “Major US Government Financial
Institution” stopped by our booth and, after listening to our explanation of Correlation
Technology, started to complain about Company A - which his organization had
purchased. He said, "for Company A to find a 21-word email I sent (in the past), I had to
remember and enter into the search interface 20 of the words."
Here's why this happens. Before the Company A system can answer a single question, an
enormous set of massive Natural Language databases must be installed and verified.
Then, equally massive dictionaries, thesauri, "concept" repositories, ontologies, lexicons,
and other semantic infrastructure components must be installed and linked. Then, the
corpus (all of the documents) is subjected to indexing, threading, entity recognition, and
other "associative" and "tagging" processes. These require days or weeks of dedicated
server time and huge amounts of memory and data storage. Finally, the system is ready
to do some work. But despite all of this effort, complexity and expense, the Company A
system appears "stupid".
The Company A system appears "stupid" because Company A software is based entirely
on an externally imposed "formal" construct of human language. The "meaning" part of
"M-technology" in fact is constrained to those standard meanings and uses of words
consistent with established academic models. Words are fixed in their allowed use as
only specific parts of speech. The word proximities examined in texts are disregarded if
they do not meet pre-set statistical thresholds of confidence. Syntactically modeled
sentence decomposition is rigidly adhered to, and indexing schemes for "organic"
keyword search are not much improved from their original implementations in the 1990's.
All of these formalisms are observed despite the fact that human expression is riotously,
deliriously, chaotic and adaptive on a moment by moment basis. Writers of even the
shortest communication incorporate cultural memes that no dictionary, no ontology, no
concept map, no semantic infrastructure component could keep current or sort out.
Humans create and utilize idiomatic, vernacular, and colloquial terms and uses for terms
with astounding rapidity and ease, and with astounding confidence in the belief that such
terms and every nuance of meaning carried by such terms will be perfectly understood
and appreciated by the recipients of their expression (and they usually are). Trouble is,
Company A (and its peers) can not make sense of anything not hardwired into the
software's semantic components.
While it is certainly true that a corpus of only very formal documents – such as
government reports, academic papers, and so on - will with the proper lexicons be well
served by a Company A type approach, and while it is also true that Company A has
obliged to provide facilities to users to "make their own lexicons" and to "define their
own concepts" (so, with massive and amazingly time consuming and costly
customization Company A’s product will work better), the fact remains that wherever
human expression and comprehension is informal (such as the majority of email in an
enterprise, human speech captured from transcripts, almost all the other categories of text

2
produced by amateur and professional writers for any purpose), Company A subjects its
users to the possibility for the type of frustrations described above.
If the original text doesn't contain text which conforms to or is confined to the formal
parameters of the academic models used, Company A can often have a lot of trouble in
locating that text. In the last resort, a super-majority of "word matches" was required by
Company A to find the employee's email, because all the "M-technology" was worthless.
The same result could have been achieved with a universally available - and free - Unix
text search utility.
Correlation Technology, in contrast, "permits" a far more "relaxed" and "natural" model
of human language. Our one way, exhaustive transform of data into Knowledge
Fragments (which we call "Acquisition") captures all the significant relations between
words - as they are actually expressed in the text. Unlike "M-technology", Correlation
Technology does not coerce the text into conformity with a set of formalisms or analyze
the text using such formalisms. We "allow" every nuance to be captured without concern
that some artificial rule is observed.
The Correlation process discovers knowledge from the corpus by constructing chains of
iteratively associated Knowledge Fragments, and then analyzing the "Answer Space"
(like the “result set” for RDBMS/SQL) of Correlations. Associations between words can
be as formal or informal as desired or required for the application. We provide in the
Correlation Technology Platform the ability to "dial in" more than 20 differing levels of
"fuzzy association" that actually capture - without imposing any rules which prevent the
discovery of knowledge - all the types of formalisms "understood" by "M-technology".
Further, any additional "reference" preferred for associating words can be "plugged in".
By means of Correlation, knowledge is "emergent", meaning that the analysis of the
Answer Space (a process we call "Refinement") will reveal the desired solutions - if they
exist in the corpus. When the task is Enterprise Search, our Acquisition, Correlation and
Refinement functions will reveal those emails, memos, or documents that the user wants.
Correlation Technology solutions are possible for every product offered by Company A.
In each of these solutions, we believe the Correlation Technology approach will prove far
more effective, far more flexible, and far more straightforward in implementation. While
Correlation Technology solutions can be large scale, every Company A implementation
dwarfs Correlation Technology implementations for an equivalent corpus. While the
complexity of the Correlation Technology solution is obvious, that complexity does not
flow from the hopeless attempt to capture in stone the torrent of human expression and
comprehension, and in fact, Correlation Technology is intrinsically "simple".

For Business Inquiries:
Contact: Carl Wimmer
carl@makesence.us
Mobile: (702) 767-7001

For Technical Inquiries:
Contact: Mark Bobick
m.bobick@correlationconcepts.com
Mobile: (702) 882-5664

3

More Related Content

Recently uploaded

COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 

Recently uploaded (20)

COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

Featured

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 

Essay: Comparison of Semantic Solutions to Correlation Technology Solutions

  • 1. Correlation Technology Solutions Compared to “Massive Semantic Infrastructure” Solutions When enterprise or government face difficult, critical problems – intractable problems that simply must be dealt with – the solutions that are constructed in response to these problems are often “non-optimal”. Typically, the solutions are expensive, and stunningly complex. Also typically, these solutions do not perform very well – despite their expense and complexity. These attributes – expense, complexity, poor outcomes – are especially prevalent in those cases where computer software is the basis for such solutions. These “non-optimal” software solutions can be found in many of the 1200 vertical market sectors for enterprise (identified by NAICS[2007]), and in every sphere of government operations. Research from Make Sence Florida, Inc. has shown that non-optimal software solutions are likely to use one or more of three approaches: “Massive Semantic Infrastructure Solutions” – Systems that require large natural language databases, ontologies, taxonomies, and concept repositories, and utilize tagging, threading, entity recognition, and other similar corpus analysis techniques in preparation for answering user queries. “Subjective Statistical Model Solutions” – Systems that rely upon statistical models influenced by subjective human judgments in establishing base or conditional probabilities of events or outcomes – particularly those which purport to capture *all* possible events in a complex real-world domain. Such systems typically utilize Bayesian statistical techniques and include Neural Networks. “Brute Force Computing Solutions” – Systems which achieve results from the power of modern day computers to perform a relatively simple process at high speed against large volumes of data. Keyword searches are a typical example. The purpose of this document is limited to the examination of how “Massive Semantic Infrastructure Solutions” differ from Correlation Technology Solutions. A large well known enterprise software company which is referred to below as “Company A” and that company’s primary product, which we call “M-Technology” is the example used in this discussion. Company A is in fact our "poster-child" for what we call "non-optimal, massive semantic infrastructure solutions". We like to begin with the practical issues, because the practical aspects of a Company A solution illustrate perfectly why Company A's "M-Technology" compares so poorly to Correlation Technology. 1
  • 2. We often like to recount this true story. At a NYC Search Engine Expo at which we presented in 2008, a senior staff member of a “Major US Government Financial Institution” stopped by our booth and, after listening to our explanation of Correlation Technology, started to complain about Company A - which his organization had purchased. He said, "for Company A to find a 21-word email I sent (in the past), I had to remember and enter into the search interface 20 of the words." Here's why this happens. Before the Company A system can answer a single question, an enormous set of massive Natural Language databases must be installed and verified. Then, equally massive dictionaries, thesauri, "concept" repositories, ontologies, lexicons, and other semantic infrastructure components must be installed and linked. Then, the corpus (all of the documents) is subjected to indexing, threading, entity recognition, and other "associative" and "tagging" processes. These require days or weeks of dedicated server time and huge amounts of memory and data storage. Finally, the system is ready to do some work. But despite all of this effort, complexity and expense, the Company A system appears "stupid". The Company A system appears "stupid" because Company A software is based entirely on an externally imposed "formal" construct of human language. The "meaning" part of "M-technology" in fact is constrained to those standard meanings and uses of words consistent with established academic models. Words are fixed in their allowed use as only specific parts of speech. The word proximities examined in texts are disregarded if they do not meet pre-set statistical thresholds of confidence. Syntactically modeled sentence decomposition is rigidly adhered to, and indexing schemes for "organic" keyword search are not much improved from their original implementations in the 1990's. All of these formalisms are observed despite the fact that human expression is riotously, deliriously, chaotic and adaptive on a moment by moment basis. Writers of even the shortest communication incorporate cultural memes that no dictionary, no ontology, no concept map, no semantic infrastructure component could keep current or sort out. Humans create and utilize idiomatic, vernacular, and colloquial terms and uses for terms with astounding rapidity and ease, and with astounding confidence in the belief that such terms and every nuance of meaning carried by such terms will be perfectly understood and appreciated by the recipients of their expression (and they usually are). Trouble is, Company A (and its peers) can not make sense of anything not hardwired into the software's semantic components. While it is certainly true that a corpus of only very formal documents – such as government reports, academic papers, and so on - will with the proper lexicons be well served by a Company A type approach, and while it is also true that Company A has obliged to provide facilities to users to "make their own lexicons" and to "define their own concepts" (so, with massive and amazingly time consuming and costly customization Company A’s product will work better), the fact remains that wherever human expression and comprehension is informal (such as the majority of email in an enterprise, human speech captured from transcripts, almost all the other categories of text 2
  • 3. produced by amateur and professional writers for any purpose), Company A subjects its users to the possibility for the type of frustrations described above. If the original text doesn't contain text which conforms to or is confined to the formal parameters of the academic models used, Company A can often have a lot of trouble in locating that text. In the last resort, a super-majority of "word matches" was required by Company A to find the employee's email, because all the "M-technology" was worthless. The same result could have been achieved with a universally available - and free - Unix text search utility. Correlation Technology, in contrast, "permits" a far more "relaxed" and "natural" model of human language. Our one way, exhaustive transform of data into Knowledge Fragments (which we call "Acquisition") captures all the significant relations between words - as they are actually expressed in the text. Unlike "M-technology", Correlation Technology does not coerce the text into conformity with a set of formalisms or analyze the text using such formalisms. We "allow" every nuance to be captured without concern that some artificial rule is observed. The Correlation process discovers knowledge from the corpus by constructing chains of iteratively associated Knowledge Fragments, and then analyzing the "Answer Space" (like the “result set” for RDBMS/SQL) of Correlations. Associations between words can be as formal or informal as desired or required for the application. We provide in the Correlation Technology Platform the ability to "dial in" more than 20 differing levels of "fuzzy association" that actually capture - without imposing any rules which prevent the discovery of knowledge - all the types of formalisms "understood" by "M-technology". Further, any additional "reference" preferred for associating words can be "plugged in". By means of Correlation, knowledge is "emergent", meaning that the analysis of the Answer Space (a process we call "Refinement") will reveal the desired solutions - if they exist in the corpus. When the task is Enterprise Search, our Acquisition, Correlation and Refinement functions will reveal those emails, memos, or documents that the user wants. Correlation Technology solutions are possible for every product offered by Company A. In each of these solutions, we believe the Correlation Technology approach will prove far more effective, far more flexible, and far more straightforward in implementation. While Correlation Technology solutions can be large scale, every Company A implementation dwarfs Correlation Technology implementations for an equivalent corpus. While the complexity of the Correlation Technology solution is obvious, that complexity does not flow from the hopeless attempt to capture in stone the torrent of human expression and comprehension, and in fact, Correlation Technology is intrinsically "simple". For Business Inquiries: Contact: Carl Wimmer carl@makesence.us Mobile: (702) 767-7001 For Technical Inquiries: Contact: Mark Bobick m.bobick@correlationconcepts.com Mobile: (702) 882-5664 3