MAKING AI BEHAVE:
Using Knowledge Domains to
Produce Useful, Trustworthy Results
Marjorie M.K. Hlava
Chief Scientist
Access Innovations, Inc.
mhlava@accessinn.com
Abstract
In today's highly charged atmosphere of anxiety and anticipation about AI, and especially LLMs,
one of the biggest concerns is how to ensure that it returns accurate results (meaning both true
and pertinent to its audience). This is particularly important to scholarly, scientific, and other
technical organizations, whose constituents are often in very specific domains, such as
medicine, engineering, history, biology, chemistry, etc. One extremely useful tool to incorporate in
an AI-based process in such cases is a comprehensive and well-structured knowledge domain
which is based on a controlled vocabulary.
The next Access Innovations webinar, coming up at noon Eastern on Tuesday, March 26, is
"MAKING AI BEHAVE: Using Knowledge Domains to Produce Useful, Trustworthy Results." It's
based on the extensive experience and history Access Innovations has in the development and
implementation of domain-specific thesauri, taxonomies, ontologies, and knowledge graphs, and
their use of them with AI. They have over 70 knowledge domains covered, which they employ in
sophisticated search, auto-tagging, and AI-based solutions for their clients. These are all
available for immediate deployment, so you don't have to start from scratch to develop the ability
to accurately tag your content to ensure proper and effective use by AI tools and systems.
Google bans AI chatbot Gemini from answering election questions: ‘Try Google Search’
By Reuters. Published March 12, 2024, 12:03 p.m. ET
Microsoft AI Research Introduces Generalized Instruction Tuning (called GLAN): A General and Scalable Artificial Intelligence Method for Instruction Tuning of Large Language Models (LLMs)
By Tanya Malhotra, March 2, 2024
News Corp in ‘advanced’ talks with AI firms on deals to license content, CEO says
By Thomas Barrabi. Published Feb. 8, 2024, 2:17 p.m. ET
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
[Submitted on 20 Feb 2024] https://arxiv.org/abs/2402.13064
Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei
Daily Deluge
Google AI Introduces Croissant: A Metadata Format for Machine Learning-Ready Datasets
By Dhanshree Shripad Shenwai, March 12, 2024
Marjorie M.K. Hlava
• Expert in taxonomies, metadata, their application and data science.
• Her groundbreaking work has earned her numerous awards and 2 patents with 21 claims granted
• Margie’s standards work includes
• Dublin Core Z39.85
• DOI Syntax Z39.84
• CRediT Z39.104
• Thesaurus ANSI/NISO Z39.19 Thesauri and other controlled vocabularies
• many others
• Convener of ISO 25964, the international standard on controlled vocabularies
• Founder, Chairman, Chief Scientist of Access Innovations, Inc.
“large language models will not only mirror but magnify any problems with
the data sets, problems that many organizations may not realize they have.”
Amplifying hidden biases and gaps seems like a real danger
What we will cover today
• Definitions
• Getting us to speak the same language
• Quick review of options
• Why Taxonomies with LLMs?
• Where do they fit?
• What are some available Knowledge Domains?
• Two Approaches
• Summary
What we will NOT cover
• Big topic
• Video
• Politics / Elections
• Recent sensations
• All the tool sets
• Regulatory actions
• Programming aspects
• Business cases
https://www.nature.com/articles/d41586-024-00661-0?utm_source=Live+Audience&utm_campaign=adeec3770a-briefing-dy-20240313&utm_medium=email&utm_term=0_b27a691814-adeec3770a-51734080
National Information Standards Organization
– Controlled vocabulary
• "a carefully selected list of words and phrases, which are used to tag
units of information (document, images, videos, etc.) in order to
describe their content. This list is carefully selected and managed by
experts in a particular subject domain or field.”
• Ensure consistency and precision in indexing and retrieval
• Domain specific
NISO – Thesaurus
• "a controlled and structured vocabulary in which concepts are
represented by terms, organized so that relationships between
concepts are made explicit, and preferred terms are
accompanied by lead-in entries for synonyms or quasi-synonyms.”
• Hierarchical, Equivalence (Synonyms) and Associative (Related)
• Structured
• Organizes concepts
• Facilitates access to information
• Provides standardized terminology and relationships between terms
(concepts), synonyms, or quasi-synonyms
Thesaurus with term records
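A term record of the kind named above can be sketched as a small data structure. This is only an illustration: the field names (`broader`, `narrower`, `related`, `used_for`) are assumptions chosen to mirror the hierarchical, associative, and equivalence relationships, and the sample terms are made up for the example.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    """One preferred term with its thesaurus relationships (hypothetical layout)."""
    preferred: str
    broader: list[str] = field(default_factory=list)    # hierarchical, upward
    narrower: list[str] = field(default_factory=list)   # hierarchical, downward
    related: list[str] = field(default_factory=list)    # associative
    used_for: list[str] = field(default_factory=list)   # equivalence: lead-in entries
    scope_note: str = ""

def build_lead_in_index(records):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    index = {}
    for rec in records:
        index[rec.preferred.lower()] = rec.preferred
        for synonym in rec.used_for:
            index[synonym.lower()] = rec.preferred
    return index

records = [
    TermRecord("Myocardial infarction", broader=["Heart diseases"],
               related=["Cardiac arrest"], used_for=["Heart attack", "MI"]),
    TermRecord("Heart diseases", narrower=["Myocardial infarction"]),
]
lead_in = build_lead_in_index(records)
print(lead_in["heart attack"])
```

The lead-in index is what lets a searcher or tagger start from any synonym and land on the one preferred term.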
NISO - Taxonomy
• "a structured, hierarchical representation
of concepts or terms within a specific
domain, organized to show relationships
between concepts or terms.”
• Formal frameworks for representing and
organizing knowledge
• Just the hierarchy
• But now often used interchangeably with
thesaurus and ontology
Image + https://www.thoughtworks.com/insights/blog/data-science-ontology
Radial graph and Hierarchical display
Both are taxonomy displays
https://www.hedden-information.com/taxonomies-vs-ontologies/
What are the steps to implement
taxonomy in generative AI? 1 of 2
• Define the Taxonomy Structure:
• Identify the key concepts, categories, and relationships relevant to the domain or problem the generative AI
system will address.
• Design a hierarchical taxonomy structure that organizes these concepts into categories and subcategories.
• Define relationships between categories to capture semantic connections.
• Collect and Preprocess Data:
• Gather a corpus of text data relevant to the domain or problem. This could include documents, articles, or
any other textual resources.
• Preprocess the text data to clean it, remove noise, and standardize formatting. This may involve tasks like
tokenization, stemming, and removing stop words.
• Annotate Data with Taxonomy Labels:
• Manually or semi-automatically annotate the text data with labels corresponding to the taxonomy
categories. This step involves mapping text excerpts or documents to the appropriate categories in the
taxonomy.
What are the steps to implement
taxonomy in generative AI? 2 of 2
• Train the Generative AI Model:
• Choose or develop a generative AI model suitable for the task at hand, such as a language model based on the transformer
architecture (e.g., GPT).
• Prepare the annotated data for training, ensuring that each input is associated with its corresponding taxonomy labels.
• Train the generative AI model on the annotated data, incorporating the taxonomy labels as part of the training process. This
allows the model to learn the relationships between textual inputs and taxonomy categories.
• Incorporate Taxonomy into Model Inference:
• After training, integrate the taxonomy structure into the generative AI model's inference process.
• When generating text or responses, use the taxonomy to guide the model's outputs. For example, you can constrain the
generation process to ensure that the generated text aligns with the taxonomy categories.
• Evaluate and Iterate:
• Evaluate the performance of the generative AI system using metrics relevant to the task, such as accuracy, coherence, and
relevance.
• Collect feedback from users or domain experts to identify areas for improvement.
• Iterate on the model and taxonomy design based on the evaluation results and feedback, making adjustments as
necessary to enhance performance.
• Deploy and Monitor:
• Deploy the generative AI system with taxonomy support in a production environment or as part of an application.
• Monitor the system's performance and user interactions, gathering data for further refinement and optimization.
• Collaboration between domain experts, data scientists, and AI engineers is crucial for success
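One simple way to "use the taxonomy to guide the model's outputs" is to check whatever labels a model proposes against the taxonomy before accepting them. The sketch below assumes a toy taxonomy stored as child-to-parent links and a plain list standing in for the model's proposed labels; no real model is called, and the names `PARENT`, `path_to_root`, and `constrain_labels` are hypothetical.

```python
# Hypothetical taxonomy: each category maps to its parent (None marks a top term).
PARENT = {
    "Science": None,
    "Physical Sciences": "Science",
    "Chemistry": "Physical Sciences",
    "Biological Science": "Science",
}

def path_to_root(category):
    """Return the category followed by its ancestors, most specific first."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path

def constrain_labels(proposed):
    """Split model-proposed labels into those in the taxonomy and those rejected."""
    accepted = [label for label in proposed if label in PARENT]
    rejected = [label for label in proposed if label not in PARENT]
    return accepted, rejected

# Stand-in for labels proposed by a generative model (not a real model call).
accepted, rejected = constrain_labels(["Chemistry", "Alchemy"])
print(accepted, rejected)
print(path_to_root("Chemistry"))
```

Rejected labels are worth logging rather than discarding, since they may point to gaps in the taxonomy that feed the evaluate-and-iterate step.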
Knowledge Domain
• Refers to a specific area or field of knowledge
• subject matter, concepts, theories, methodologies, and practices
• Cohesive and organized body of knowledge with a scope and
boundaries
• Vary widely in size and complexity
• Covid 57 terms
• JSTOR 57,000 terms
• Library of Congress – 208,000 terms
• quantum mechanics or medieval literature
• Established disciplines or sub-disciplines
• Each with theories, methods, and research traditions
• Frameworks for understanding and investigating phenomena within
specific areas
• Represent scholars, researchers, practitioners, SMEs contributing to
knowledge within those domains
Available = already built
• Government resources
• Most agencies
• May need formatting
• NASA, DTIC, DOE, NAL, EPA,
NLM etc
• Sign up for updates
• License-able
• TaxoBank
• Access Innovations
• Others
Knowledge Domains
• Taxonomies, thesauri, or authority files
• Pre-Built
• Knowledge Domains
• full term records
• hierarchical, equivalence, and associative relationships, as well as scope notes
where appropriate.
• hierarchy only.
• NISO Z39.19 and ISO 25964 standards compliant
• Formats,
• 22 options
• Excel/CSV
• SKOS-2
• Etc.
Applied Science
Art
Behavioral Science
Biological Science
Business
Chemical – MAI Chem
Communications
Computer Science
COVID
Economics
Educational Curriculum
Geography
Health and Safety
Health Science
History
Information Science
Language Arts
Law
Linguistics
Literature and Drama
Mathematics
NewsThes
Nursing
Philosophy
Physical Education and Recreation
Physical Sciences
Political Science
Psychology
Religion
Science
Social Sciences
General Purpose Taxonomies
These products can be SKOS downloads
Astronomy
Clinical Drugs
DTIC – Defense Technical Information Center
Environment – GEMET
ERIC – Education Resource Information
Center
JSTOR
NASA
National Agricultural Library
Occupational Safety and Health
PLOS
CPT – Current Procedural Terminology
HCPCS – Healthcare Common Procedure
Coding System
ICD11 – International Classification of
Diseases
Kew Medicinal Plant Names (MPNS)
MeSH – Medical Subject Headings
Suspect Cell Lines
Taxogene – the Human Genome
These products are available as SaaS
Knowledge Graph
• A knowledge graph
• structured representation of knowledge
• captures relationships between entities or concepts in a specific domain
• Nodes represent entities or concepts
• Edges represent relationships between these entities
• Using semantic technologies and linked data principles
• Integrate information from multiple sources
• Supports inference of new knowledge based on existing connections
• Enable context-aware information
• data and its relationships
• Gives precise querying and analysis
• Supports discovery of implicit connections and patterns within the data
• For organizing, navigating, and leveraging large volumes of interconnected
data
• Facilitate extraction of insights and the generation of new knowledge
Image = https://ahrefs.com/blog/google-knowledge-graph/
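The node-and-edge model and the idea of inferring new knowledge from existing connections can be shown with a few lines of code. This is a minimal in-memory sketch, not a graph database: the entities, the relationship names (`treats`, `is_a`), and the helper functions are all illustrative assumptions.

```python
from collections import defaultdict

# Edges as (subject, relationship, object); entities and relations are illustrative.
EDGES = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "is_a", "NSAID"),
    ("NSAID", "is_a", "Analgesic"),
    ("Analgesic", "is_a", "Drug"),
]

def build_graph(edges):
    """Index edges by subject node, then by relationship."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in edges:
        graph[subject][relation].add(obj)
    return graph

def infer_ancestors(graph, node, relation="is_a"):
    """Follow one relationship transitively to find implicit connections."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in graph[current][relation]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

graph = build_graph(EDGES)
print(sorted(infer_ancestors(graph, "Aspirin")))
```

Nothing in the edge list says directly that Aspirin is a Drug; that connection is discovered by walking the `is_a` edges.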
Knowledge Graphs
Does a knowledge graph need a
controlled vocabulary?
• Consistency
• Interoperability
• Facilitates Search and Discovery
• Semantic Enrichment
• Domain Understanding
Knowledge Graphs with Generative AI
• Contextual Understanding:
• Provide a structured representation of relationships between entities and concepts
• Generates more relevant and contextually appropriate responses
• Content Generation:
• Source of structured data and information for generative AI systems
• Use in training process, to learn from the structured relationships encoded in the graph to generate more coherent
and accurate outputs
• Ensure that the generated documents adhere to domain principles and conventions
• Entity Linking and Disambiguation:
• Identify and disambiguate entities mentioned in text
• Lets AI models accurately link mentions of entities to their corresponding entries in the graph
• Reduces ambiguity
• Improves the quality of generated outputs
• Personalization and Customization:
• Customize to specific domains or use cases
• Generate personalized outputs for user needs and preferences
• Provides more relevant and useful content
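Entity disambiguation can be illustrated with a very small scoring rule: pick the candidate concept whose context cues overlap most with the surrounding text. The candidate list, the cue words, and the `disambiguate` function below are assumptions made for this sketch; production systems use richer signals.

```python
import re

# Hypothetical candidate concepts for one ambiguous string, each with context cues.
CANDIDATES = {
    "Mercury (planet)": {"orbit", "planet", "solar", "venus"},
    "Mercury (element)": {"toxic", "metal", "thermometer", "poisoning"},
}

def disambiguate(text, candidates=CANDIDATES):
    """Pick the candidate whose cue words overlap most with the text; None on no overlap."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {concept: len(cues & words) for concept, cues in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(disambiguate("Mercury poisoning from a broken thermometer"))
```

Returning `None` when no cue matches is a deliberate choice here: an unresolved mention is safer to flag than to link to the wrong entry.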
How to Link Knowledge Graph to Generative AI – 1 of 2
• Define Knowledge Graph Schema:
• Identify the entities, relationships, and properties
• Design a schema that captures both the structure and the semantics of the domain
• Acquire and Process Data:
• Gather data sources
• Preprocess the data to extract entities, relationships, and properties
• Convert them into a format suitable for loading into the knowledge graph.
• Build and Populate the Knowledge Graph:
• Use a graph database to create and populate the knowledge graph.
• Load the processed data into the knowledge graph,
• Ensure that entities are represented as nodes, relationships as edges, and
properties as attributes.
How to Link Knowledge Graph to Generative AI – 2 of 2
• Integrate Knowledge Graph with Generative AI:
• Querying the knowledge graph for relevant information
• Incorporating it as input during the model training process.
• Ensure the model can access and utilize the structured knowledge represented
in the graph.
• Training and Fine-Tuning:
• Train or fine-tune the generative AI model using the knowledge graph-enhanced
data.
• Supervised learning with labeled data
• Unsupervised learning to discover patterns and relationships within the data.
• Generate Outputs and Evaluate:
• Use generative AI system to generate outputs based on user queries or input
data.
• Evaluate the quality, relevance, and coherence of the generated outputs,
• Iterate and Refine:
• Iterate on the implementation, incorporating feedback and making
improvements
• Continuously refine the knowledge graph and generative AI model
• User interactions, new data, and evolving requirements.
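The "querying the knowledge graph for relevant information" step can be sketched as follows. Note the substitution: the slides describe feeding graph data into training, while this example supplies graph facts in the prompt at question time, which is a simpler stand-in for showing the lookup. The triples and the `facts_about` and `build_prompt` helpers are hypothetical, and no model is called.

```python
TRIPLES = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "is_a", "NSAID"),
    ("Ibuprofen", "is_a", "NSAID"),
]

def facts_about(entity, triples=TRIPLES):
    """Return every triple that mentions the entity as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]

def build_prompt(question, entity):
    """Prepend graph facts to the question so the model can answer from them."""
    lines = [f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts_about(entity)]
    return "Known facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"

prompt = build_prompt("What does aspirin treat?", "Aspirin")
print(prompt)
```

Only facts that mention the queried entity reach the prompt, so the model is steered toward the graph's statements rather than unrelated material.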
Ontology – 1 of 2
• “a formal and explicit specification of a conceptualization [that] defines the
terms, concepts, and relationships within a particular domain of
knowledge.”
• Meaningless jumble
• Formal frameworks for representing and organizing knowledge
• Terms or Concepts:
• Represent the entities, classes, or categories within the domain of interest
• Each term (concept or entity) is defined with a precise meaning
• Relationships:
• Define the connections between terms in the ontology
• Relationships can represent various types of connections such as hierarchical (subclass),
part-whole, or associative relationships
• Axioms or Constraints:
• Rules or constraints for properties (behavior) of the terms and relationships in the ontology
• Axioms help ensure the consistency and coherence of the ontology
Ontology – 2 of 2
• Often uses RDF = Resource Description Framework (defines
itself by reference or inclusion)
• One thing (the subject, a.k.a. resource) has a relationship (the
predicate, a.k.a. edge) with another thing (the object, a.k.a.
resource)
• Each node is a thing (a resource) and each edge is a given relationship
(for example, reports to or works for), which is known as a predicate
• Machine-readable
• Facilitates interoperability, reasoning, and semantic
understanding across different systems and applications
• Connect things not strings
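The subject, predicate, object pattern can be shown without any RDF library. The sketch below uses plain Python tuples rather than real RDF syntax, and the people, company, and predicate names are hypothetical; it reuses the "reports to" and "works for" relationships mentioned above.

```python
# Each statement is (subject, predicate, object). The names are hypothetical.
statements = {
    ("Alice", "reportsTo", "Bob"),
    ("Alice", "worksFor", "ExampleCorp"),
    ("Bob", "worksFor", "ExampleCorp"),
}

def match(pattern, triples):
    """Return triples matching a pattern; None in any position is a wildcard."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )

# Who works for ExampleCorp?
employees = [s for s, _, _ in match((None, "worksFor", "ExampleCorp"), statements)]
print(employees)
```

Because every statement has the same three-part shape, one generic pattern matcher can answer questions about any relationship, which is the practical meaning of "things not strings".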
Tagging /Indexing
• The process of associating metadata or
descriptive keywords with digital content
• All content types
• Text based
• Identify Things
• People, places, objects, entities
• Identify concepts
• Keywords, descriptors, terms, subject headings,
classification systems and codes, thesaurus terms
• Provide consistent tagging and accurate and
comprehensive retrieval of content items
NLP, ML, and AI are not new
• Automation of human activity – around for over 100 years
• Mechanical automation – Jacquard Looms (1804)
• Herman Hollerith 1890 Census (punch cards)
• IBM – Thomas Watson Group – 1920s
• Sputnik 1957
• Space Race and Cold war
• 1964 – COSATI
• TEST = Thesaurus of Engineering, Scientific and technical Terms
• Automated retrieval of documents
• Dialog, NASA Recon 1973
Basic algorithms
• Boole, George 1815 – 1864
• Boolean algebra is basic to the design of digital computer circuits.
• Bayes, Thomas 1701-1761
• Richard Price 1723 – 1791
• Bayes' theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event
• Beyond reasonable doubt
Wikipedia
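A short worked example makes the role of prior knowledge concrete. The numbers below are invented for illustration (a topic covering 1% of documents, a tagger that fires on 90% of on-topic documents and 5% of the rest), and the function name `posterior` is an assumption.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' theorem."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Illustrative numbers only: rare topic, fairly accurate tagger.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))
```

With these inputs the probability that a flagged document is truly on-topic is only about 15%, because the topic is rare to begin with.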
AI Building blocks - NLP
• Symbolic
• 1950, Alan Turing published an article titled "Computing Machinery and
Intelligence"
• Jabberwacky chatbot 1997
• Statistical
• 1990’s machine translation – ERTRANS
• Rule based approaches
• 2000’s World Wide Web – HTML
• Neural
• Vectors
• N-grams
What’s the deal now?
“AI” + GenAI
• Start with enriched content (tagged)
• Tell (feed to) GenAI
• GenAI puts new rules in the inference engine
• Search results get better
• Repeat, repeat, repeat
Understand the data
you are feeding a GenAI
• Identify cancerous skin lesions
in images
• 100% accurate!
https://sites.mitre.org/aifails/turning-lemons-into-lemon/
ChatGPT Static training
• Example of Generative AI – one of MANY
• ChatGPT is trained on two-year-old data
• Took a lot of Manpower to train
• Need to constantly refresh –
• retrain the model…
• To keep current
• So what is the answer?
• How are the models getting trained / fine tuned?
• A special kind of NLP
• Use to enhance existing data sets
GLAN (Generalized Instruction Tuning)
• Breaks human knowledge into domains, sub-fields, and
disciplines
• Taxonomy is divided into subjects
• syllabus created for each subject (Branch)
• specific essential themes
• “GLAN [uses] these ideas to produce a variety of instructions that
closely resemble the design of the human educational system”
• Curriculum outline – Like NICEM Knowledge Domain
Flexible, scalable, and all-purpose
approach
• Produces instructions on an enormous scale
• Task-agnostic
• Spanning a wide range of disciplines
• The input taxonomy has been created with minimal human effort through LLM
prompting and verification
• Can add new fields or skills
• Adaptable, the dataset can be expanded and changed without having to start
from scratch
• Wide range of instructions covering every possible combination of human
knowledge and abilities
• Includes coding, logical reasoning, mathematical reasoning, academic tests, and
general instruction
• No need for task-specific training data for these particular tasks
• Add new domains or proficiencies by adding a new node to its taxonomy
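The taxonomy-to-syllabus-to-instruction flow, and the claim that a new node extends coverage, can be sketched with a nested dictionary. This is not the GLAN implementation: GLAN uses an LLM at each stage, whereas this example uses a fixed template string, and the taxonomy slice and the `instruction_prompts` function are hypothetical.

```python
# Hypothetical slice of a taxonomy: field -> discipline -> subject -> syllabus themes.
taxonomy = {
    "Natural Sciences": {
        "Chemistry": {
            "Organic Chemistry": ["functional groups", "reaction mechanisms"],
        },
    },
}

def instruction_prompts(tree):
    """Walk the taxonomy and yield one prompt per syllabus theme."""
    for field, disciplines in tree.items():
        for discipline, subjects in disciplines.items():
            for subject, themes in subjects.items():
                for theme in themes:
                    yield (f"Write a homework question on '{theme}' for a "
                           f"{subject} course ({field} > {discipline}).")

before = list(instruction_prompts(taxonomy))

# Adding a node extends coverage without touching existing branches.
taxonomy["Natural Sciences"]["Physics"] = {"Optics": ["refraction"]}
after = list(instruction_prompts(taxonomy))
print(len(before), len(after))
```

The prompts generated before the new node was added are unchanged afterward, which is the property that lets the dataset grow without starting from scratch.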
HuixiangDou (Baseline)
• Problems with Chat systems using LLM
• Flooding of the system
• Irrelevant responses
• Lack of answer precision
• Answer
• Fine tuning the system
• Continuous updates
• Identifying the key points of problems
• Handling multiple target points simultaneously
• More focused approach to handling queries
• How?
• Keywords from the taxonomy
• Applied as an incoming filter
• Added to content responses
• Constant additions based on logs
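Using taxonomy keywords as an incoming filter can be sketched as below. This is an illustration of the idea only and not HuixiangDou's actual pipeline; the keyword set and the `matched_keywords` and `route` functions are assumptions.

```python
import re

# Hypothetical domain keywords drawn from a taxonomy.
TAXONOMY_KEYWORDS = {"thesaurus", "taxonomy", "ontology", "controlled vocabulary"}

def matched_keywords(message, keywords=TAXONOMY_KEYWORDS):
    """Return taxonomy keywords found in the message as whole words or phrases."""
    text = message.lower()
    return sorted(k for k in keywords
                  if re.search(rf"\b{re.escape(k)}\b", text))

def route(message):
    """Pass on-topic messages along with their keywords; drop everything else."""
    hits = matched_keywords(message)
    return ("answer", hits) if hits else ("ignore", [])

print(route("How do I export a thesaurus to SKOS?"))
print(route("Anyone up for lunch?"))
```

Messages that pass carry their matched keywords forward, so the same terms can be added to the response, and ignored messages can be reviewed in the logs for terms worth adding.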
HuixiangDou: A Domain-Specific Knowledge Assistant Powered by Large Language Models
https://www.marktechpost.com/2024/01/31/shanghai-ai-lab-presents-huixiangdou-a-domain-specific-knowledge-assistant-powered-by-large-language-models-llm/
What’s the process?
AI
Technology
• It is a tool, not the focus
• Might need shiny new piece of technology
• the technology is generally in the chorus
• not a main character
• Too many companies lead with technology and
• do not spend the time understanding their users or aligning their
strategy
• Any company that has 1000s of SharePoint or Teams sites where
people still can’t find the information they need knows this
• Most large corps have 5 search software systems
• On the shelf
• “Does not work”
• Because the data was not enriched
Organizational model
• Taxonomy and data modeling
• Essential component of this investment
• Data must be
• Well sourced
• Managed
• Maintained
• Essential for the AI
• Both ethical and performance reasons
• Ignore data quality at your peril
• It is hard work
• Does not fit two-week sprint
• Get executives to agree on strategy and structure model
• Without a coherent model, governance, data pipeline, and resourcing there
is no strategic value to an AI initiative
It’s the Data, Stupid!
• Data is their core asset
• Without the data the rest of the initiative is nothing
• It is the essential component of the strategy
• Add enrichment metadata
• SUBJECT metadata
• Use taxonomies, ontologies, and other models
• The large language models will not only mirror but magnify any
problems with the data sets, problems that many organizations may
not realize they have. (Gary Carlson - Factors)
Why a taxonomy?
• Matches your content
• Scales with the content as it increases
• Extensive synonymy – use any of the word term options
• The concept is the unit of thought
• Disambiguation
• Mercury
• Lead
• Built in feedback loops to keep current with content
• Prevents hallucinations
• Misunderstandings of multiple word meanings (Nonsensical output)
• Happens when the model is not trained on your content (Factual contradiction)
• Query goes against the rules of the system (Prompt contradiction)
Why tag / index at all?
• Disambiguation
• Search and retrieval is accurate
• Promote taxonomy term first searching
• In the inverted index search controlled terms first
• Then go to full text if needed
• Use in search response consistency and integrity
• Recommendation engines using tag sets not vectors
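The "controlled terms first, then full text" search order can be sketched with two small inverted indexes. The documents, tags, and the `search` function are made up for this example, and real search engines handle ranking and tokenization far more carefully.

```python
from collections import defaultdict

# Hypothetical documents with full text and controlled-vocabulary tags.
DOCS = {
    1: {"text": "lead exposure in old paint", "tags": {"Lead (metal)"}},
    2: {"text": "how to lead a project team", "tags": {"Leadership"}},
    3: {"text": "plumbing pipes and lead", "tags": set()},
}

tag_index, text_index = defaultdict(set), defaultdict(set)
for doc_id, doc in DOCS.items():
    for tag in doc["tags"]:
        tag_index[tag].add(doc_id)
    for word in doc["text"].split():
        text_index[word].add(doc_id)

def search(term, fallback_word=None):
    """Look up the controlled term first; use full text only if nothing is tagged."""
    hits = tag_index.get(term, set())
    if not hits and fallback_word:
        hits = text_index.get(fallback_word, set())
    return sorted(hits)

print(search("Lead (metal)"))
print(search("Plumbing", fallback_word="lead"))
```

The tagged lookup returns only the document about the metal, while the full-text fallback on the word "lead" also pulls in the leadership document, which shows why the controlled term is tried first.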
Why Auto Tagging?
• Fast
• Sub-second versus 70 seconds per tag
• Able to add more tags quickly in the same sub-second time
• More depth
• Always goes to the most specific level of tagging
• No misspellings
• Consistency
• No editorial drift – people tend to use same tags over and over
• Do not need as many subject experts
• Replicable results – no black box
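A rule-based tagger shows how auto-tagging can be fast, consistent, and inspectable. The sketch below is a toy and not a description of any product's engine: the lead-in phrases, the parent links, and the `ancestors` and `auto_tag` functions are assumptions. It keeps only the most specific matching term when a broader term also matches.

```python
import re

# Hypothetical vocabulary: lead-in phrase -> preferred term, plus parent links.
LEAD_INS = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "heart disease": "Heart diseases",
}
PARENT = {"Myocardial infarction": "Heart diseases", "Heart diseases": None}

def ancestors(term):
    """Return all broader terms above the given term."""
    found = set()
    while PARENT.get(term):
        term = PARENT[term]
        found.add(term)
    return found

def auto_tag(text):
    """Tag text with preferred terms, keeping only the most specific matches."""
    lowered = text.lower()
    matched = {pref for phrase, pref in LEAD_INS.items()
               if re.search(rf"\b{re.escape(phrase)}\b", lowered)}
    covered = set().union(*(ancestors(t) for t in matched)) if matched else set()
    return sorted(matched - covered)

print(auto_tag("Heart disease risk rises after a heart attack."))
```

Because the rules are explicit, the same text always yields the same tags, and anyone can trace a tag back to the phrase that triggered it.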
Dump in the data to the AI vortex
Approach A – Hybrid – Leverage GenAI – 1 of 2
• Send question to ChatGPT
• Use autocomplete from the taxonomy
• Find more words and concepts
• Tailored to the specific domain or topic
• Extract the semantic context of the query
• MAI tags the query with taxonomy terms
• Send those concepts to the Generative AI system
• Read the answers from the Generative AI system
• Might submit more than once
• But answer will change each time
Approach A – Hybrid – Enrich the content – 2 of 2
• Send the same query to your own content
• Use the same terms
• Answer will be consistent since it is on tagged actual text
• Keeps your data out of the LLM and secure
• Use the LLM to get a general answer
• Use your content to get the specific and reliable answer
• Combine the two to get a quick summary of the material
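The hybrid flow can be sketched end to end with a placeholder in place of the model. Everything here is an assumption made for illustration: the content store, the query-term lookup, the function names, and `fake_llm`, which stands in for a real call to a generative AI service.

```python
# Hypothetical tagged content store: document text plus taxonomy tags.
CONTENT = [
    {"text": "Dosage guidance for aspirin in adults.", "tags": {"Aspirin"}},
    {"text": "Annual report on library budgets.", "tags": {"Budgets"}},
]
QUERY_TERMS = {"aspirin": "Aspirin", "asa": "Aspirin", "budget": "Budgets"}

def tag_query(query):
    """Map words in the query to taxonomy terms (stand-in for an auto-tagger)."""
    words = query.lower().replace("?", " ").split()
    return {QUERY_TERMS[w] for w in words if w in QUERY_TERMS}

def hybrid_answer(query, ask_llm):
    """Combine a general LLM answer with passages from locally tagged content."""
    concepts = tag_query(query)
    general = ask_llm(f"{query} (concepts: {', '.join(sorted(concepts))})")
    specific = [d["text"] for d in CONTENT if d["tags"] & concepts]
    return {"concepts": sorted(concepts), "general": general, "specific": specific}

def fake_llm(prompt):
    """Placeholder for a real model call; only the prompt leaves your system."""
    return f"[general answer to: {prompt}]"

result = hybrid_answer("What is the ASA dose?", fake_llm)
print(result["specific"])
```

Only the query and its concepts are sent to the model; the tagged content is searched locally, which is how this approach keeps your data out of the LLM.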
Bludgeon your data
Approach B
• Gather large amount of text
• License the GenAI of your choice
• Integrate several different systems for optimal results
• OpenAI, Bard, Claude, DALL-E, MidJourney, etc.
• Convert data
• Load to large servers
• Train the data model
• Use series of refining questions
• Clarify the user’s intent interactively
• Convert the text query into a hybrid search query
• Summarize, classify, and display search results in different, easily distinguishable
categories (using an AI classification model), e.g., most relevant answer, most
recent answer, most trustworthy source, etc.
Taxonomy Priority (Semantic) Enrichment
Approach C – Taxonomy Priority – 1 of 2
• Organize and enrich first, then train the data
• Learn where the outliers are
• Allow for new input from outside resources over time
• Taxonomy Development:
• Use existing or create new
• Identify key concepts, categories, and relationships
• Hierarchical taxonomy structure
• Define relationships between different categories
• Represent the semantic connections between concepts
• Get a Taxonomy Tool to create and manage the taxonomy efficiently.
• Tools like Protégé, Data Harmony, or custom-built solutions can be used for this
purpose
Approach C – Taxonomy Priority – 2 of 2
• Data Preprocessing:
• Data Collection – gather documents
• Data Cleaning – remove noise, irrelevant information, and formatting
inconsistencies
• Entity Extraction – extract entities, concepts, and terms from the text
data and link them
• Taxonomy Integration:
• Map extracted entities and concepts between text data and the
taxonomy structure
• Index the data using the taxonomy to enable efficient retrieval and
querying
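The mapping and indexing steps, together with the earlier goal of learning where the outliers are, can be sketched as one pass over extracted entities. The extracted entities, the term lookup, and the `integrate` function are hypothetical; entity extraction itself is assumed to have happened already.

```python
from collections import defaultdict

# Hypothetical inputs: entities already extracted per document, and a term lookup.
EXTRACTED = {
    "doc1": ["solar panels", "photovoltaics"],
    "doc2": ["wind turbines", "perovskite cells"],
}
TERM_LOOKUP = {
    "solar panels": "Solar energy",
    "photovoltaics": "Solar energy",
    "wind turbines": "Wind energy",
}

def integrate(extracted, lookup):
    """Index documents by taxonomy term and collect entities with no match."""
    index, outliers = defaultdict(set), defaultdict(set)
    for doc_id, entities in extracted.items():
        for entity in entities:
            term = lookup.get(entity)
            if term:
                index[term].add(doc_id)
            else:
                outliers[entity].add(doc_id)
    return dict(index), dict(outliers)

index, outliers = integrate(EXTRACTED, TERM_LOOKUP)
print(index)
print(outliers)
```

Unmapped entities are kept rather than dropped, so they can be reviewed as candidate terms when the taxonomy is next updated.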
How Can Taxonomies Help LLM?
• Understanding Input
• Content Organization
• Knowledge Representation
• Query Expansion
• Quality Control
Can Taxonomies make LLM Behave?
• Guiding Decision-Making
• Enhancing Understanding
• Improving Consistency
• Facilitating Interpretability
• Supporting Compliance
Thank you for your attention
Questions?
• Marjorie M.K. Hlava
• Chief Scientist
• Access Innovations, Inc.
• mhlava@accessinn.com

More Related Content

Similar to Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results

Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...African Open Science Platform
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Heimo Hänninen
 
Subject information gateway in information technology (sigit) an introduction
Subject information gateway in information technology (sigit) an introductionSubject information gateway in information technology (sigit) an introduction
Subject information gateway in information technology (sigit) an introductionkmusthu
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Jenn Riley
 
Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Learning Registry Overview Aug 2 2012
Learning Registry Overview Aug 2 2012Jeanne Kitchens
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic SearchPaul Wlodarczyk
 
Jyoti singh
Jyoti singhJyoti singh
Jyoti singhJyoti Singh
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
Semantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldSemantic Web: Technolgies and Applications for Real-World
Semantic Web: Technolgies and Applications for Real-WorldAmit Sheth
 
Taxonomy 101: Presented at Taxonomy Boot Camp 2019
Taxonomy 101: Presented at Taxonomy Boot Camp 2019Taxonomy 101: Presented at Taxonomy Boot Camp 2019
Taxonomy 101: Presented at Taxonomy Boot Camp 2019Enterprise Knowledge
 
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...Dr. Haxel Consult
 
Overview of Taxonomies and Artificial Intelligence
Overview of Taxonomies and Artificial IntelligenceOverview of Taxonomies and Artificial Intelligence
Overview of Taxonomies and Artificial IntelligenceEnterprise Knowledge
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Asis&t webinar people directories access innovations
Asis&t webinar people directories access innovationsAsis&t webinar people directories access innovations
Asis&t webinar people directories access innovationsBert Carelli
 
ONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESSONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESSKishan Patel
 
Empowering Search Through 3RDi Semantic Enrichment
Empowering Search Through 3RDi Semantic EnrichmentEmpowering Search Through 3RDi Semantic Enrichment
Empowering Search Through 3RDi Semantic EnrichmentThe Digital Group
 
Jyoti singh
Jyoti singhJyoti singh
Jyoti singhJyoti Singh
 

Similar to Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results (20)

Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...Competency framework: engineers, statisticians, data scientists, librarians, ...
Competency framework: engineers, statisticians, data scientists, librarians, ...
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?
 
FAIR: standards and services
FAIR: standards and servicesFAIR: standards and services
FAIR: standards and services
 
Subject information gateway in information technology (sigit) an introduction
Subject information gateway in information technology (sigit) an introductionSubject information gateway in information technology (sigit) an introduction
Subject information gateway in information technology (sigit) an introduction
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
Learning Registry Overview Aug 2 2012
Making AI Behave: Using Knowledge Domains to Produce Useful, Trustworthy Results

  • 1. MAKING AI BEHAVE: Using Knowledge Domains to Produce Useful, Trustworthy Results Marjorie M.K. Hlava Chief Scientist Access Innovations, Inc. mhlava@accessinn.com
  • 2. Abstract In today's highly charged atmosphere of anxiety and anticipation about AI, and especially LLMs, one of the biggest concerns is how to ensure that it returns accurate results (meaning both true and pertinent to its audience). This is particularly important to scholarly, scientific, and other technical organizations, whose constituents are often in very specific domains, such as medicine, engineering, history, biology, chemistry, etc. One extremely useful tool to incorporate in an AI-based process in such cases is a comprehensive and well-structured knowledge domain which is based on a controlled vocabulary. The next Access Innovations webinar, coming up at noon Eastern on Tuesday, March 26, is "MAKING AI BEHAVE: Using Knowledge Domains to Produce Useful, Trustworthy Results." It's based on the extensive experience and history Access Innovations has in the development and implementation of domain-specific thesauri, taxonomies, ontologies, and knowledge graphs, and their use of them with AI. They have over 70 knowledge domains covered, which they employ in sophisticated search, auto-tagging, and AI-based solutions for their clients. These are all available for immediate deployment, so you don't have to start from scratch to develop the ability to accurately tag your content to ensure proper and effective use by AI tools and systems.
  • 3. Google bans AI chatbot Gemini from answering election questions: ‘Try Google Search’ By Reuters Published March 12, 2024, 12:03 p.m. ET • Microsoft AI Research Introduces Generalized Instruction Tuning (called GLAN): A General and Scalable Artificial Intelligence Method for Instruction Tuning of Large Language Models (LLMs) By Tanya Malhotra March 2, 2024 • News Corp in ‘advanced’ talks with AI firms on deals to license content, CEO says By Thomas Barrabi Published Feb. 8, 2024, 2:17 p.m. ET • Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models [Submitted on 20 Feb 2024] https://arxiv.org/abs/2402.13064 Haoran Li, Qingxiu Dong, Zhengyang Tang, Chaojun Wang, Xingxing Zhang, Haoyang Huang, Shaohan Huang, Xiaolong Huang, Zeqiang Huang, Dongdong Zhang, Yuxian Gu, Xin Cheng, Xun Wang, Si-Qing Chen, Li Dong, Wei Lu, Zhifang Sui, Benyou Wang, Wai Lam, Furu Wei • Daily Deluge • Google AI Introduces Croissant: A Metadata Format for Machine Learning-Ready Datasets By Dhanshree Shripad Shenwai - March 12, 2024
  • 4.
  • 5.
  • 6. Marjorie M.K. Hlava • Expert in taxonomies, metadata, their application, and data science • Her groundbreaking work has earned her numerous awards and 2 patents with 21 claims granted • Margie's standards work includes • Dublin Core Z39.85 • DOI Syntax Z39.84 • CRediT Z39.104 • Thesaurus ANSI/NISO Z39.19 Thesauri and other controlled vocabularies • many others • Convener of ISO 25964, the International Standard on Controlled Vocabularies • Founder, Chairman, Chief Scientist of Access Innovations, Inc.
  • 7. “Large language models will not only mirror but magnify any problems with the data sets, problems that many organizations may not realize they have.” • Amplifying hidden biases and gaps seems like a real danger
  • 8. What we will cover today • Definitions • Getting us to speak the same language • Quick review of options • Why Taxonomies with LLMs? • Where do they fit? • What are some available Knowledge Domains? • Two Approaches • Summary
  • 9. What we will NOT cover • Big topic • Video • Politics / Elections • Recent sensations • All the tool sets • Regulatory actions • Programming aspects • Business cases https://www.nature.com/articles/d41586-024-00661-0?utm_source=Live+Audience&utm_campaign=adeec3770a-briefing-dy-20240313&utm_medium=email&utm_term=0_b27a691814-adeec3770a-51734080
  • 10. National Information Standards Organization – Controlled vocabulary • “a carefully selected list of words and phrases, which are used to tag units of information (document, images, videos, etc.) in order to describe their content. This list is carefully selected and managed by experts in a particular subject domain or field.” • Ensures consistency and precision in indexing and retrieval • Domain specific
  • 11. NISO – Thesaurus • “a controlled and structured vocabulary in which concepts are represented by terms, organized so that relationships between concepts are made explicit, and preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms.” • Hierarchical, Equivalence (Synonyms) and Associative (Related) • Structured • Organizes concepts • Facilitates access to information • Provides standardized terminology and relationships between terms (concepts), synonyms, or quasi-synonyms
  • 13. NISO - Taxonomy • “a structured, hierarchical representation of concepts or terms within a specific domain, organized to show relationships between concepts or terms.” • Formal frameworks for representing and organizing knowledge • Just the hierarchy • But now often used interchangeably with thesaurus and ontology • Image: https://www.thoughtworks.com/insights/blog/data-science-ontology
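The three relationship types in the thesaurus definition above can be sketched as a simple term record. This is a minimal illustration, not a standard schema: the `TermRecord` class, its field names, and the medical sample terms are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    """One thesaurus entry (illustrative field names, not a standard schema)."""
    preferred: str
    broader: list[str] = field(default_factory=list)   # hierarchical (BT)
    narrower: list[str] = field(default_factory=list)  # hierarchical (NT)
    related: list[str] = field(default_factory=list)   # associative (RT)
    used_for: list[str] = field(default_factory=list)  # equivalence: lead-in synonyms (UF)

def build_lead_in_index(records):
    """Map every preferred term and synonym (lower-cased) to its preferred term."""
    index = {}
    for rec in records:
        index[rec.preferred.lower()] = rec.preferred
        for synonym in rec.used_for:
            index[synonym.lower()] = rec.preferred
    return index

# Hypothetical sample records.
records = [
    TermRecord(
        "Myocardial infarction",
        broader=["Heart diseases"],
        related=["Coronary thrombosis"],
        used_for=["Heart attack", "MI"],
    ),
    TermRecord("Heart diseases", narrower=["Myocardial infarction"]),
]

lead_in = build_lead_in_index(records)
print(lead_in["heart attack"])  # the synonym resolves to the preferred term
```

The lead-in index is what lets a tagger or search layer treat "heart attack" and "MI" as the same concept.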
  • 15. What are the steps to implement taxonomy in generative AI? 1 of 2 • Define the Taxonomy Structure: • Identify the key concepts, categories, and relationships relevant to the domain or problem the generative AI system will address. • Design a hierarchical taxonomy structure that organizes these concepts into categories and subcategories. • Define relationships between categories to capture semantic connections. • Collect and Preprocess Data: • Gather a corpus of text data relevant to the domain or problem. This could include documents, articles, or any other textual resources. • Preprocess the text data to clean it, remove noise, and standardize formatting. This may involve tasks like tokenization, stemming, and removing stop words. • Annotate Data with Taxonomy Labels: • Manually or semi-automatically annotate the text data with labels corresponding to the taxonomy categories. This step involves mapping text excerpts or documents to the appropriate categories in the taxonomy.
  • 16. What are the steps to implement taxonomy in generative AI? 2 of 2 • Train the Generative AI Model: • Choose or develop a generative AI model suitable for the task at hand, such as a language model based on the transformer architecture (e.g., GPT). • Prepare the annotated data for training, ensuring that each input is associated with its corresponding taxonomy labels. • Train the generative AI model on the annotated data, incorporating the taxonomy labels as part of the training process. This allows the model to learn the relationships between textual inputs and taxonomy categories. • Incorporate Taxonomy into Model Inference: • After training, integrate the taxonomy structure into the generative AI model's inference process. • When generating text or responses, use the taxonomy to guide the model's outputs. For example, you can constrain the generation process to ensure that the generated text aligns with the taxonomy categories. • Evaluate and Iterate: • Evaluate the performance of the generative AI system using metrics relevant to the task, such as accuracy, coherence, and relevance. • Collect feedback from users or domain experts to identify areas for improvement. • Iterate on the model and taxonomy design based on the evaluation results and feedback, making adjustments as necessary to enhance performance. • Deploy and Monitor: • Deploy the generative AI system with taxonomy support in a production environment or as part of an application. • Monitor the system's performance and user interactions, gathering data for further refinement and optimization. • Collaboration between domain experts, data scientists, and AI engineers is crucial for the success
  • 17. Knowledge Domain • Refers to a specific area or field of knowledge • Subject matter, concepts, theories, methodologies, and practices • Cohesive and organized body of knowledge with a scope and boundaries • Vary widely in size and complexity • COVID – 57 terms • JSTOR – 57,000 terms • Library of Congress – 208,000 terms • e.g., quantum mechanics or medieval literature • Established disciplines or sub-disciplines • Each with theories, methods, and research traditions • Frameworks for understanding and investigating phenomena within specific areas • Represent scholars, researchers, practitioners, SMEs contributing to knowledge within those domains
  • 18. Available = already built • Government resources • Most agencies • May need formatting • NASA, DTIC, DOE, NAL, EPA, NLM, etc. • Sign up for updates • Licensable • TaxoBank • Access Innovations • Others
  • 19. Knowledge Domains • Taxonomies, thesauri, or authority files • Pre-Built • Knowledge Domains • Full term records: hierarchical, equivalence, and associative relationships, as well as scope notes where appropriate • Or hierarchy only • NISO Z39.19 and ISO 25964 standards compliant • Formats • 22 options • Excel/CSV • SKOS-2 • Etc.
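One way to picture the "constrain the generation process" step is a post-check that accepts only labels present in the taxonomy and expands each accepted label to its full hierarchical path. The sketch below assumes a tiny hand-made hierarchy; `PARENT`, `path_to_root`, and `constrain_labels` are hypothetical names, and a production system would more likely constrain the model itself than filter afterwards.

```python
# Hypothetical taxonomy stored as child -> parent (None marks the root).
PARENT = {
    "Science": None,
    "Physical Sciences": "Science",
    "Chemistry": "Physical Sciences",
    "Organic chemistry": "Chemistry",
}

def path_to_root(label, parent=PARENT):
    """Return the hierarchy path from the root down to `label`."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return list(reversed(path))

def constrain_labels(candidates, parent=PARENT):
    """Split model-proposed labels into those in the taxonomy and those not."""
    accepted, rejected = [], []
    for label in candidates:
        (accepted if label in parent else rejected).append(label)
    return accepted, rejected

# Labels a model might propose for a document (made up for illustration).
accepted, rejected = constrain_labels(["Organic chemistry", "Alchemy"])
print(accepted, rejected)
print(" > ".join(path_to_root("Organic chemistry")))
```

Rejected labels are worth logging: they are candidate terms for the taxonomy editors to review during the "Evaluate and Iterate" step.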
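As a rough illustration of what a SKOS delivery of a full term record looks like, the sketch below writes one concept as Turtle using the standard SKOS properties (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:related`, `skos:scopeNote`). The `http://example.org/vocab/` namespace, the helper names, and the sample term are assumptions for the example; labels containing double quotes would need escaping, which this sketch omits.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/vocab/"  # hypothetical namespace for this sketch

def slug(label):
    """Turn a label into a simple URI fragment."""
    return label.strip().lower().replace(" ", "-")

def concept_to_turtle(pref, alt=(), broader=(), related=(), scope_note=None):
    """Serialize one term record as a SKOS concept in Turtle syntax."""
    props = ["a skos:Concept", f'skos:prefLabel "{pref}"@en']
    props += [f'skos:altLabel "{a}"@en' for a in alt]           # equivalence
    props += [f"skos:broader <{BASE}{slug(b)}>" for b in broader]  # hierarchical
    props += [f"skos:related <{BASE}{slug(r)}>" for r in related]  # associative
    if scope_note:
        props.append(f'skos:scopeNote "{scope_note}"@en')
    return f"<{BASE}{slug(pref)}> " + " ;\n    ".join(props) + " ."

turtle = SKOS_PREFIX + "\n\n" + concept_to_turtle(
    "Myocardial infarction",
    alt=["Heart attack"],
    broader=["Heart diseases"],
    scope_note="Use for necrosis of heart muscle caused by interrupted blood supply.",
)
print(turtle)
```

A hierarchy-only delivery would keep just the `skos:prefLabel` and `skos:broader` lines.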
  • 20. General Purpose Taxonomies • Applied Science • Art • Behavioral Science • Biological Science • Business • Chemical – MAI Chem • Communications • Computer Science • COVID • Economics • Educational Curriculum • Geography • Health and Safety • Health Science • History • Information Science • Language Arts • Law • Linguistics • Literature and Drama • Mathematics • NewsThes • Nursing • Philosophy • Physical Education and Recreation • Physical Sciences • Political Science • Psychology • Religion • Science • Social Sciences
  • 21. These products can be SKOS downloads • Astronomy • Clinical Drugs • DTIC – Defense Technical Information Center • Environment – GEMET • ERIC – Education Resources Information Center • JSTOR • NASA • National Agricultural Library • Occupational Safety and Health • PLOS
  • 22. These products are available as SaaS • CPT – Current Procedural Terminology • HCPCS – Healthcare Common Procedure Coding System • ICD11 – International Classification of Diseases • Kew Medicinal Plant Names (MPNS) • MeSH – Medical Subject Headings • Suspect Cell Lines • Taxogene – the Human Genome
  • 23. Knowledge Graph • A knowledge graph • structured representation of knowledge • captures relationships between entities or concepts in a specific domain • Nodes represent entities or concepts • Edges represent relationships between these entities • Using semantic technologies and linked data principles • Integrate information from multiple sources • Supports inference of new knowledge based on existing connections • Enable context-aware information • data and its relationships • Gives precise querying and analysis • Supports discovery of implicit connections and patterns within the data • For organizing, navigating, and leveraging large volumes of interconnected data • Facilitate extraction of insights and the generation of new knowledge • Image: https://ahrefs.com/blog/google-knowledge-graph/
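The node, edge, and inference ideas above can be shown with a very small in-memory graph. This is a sketch under stated assumptions: the `KnowledgeGraph` class, the predicate names, and the sample facts are invented for illustration, and a real deployment would typically use a graph database or an RDF store.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny in-memory graph: nodes are strings, edges are labelled by a predicate."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, predicate) -> set of objects

    def add(self, subject, predicate, obj):
        self.edges[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        """Directly stated facts only."""
        return set(self.edges.get((subject, predicate), set()))

    def infer_transitive(self, subject, predicate):
        """Follow `predicate` edges repeatedly to find implied connections."""
        seen, stack = set(), [subject]
        while stack:
            node = stack.pop()
            for nxt in self.edges.get((node, predicate), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

kg = KnowledgeGraph()
kg.add("Aspirin", "is_a", "NSAID")
kg.add("NSAID", "is_a", "Drug")
kg.add("Aspirin", "treats", "Headache")

print(kg.objects("Aspirin", "is_a"))           # stated: NSAID
print(kg.infer_transitive("Aspirin", "is_a"))  # inferred: NSAID and Drug
```

"Aspirin is a Drug" is never stated; it falls out of following the `is_a` edges, which is the kind of implicit connection the slide refers to.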
  • 25. Does a knowledge graph need a controlled vocabulary? • Consistency • Interoperability • Facilitates Search and Discovery • Semantic Enrichment • Domain Understanding
  • 26. Knowledge Graphs with Generative AI • Contextual Understanding: • Provide a structured representation of relationships between entities and concepts • Generates more relevant and contextually appropriate responses • Content Generation: • Source of structured data and information for generative AI systems • Use in the training process, to learn from the structured relationships encoded in the graph and generate more coherent and accurate outputs • Ensure that the generated documents adhere to the domain's principles and conventions • Entity Linking and Disambiguation: • Identify and disambiguate entities mentioned in text • Lets AI models accurately link mentions of entities to their corresponding entries in the graph • Reduces ambiguity • Improves the quality of generated outputs • Personalization and Customization: • Customize to specific domains or use cases • Generate personalized outputs for user needs and preferences • Provides more relevant and useful content
  • 27. How to Link Knowledge Graph to Generative AI – 1 of 2 • Define Knowledge Graph Schema: • Identify the entities, relationships, and properties • Design a schema covering both the structure and the semantics of the domain • Acquire and Process Data: • Gather data sources • Preprocess the data to extract entities, relationships, and properties • Convert them into a format suitable for loading into the knowledge graph. • Build and Populate the Knowledge Graph: • Use a graph database to create and populate the knowledge graph. • Load the processed data into the knowledge graph • Ensure that entities are represented as nodes, relationships as edges, and properties as attributes.
  • 28. How to Link Knowledge Graph to Generative AI – 2 of 2 • Integrate Knowledge Graph with Generative AI: • Querying the knowledge graph for relevant information • Incorporating it as input during the model training process. • Ensure the model can access and utilize the structured knowledge represented in the graph. • Training and Fine-Tuning: • Train or fine-tune the generative AI model using the knowledge graph-enhanced data. • Supervised learning with labeled data • Unsupervised learning to discover patterns and relationships within the data. • Generate Outputs and Evaluate: • Use the generative AI system to generate outputs based on user queries or input data. • Evaluate the quality, relevance, and coherence of the generated outputs • Iterate and Refine: • Iterate on the implementation, incorporating feedback and making improvements • Continuously refine the knowledge graph and generative AI model • Based on user interactions, new data, and evolving requirements.
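Entity linking and disambiguation can be illustrated with a toy scorer that picks the graph entry whose cue words overlap most with the surrounding text. The candidate table, identifiers, and cue words below are invented for this sketch; real linkers rely on much richer context than word overlap.

```python
import re

# Hypothetical candidate entries for an ambiguous mention.
CANDIDATES = {
    "mercury": [
        {"id": "chem:Mercury", "cues": {"metal", "toxic", "thermometer", "element"}},
        {"id": "astro:Mercury", "cues": {"planet", "orbit", "sun", "solar"}},
    ],
}

def link_entity(mention, text):
    """Return the id of the best-matching entry, or None if nothing in the context supports a choice."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    options = CANDIDATES.get(mention.lower(), [])
    scored = [(len(option["cues"] & words), option) for option in options]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best["id"] if best_score > 0 else None

print(link_entity("Mercury", "Mercury has the shortest orbit around the Sun."))
print(link_entity("Mercury", "Mercury is a toxic metal."))
```

Returning `None` when the context offers no evidence is a deliberate choice here: an unlinked mention is easier to handle downstream than a wrongly linked one.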
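The "query the knowledge graph and incorporate the result as model input" step might look like the sketch below, which finds facts whose subject or object appears in a question and formats them as plain-text context. The triples, function names, and prompt wording are assumptions; no model is called, and whether the resulting text is used for training data or at question time is left open.

```python
# Hypothetical facts held in the graph as (subject, predicate, object).
TRIPLES = [
    ("Metformin", "treats", "Type 2 diabetes"),
    ("Metformin", "is_a", "Biguanide"),
    ("Insulin", "treats", "Type 1 diabetes"),
]

def facts_for(question, triples=TRIPLES):
    """Select triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]

def build_prompt(question, triples=TRIPLES):
    """Format the selected facts and the question as text for a generative model."""
    facts = facts_for(question, triples)
    lines = [f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return f"Use only these facts:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What does metformin treat?")
print(prompt)
```

Simple substring matching stands in for the entity-linking step; with a controlled vocabulary in place, the lookup would go through preferred terms and their synonyms instead.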
  • 29. Ontology – 1 of 2 • “a formal and explicit specification of a conceptualization that defines the terms, concepts, and relationships within a particular domain of knowledge.” • Meaningless jumble • Formal frameworks for representing and organizing knowledge • Terms or Concepts: • Represent the entities, classes, or categories within the domain of interest • Each term (concept or entity) is defined with a precise meaning • Relationships: • Define the connections between terms in the ontology • Relationships can represent various types of connections such as hierarchical (subclass), part-whole, or associative relationships • Axioms or Constraints: • Rules or constraints for properties (behavior) of the terms and relationships in the ontology • Axioms help ensure the consistency and coherence of the ontology
  • 30. Ontology – 2 of 2 • Often uses RDF = Resource Description Framework (defines itself by reference or inclusion) • One thing (the subject a.k.a. resource) has a relationship (the predicate a.k.a. edge) with another thing (the object a.k.a. resource) • Each node is a thing (a resource) and each edge is a given relationship (either “reports to” or “works for”), which is known as a predicate • Machine-readable • Facilitates interoperability, reasoning, and semantic understanding across different systems and applications • Connect things not strings
  • 31. Tagging / Indexing • The process of associating metadata or descriptive keywords with digital content • All content types • Text based • Identify Things • People, places, objects, entities • Identify concepts • Keywords, descriptors, terms, subject headings, classification systems and codes, thesaurus terms • Provide consistent tagging and accurate and comprehensive retrieval of content items
  • 32. NLP, ML, AI are not new • Automation of human activity – around for over 100 years • Mechanical automation – Jacquard Looms (1804) • Herman Hollerith 1890 Census (punch cards) • IBM – Thomas Watson Group – 1920s • Sputnik 1957 • Space Race and Cold War • 1964 – COSATI • TEST = Thesaurus of Engineering, Scientific and Technical Terms • Automated retrieval of documents • Dialog, NASA Recon 1973
  • 33. Basic algorithms • Boole, George 1815 – 1864 • Boolean algebra is basic to the design of digital computer circuits. • Bayes, Thomas 1701-1761 • Richard Price 1723 – 1791 • Describes the probability of an event, based on prior knowledge of conditions that might be related to the event • Beyond reasonable doubt • Wikipedia
  • 34. AI Building blocks - NLP • Symbolic • 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" • Jabberwacky chatbot 1997 • Statistical • 1990s machine translation – ERTRANS • Rule based approaches • 2000s World Wide Web – HTML • Neural • Vectors • N-grams
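The subject, predicate, object pattern described above can be made concrete with the slide's own "reports to" and "works for" relationships. In this sketch the people, the organization, and the `http://example.org/` namespace are hypothetical; the output follows the N-Triples line format, and `match` is a hand-rolled pattern lookup rather than a real query language.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # a resource
    predicate: str  # the relationship (edge)
    obj: str        # another resource

EX = "http://example.org/"  # hypothetical namespace for this sketch

def to_ntriples(triple):
    """Serialize one triple as an N-Triples line with full IRIs."""
    return f"<{EX}{triple.subject}> <{EX}{triple.predicate}> <{EX}{triple.obj}> ."

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

triples = [
    Triple("alice", "reportsTo", "bob"),
    Triple("alice", "worksFor", "acme"),
    Triple("bob", "worksFor", "acme"),
]

print(to_ntriples(triples[0]))
print(match(triples, predicate="worksFor"))
```

Because each resource is an identifier rather than a bare string, two systems that share the namespace can merge their triples without guessing whether "alice" means the same person, which is the sense of "connect things not strings".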
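The Bayes bullet above corresponds to the following rule. The worked numbers are purely illustrative assumptions (a 1% prior, a 90% true-positive rate, and a 5% false-positive rate), chosen only to show how prior knowledge changes the result.

```latex
% Bayes' theorem: probability of event A given evidence B
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)

% Illustrative numbers: P(A) = 0.01, P(B \mid A) = 0.90, P(B \mid \neg A) = 0.05
P(B) = 0.90 \times 0.01 + 0.05 \times 0.99 = 0.0585

P(A \mid B) = \frac{0.90 \times 0.01}{0.0585} \approx 0.154
```

Even with a 90% hit rate, the low prior keeps the posterior near 15%, which is why the quality of the prior knowledge matters as much as the algorithm.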
  • 36. “AI”+ GenAI • Start with enriched content (tagged) • Tell (feed to) GenAI • GenAI puts new rules in the inference engine • Search results get better • Repeat, repeat, repeat
  • 37. Understand the data you are feeding a GenAI • Identify cancerous skin lesions in images • 100% accurate! https://sites.mitre.org/aifails/turning-lemons-into-lemon/
  • 38. ChatGPT Static training • Example of Generative AI – one of MANY • ChatGPT is two-year-old data • Took a lot of Manpower to train • Need to constantly refresh – • retrain the model… • To keep current • So what is the answer? • How are the models getting trained / fine tuned? • A special kind of NLP • Use to enhance existing data sets
  • 39. GLAN (Generalized Instruction Tuning) • Breaks human knowledge into domains, sub-fields, and disciplines • Taxonomy is divided into subjects • Syllabus created for each subject (branch) • Specific essential themes • “GLAN uses these ideas to produce a variety of instructions that closely resemble the design of the human educational system” • Curriculum outline – like NICEM Knowledge Domain
  • 40. Flexible, scalable, and all-purpose approach • Produces instructions on an enormous scale • Task-agnostic • Spanning a wide range of disciplines • The input taxonomy has been created with minimal human effort through LLM prompting and verification • Can add new fields or skills • Adaptable, the dataset can be expanded and changed without having to start from scratch • Wide range of instructions covering every possible combination of human knowledge and abilities • Includes coding, logical reasoning, mathematical reasoning, academic tests, and general instruction • No need for task-specific training data for these particular tasks • Add new domains or proficiencies by adding a new node to its taxonomy
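The last point, extending coverage by adding a node to the taxonomy, can be pictured with a small tree structure. This is a minimal sketch and not GLAN's actual code; the `Node` class and the subject names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One taxonomy node; children are keyed by name."""
    name: str
    children: dict = field(default_factory=dict)

    def add(self, path):
        """Create any missing nodes along `path` and return the last one."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Node(name))
        return node

    def leaves(self, prefix=()):
        """Yield the full path of every leaf (the most specific subjects)."""
        here = prefix + (self.name,)
        if not self.children:
            yield here
        for child in self.children.values():
            yield from child.leaves(here)


root = Node("Knowledge")
root.add(["Natural sciences", "Chemistry"])
root.add(["Natural sciences", "Physics"])
root.add(["Engineering", "Civil engineering"])
# Adding a new discipline later touches one branch only; nothing is rebuilt.
root.add(["Natural sciences", "Astronomy"])

subjects = [" > ".join(path) for path in root.leaves()]
```

Each leaf path is the kind of subject for which a syllabus, and then instructions, could be generated.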
  • 41. HuixiangDou (Baseline) • Problems with Chat systems using LLM • Flooding of the system • Irrelevant responses • Lack of answer precision • Answer • Fine tuning the system • Continuous updates • Identifying the key points of problems • Handling multiple target points simultaneously • More focused approach to handling queries • How? • Keywords from the taxonomy • Applied as an incoming filter • Added to content responses • Constant additions based on logs
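The "incoming filter" idea can be sketched as a keyword gate in front of the chat system. This is an illustration under assumptions, not HuixiangDou's implementation: the term list, the threshold, and the function names are all hypothetical.

```python
import re

# Hypothetical taxonomy keywords for the assistant's domain.
DOMAIN_TERMS = {"taxonomy", "thesaurus", "ontology", "indexing", "metadata"}


def matched_terms(message, terms=DOMAIN_TERMS):
    """Return the domain keywords that appear as whole words in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return words & terms


def should_answer(message, min_hits=1):
    """Let a message through only if it mentions enough domain keywords."""
    return len(matched_terms(message)) >= min_hits


# Rejected messages are logged so the keyword list can be extended over time.
rejected_log = []
for msg in ["How do I add metadata to a thesaurus record?", "Anyone up for lunch?"]:
    if not should_answer(msg):
        rejected_log.append(msg)
```

Reviewing `rejected_log` periodically corresponds to the "constant additions based on logs" step: terms that users actually ask about but the filter misses get added to the taxonomy.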
  • 42. HuixiangDou: A Domain-Specific Knowledge Assistant Powered by Large Language Models https://www.marktechpost.com/2024/01/31/shanghai-ai-lab-presents-huixiangdou-a-domain-specific-knowledge-assistant-powered-by-large-language-models-llm/
  • 44. Technology • It is a tool, not the focus • Might need a shiny new piece of technology • The technology is generally in the chorus • Not a main character • Too many companies lead with technology and • do not spend the time understanding their users or aligning their strategy • Any company that has 1000s of SharePoint or Teams sites where people still can’t find the information they need knows this • Most large corporations have 5 search software systems • On the shelf • “Does not work” • Because the data was not enriched
  • 45. Organizational model • Taxonomy and data modeling • Essential component of this investment • Data must be • Well sourced • Managed • Maintained • Essential for the AI • Both ethical and performance reasons • Ignore data quality at your peril • It is hard work • Does not fit a two-week sprint • Get executives to agree on strategy and structure model • Without a coherent model, governance, data pipeline, and resourcing there is no strategic value to an AI initiative
  • 46. It’s the Data, Stupid! • Data is the organization’s core asset • Without the data the rest of the initiative is nothing • It is the essential component of the strategy • Enrich the data with metadata • SUBJECT metadata • Use taxonomies, ontologies, and other models • The large language models will not only mirror but magnify any problems with the data sets, problems that many organizations may not realize they have. (Gary Carlson - Factors)
  • 48. Why a taxonomy? • Matches your content • Scales with the content as it increases • Extensive synonymy – use any of the word term options • The concept is the unit of thought • Disambiguation • Mercury • Lead • Built in feedback loops to keep current with content • Prevents hallucinations • Misunderstandings of multiple word meanings (Nonsensical output) • Happens when the model is not trained on your content (Factual contradiction) • Query goes against the rules of the system (Prompt contradiction)
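The disambiguation point (Mercury, Lead) can be illustrated by scoring each candidate concept against cue words found in the surrounding text. This is a deliberately simple sketch; the concept labels and cue lists are hypothetical, and production systems use much richer rules.

```python
import re

# Hypothetical cue words for each sense of an ambiguous term.
SENSES = {
    "mercury": {
        "Mercury (element)": {"metal", "toxic", "thermometer", "liquid"},
        "Mercury (planet)": {"orbit", "planet", "solar", "sun"},
    },
    "lead": {
        "Lead (element)": {"metal", "toxic", "pipe", "paint"},
        "Lead (leadership)": {"team", "manager", "project"},
    },
}


def disambiguate(word, text):
    """Pick the sense whose cue words overlap most with the text.

    Returns None when no cue word is present, so the caller can fall
    back to human review instead of guessing. Ties go to the sense
    listed first.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES[word.lower()].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `disambiguate("Mercury", "Mercury has the shortest orbit of any planet around the sun")` selects the planet sense, while a sentence with no cue words returns `None`.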
  • 49. Why tag / index at all? • Disambiguation • Search and retrieval is accurate • Promote taxonomy term first searching • In the inverted index search controlled terms first • Then go to full text if needed • Use in search response consistency and integrity • Recommendation engines using tag sets not vectors
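"Taxonomy term first" searching can be sketched as a two-stage lookup: consult the controlled-term index, and fall back to full text only if the query is not a known term. The index contents and document ids below are invented for illustration.

```python
# Hypothetical controlled-term index: taxonomy term -> ids of tagged documents.
TAG_INDEX = {
    "solar energy": {"d1", "d3"},
    "wind energy": {"d2"},
}

# Hypothetical full text of the same documents.
FULL_TEXT = {
    "d1": "photovoltaic panels convert sunlight",
    "d2": "turbines on offshore platforms",
    "d3": "solar farms and storage batteries",
}


def search(query):
    """Return (stage, doc_ids); the controlled vocabulary is tried first."""
    q = query.lower().strip()
    if q in TAG_INDEX:
        return "controlled", sorted(TAG_INDEX[q])
    # Fallback: plain substring match over the full text.
    hits = sorted(doc_id for doc_id, text in FULL_TEXT.items() if q in text)
    return "full_text", hits
```

Note that `search("Solar Energy")` returns `d1` even though that document never contains the phrase; the tag carries the concept, which is the consistency benefit the slide describes.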
  • 50. Why Auto Tagging? • Fast • Sub-second versus 70 seconds per tag • Able to add more tags quickly in the same sub-second time • More depth • Always goes to the most specific level of tagging • No misspellings • Consistency • No editorial drift – people tend to use the same tags over and over • Do not need as many subject experts • Replicable results – no black box
  • 51. Dump in the data to the AI vortex
  • 52. Approach A – Hybrid - Leverage Gen AI - 1 • Send question to ChatGPT • Use autocomplete from the taxonomy • Find more words and concepts • Tailored to the specific domain or topic • Extract the semantic context of the query • MAI tags the query with taxonomy terms • Send those concepts to the Generative AI system • Read the answers from the Generative AI system • Might submit more than once • But answer will change each time
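The query-tagging step in this approach might look like the sketch below: map the user's wording to preferred taxonomy terms, then attach those concepts to the prompt. The synonym table and both functions are hypothetical stand-ins; they are not the MAI product's API, and no call to a generative AI service is made here.

```python
# Hypothetical synonym ring: entry term (lower case) -> preferred taxonomy term.
SYNONYMS = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
}


def tag_query(query):
    """Return the preferred terms whose entry terms occur in the query."""
    q = query.lower()
    # Simple substring matching; a real tagger would respect word boundaries.
    found = {preferred for entry, preferred in SYNONYMS.items() if entry in q}
    return sorted(found)


def build_prompt(query):
    """Append the tagged concepts so the model sees the domain vocabulary."""
    terms = tag_query(query)
    if not terms:
        return query
    return f"{query}\n\nAnswer with respect to these concepts: {', '.join(terms)}."


prompt = build_prompt("Does high blood pressure raise the risk of a heart attack?")
```

The resulting `prompt` string is what would be sent to the generative system; the same tagged terms can be reused to query your own content.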
  • 53. Approach A – Hybrid - Enrich the content -2 • Send the same query to your own content • Use the same terms • Answer will be consistent since it is on tagged actual text • Keeps your data out of the LLM and secure • Use the LLM to get a general answer • Use your content to get the specific and reliable answer • Combine the two to get a quick summary of the material
  • 55. Approach B • Gather a large amount of text • License the Gen AI of your choice • Integrate several different systems for optimal results • OpenAI, Bard, Claude, DALL-E, MidJourney, etc. • Convert data • Load to large servers • Train the data model • Use a series of refining questions • Clarify the user’s intent interactively • Convert the text query into a hybrid search query • Summarize, classify, and display search results in different, easily distinguishable categories (using an AI classification model), e.g., most relevant answer, most recent answer, most trustworthy source, etc.
  • 57. Approach C – Taxonomy Priority – 1 of 2 • Organize and enrich first, then train the data • Learn where the outliers are • Allow for new input from outside resources over time • Taxonomy Development: • Use existing or create new • Identify key concepts, categories, and relationships • Hierarchical taxonomy structure • Define relationships between different categories • Represent the semantic connections between concepts • Get a Taxonomy Tool to create and manage the taxonomy efficiently. • Tools like Protégé, Data Harmony, or custom-built solutions can be used for this purpose
  • 58. Approach C – Taxonomy Priority – 2 of 2 • Data Preprocessing: • Data Collection – gather documents • Data Cleaning – remove noise, irrelevant information, and formatting inconsistencies • Entity Extraction – extract entities, concepts, and terms from the text data and link • Taxonomy Integration: • Map extracted entities and concepts between text data and the taxonomy structure • Index the data using the taxonomy to enable efficient retrieval and querying
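The preprocessing and integration steps on this slide can be strung together in a few lines. This is a minimal sketch under stated assumptions: the taxonomy, the sample documents, and all function names are hypothetical, and "cleaning" here only strips markup tags and extra whitespace.

```python
import re
from collections import defaultdict

# Hypothetical taxonomy: preferred term -> entry terms (lower case).
TAXONOMY = {
    "Machine learning": ["machine learning", "ml"],
    "Taxonomies": ["taxonomy", "taxonomies"],
}


def clean(raw):
    """Data cleaning: drop markup tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", text).strip()


def extract_terms(text):
    """Entity extraction: map whole-word matches to preferred terms."""
    lowered = text.lower()
    found = set()
    for preferred, entries in TAXONOMY.items():
        if any(re.search(rf"\b{re.escape(e)}\b", lowered) for e in entries):
            found.add(preferred)
    return found


def build_index(docs):
    """Taxonomy integration: index documents by the terms they were tagged with."""
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for term in extract_terms(clean(raw)):
            index[term].add(doc_id)
    return index


docs = {
    "a": "<p>A  taxonomy guides   ML models.</p>",
    "b": "<div>Machine learning at scale</div>",
}
index = build_index(docs)
```

Because both "ML" and "machine learning" resolve to one preferred term, the two documents end up under the same index entry, which is what makes later retrieval consistent.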
  • 59. How Can Taxonomies Help LLM? • Understanding Input • Content Organization • Knowledge Representation • Query Expansion • Quality Control
  • 60. Can Taxonomies make LLM Behave? • Guiding Decision-Making • Enhancing Understanding • Improving Consistency • Facilitating Interpretability • Supporting Compliance
  • 61. Thank you for your attention Questions? • Marjorie M.K Hlava • Chief Scientist • Access Innovations, Inc. • mhlava@accessinn.com