© 2022 Neo4j, Inc. All rights reserved.
1
Knowledge-graph-based chatbot
By Tomaž Bratanič
About me
Tomaž Bratanič - Graph & LLM enthusiast Author of:
● https://bratanic-tomaz.medium.com/
● https://twitter.com/tb_tomaz
● https://github.com/tomasonjo
● bratanic.tomaz@gmail.com
Special discount code for 40% off: ctwnodes40
http://mng.bz/vPnM
© 2022 Neo4j, Inc. All rights reserved.
3
LLMs are great, but:
• Knowledge
cutoff date
• Hallucinations
• Lack of
information
source
• Lack of
domain-specific
or private
information
© 2022 Neo4j, Inc. All rights reserved.
4
● Generating training
data is hard and
expensive
● Doesn’t solve
knowledge cutoff
issue, only pushed
it to a later date
● No data sources
● Hard to debug, is
the information
coming from
foundation training
set or from fine-
tuning set or is it
hallucinated?
Supervised fine-tuning an LLM
From Google documentations
© 2022 Neo4j, Inc. All rights reserved.
5
● Providing relevant
context in prompt
● Avoid relying on
facts provided by
LLMs
● Increased
transparency due to
provided sources
● The answer is as
good as the
provided context
Retrieval-augmented generation
(in-context passing)
Retrieval-augmented generation (in-context passing) flow
Chat with your PDF application
Vector similarity search
Given a question, find the most relevant documents based on a similarity metric (such as
Cosine Similarity) between vector of the question and vectors of contents.
Moving from keyword search to similarity (semantic) search.
Q: what is text
embedding?
abstractId similarity
456 0.923445
22 0.892114
… ...
Top K by similarity
Vector index in Neo4j - Added in 5.11
Define vector index Query vector index
Chat with your PDF - GenAI Stack
https://github.com/docker/genai-stack
PDF Input
User question
Generated answer
Chat with your PDF - GenAI Stack
https://github.com/docker/genai-stack
Neo4j LLM integrations with
LangChain and LlamaIndex can get
you started in less than 5 minutes
13
Knowledge graphs
Go beyond unstructured text in your LLM applications
© 2022 Neo4j, Inc. All rights reserved.
14
● Consists of nodes
and relationships
● Nodes are used to
represent entities,
concepts, and more
● Relationships
describe the
context and how it
is connected to
other nodes
Knowledge graph
Retrieval-augmented generation flow for structured
information using generated Cypher statements
https://towardsdatascience.com/langchain-has-added-cypher-search-cb9d821120d5
© 2022 Neo4j, Inc. All rights reserved.
16
● Pass the schema
information to LLM
along with the
question to
generate a
database query
● Can be enhanced
by providing
specific query
examples in the
prompt
Using LLMs to generate queries
Few-shot example for Cypher generation
Also useful for
translating domain-
specific lingo to
Cypher
Answer generating chain
Giving an LLM presentation
with a pop quiz at the end.
© 2022 Neo4j, Inc. All rights reserved.
19
● Using vector or full
text index to map
values or entities
from the user
question to
database property
values
● Using vector
similarity search to
find most relevant
few-shot cypher
examples allows
you to scale to
more examples
Introducing dynamic prompting
https://github.com/tomasonjo/streamlit-neo4j-hackathon
Multi-hop question answering
Take for example the following question:
Did any of the former OpenAI employees start
their own company?
Could be broken into two sub questions:
● Who are the former employees of OpenAI?
● Did any of them start their own company?
Typical recommendation is to use vector similarity
search in combination with agents
© 2022 Neo4j, Inc. All rights reserved.
21
Potential problems
● Repeated information
in the top N
documents
● Hard to define ideal
N number of
retrieved documents
● Potentially a lot of
steps in the question
generation process
Multi-hop question answering
© 2022 Neo4j, Inc. All rights reserved.
22
● Move the bulk work
from query time to
ingestion time
● Don’t worry about
repeated
information
● Don’t worry about
determining the
ideal K number of
retrieved
documents
Multi-hop question answering - a
better way
https://blog.langchain.dev/constructing-knowledge-graphs-from-text-using-openai-functions/
Real-time analytics with
knowledge graph
Take for example the following question:
How many new customers we had last week?
Which services depend indirectly on the
authorization service?
Vector similarity search doesn’t natively
handle aggregations, transformation,
graph analytics, etc…
However, a graph database like Neo4j
offers a vast possibility of traditional and
graph analytics through the use of the
graph-based structured query language
Cypher and Graph Data Science plugin
© 2022 Neo4j, Inc. All rights reserved.
24
● Which products will
be affected if
particular system
goes down?
● Who owns a
particular system?
● Are there any active
systems that are
not part of a
product (could be
deprecated)?
● Are there any
existing tasks
related to particular
bug in authorization
service?
Real-time analytics - Microservices
(dev-ops) graph
https://blog.langchain.dev/using-a-knowledge-graph-to-implement-a-devops-rag-application/
© 2022 Neo4j, Inc. All rights reserved.
25
● Future of LLM
applications is the
combination of
structured and
unstructured data
● Explicit links
between structured
and unstructured
information give the
ability to perform
complex “metadata”
filtering
Combining structured and
unstructured information
Unstructured data
Questions
Tomaž Bratanič - Graph & LLM enthusiast Author of:
Book raffle:
https://forms.gle/HKWAaSnmzawYtVqr6
● https://bratanic-tomaz.medium.com/
● https://twitter.com/tb_tomaz
● https://github.com/tomasonjo
● bratanic.tomaz@gmail.com
Special discount code for 40% off: ctwnodes40

Nodes 2023 - Knowledge graph based chatbot.pptx

  • 1.
    © 2022 Neo4j,Inc. All rights reserved. 1 Knowledge-graph-based chatbot By Tomaž Bratanič
  • 2.
    About me Tomaž Bratanič- Graph & LLM enthusiast Author of: ● https://bratanic-tomaz.medium.com/ ● https://twitter.com/tb_tomaz ● https://github.com/tomasonjo ● bratanic.tomaz@gmail.com Special discount code for 40% off: ctwnodes40 http://mng.bz/vPnM
  • 3.
    © 2022 Neo4j,Inc. All rights reserved. 3 LLMs are great, but: • Knowledge cutoff date • Hallucinations • Lack of information source • Lack of domain-specific or private information
  • 4.
    © 2022 Neo4j,Inc. All rights reserved. 4 ● Generating training data is hard and expensive ● Doesn’t solve knowledge cutoff issue, only pushed it to a later date ● No data sources ● Hard to debug, is the information coming from foundation training set or from fine- tuning set or is it hallucinated? Supervised fine-tuning an LLM From Google documentations
  • 5.
    © 2022 Neo4j,Inc. All rights reserved. 5 ● Providing relevant context in prompt ● Avoid relying on facts provided by LLMs ● Increased transparency due to provided sources ● The answer is as good as the provided context Retrieval-augmented generation (in-context passing)
  • 6.
  • 7.
    Chat with yourPDF application
  • 8.
    Vector similarity search Givena question, find the most relevant documents based on a similarity metric (such as Cosine Similarity) between vector of the question and vectors of contents. Moving from keyword search to similarity (semantic) search. Q: what is text embedding? abstractId similarity 456 0.923445 22 0.892114 … ... Top K by similarity
  • 9.
    Vector index inNeo4j - Added in 5.11 Define vector index Query vector index
  • 10.
    Chat with yourPDF - GenAI Stack https://github.com/docker/genai-stack PDF Input User question Generated answer
  • 11.
    Chat with yourPDF - GenAI Stack https://github.com/docker/genai-stack Neo4j LLM integrations with LangChain and LlamaIndex can get you started in less than 5 minutes
  • 13.
    13 Knowledge graphs Go beyondunstructured text in your LLM applications
  • 14.
    © 2022 Neo4j,Inc. All rights reserved. 14 ● Consists of nodes and relationships ● Nodes are used to represent entities, concepts, and more ● Relationships describe the context and how it is connected to other nodes Knowledge graph
  • 15.
    Retrieval-augmented generation flowfor structured information using generated Cypher statements https://towardsdatascience.com/langchain-has-added-cypher-search-cb9d821120d5
  • 16.
    © 2022 Neo4j,Inc. All rights reserved. 16 ● Pass the schema information to LLM along with the question to generate a database query ● Can be enhanced by providing specific query examples in the prompt Using LLMs to generate queries
  • 17.
    Few-shot example forCypher generation Also useful for translating domain- specific lingo to Cypher
  • 18.
    Answer generating chain Givingan LLM presentation with a pop quiz at the end.
  • 19.
    © 2022 Neo4j,Inc. All rights reserved. 19 ● Using vector or full text index to map values or entities from the user question to database property values ● Using vector similarity search to find most relevant few-shot cypher examples allows you to scale to more examples Introducing dynamic prompting https://github.com/tomasonjo/streamlit-neo4j-hackathon
  • 20.
    Multi-hop question answering Takefor example the following question: Did any of the former OpenAI employees start their own company? Could be broken into two sub questions: ● Who are the former employees of OpenAI? ● Did any of them start their own company? Typical recommendation is to use vector similarity search in combination with agents
  • 21.
    © 2022 Neo4j,Inc. All rights reserved. 21 Potential problems ● Repeated information in the top N documents ● Hard to define ideal N number of retrieved documents ● Potentially a lot of steps in the question generation process Multi-hop question answering
  • 22.
    © 2022 Neo4j,Inc. All rights reserved. 22 ● Move the bulk work from query time to ingestion time ● Don’t worry about repeated information ● Don’t worry about determining the ideal K number of retrieved documents Multi-hop question answering - a better way https://blog.langchain.dev/constructing-knowledge-graphs-from-text-using-openai-functions/
  • 23.
    Real-time analytics with knowledgegraph Take for example the following question: How many new customers we had last week? Which services depend indirectly on the authorization service? Vector similarity search doesn’t natively handle aggregations, transformation, graph analytics, etc… However, a graph database like Neo4j offers a vast possibility of traditional and graph analytics through the use of the graph-based structured query language Cypher and Graph Data Science plugin
  • 24.
    © 2022 Neo4j,Inc. All rights reserved. 24 ● Which products will be affected if particular system goes down? ● Who owns a particular system? ● Are there any active systems that are not part of a product (could be deprecated)? ● Are there any existing tasks related to particular bug in authorization service? Real-time analytics - Microservices (dev-ops) graph https://blog.langchain.dev/using-a-knowledge-graph-to-implement-a-devops-rag-application/
  • 25.
    © 2022 Neo4j,Inc. All rights reserved. 25 ● Future of LLM applications is the combination of structured and unstructured data ● Explicit links between structured and unstructured information give the ability to perform complex “metadata” filtering Combining structured and unstructured information Unstructured data
  • 26.
    Questions Tomaž Bratanič -Graph & LLM enthusiast Author of: Book raffle: https://forms.gle/HKWAaSnmzawYtVqr6 ● https://bratanic-tomaz.medium.com/ ● https://twitter.com/tb_tomaz ● https://github.com/tomasonjo ● bratanic.tomaz@gmail.com Special discount code for 40% off: ctwnodes40