1. Embodied Methods for Quick and Accurate Insights About Documents Using LLMs
Ben Goosman
2. About Me
Ben Goosman
Photo credits: Weidong Yang, Piper Werle
3. Problem
Documents take a long time to read
Easy to get lost in the details
LLMs hallucinate
Lack of trust in AI
4. Solution
Don’t rely completely on the AI
Involve the human in as many steps as possible
Knowledge Map, not Knowledge Graph
Why Map?
5. Methods
- Generating
- Infrastructure for bulk document analysis
- Knowledge map with the POLE model
- Make the LLM explain itself
- Provide definitions to the LLM
- Use examples to get desired output
- Allow human to change query
- It’s ok not to label everything
- At first, observations are nodes, not edges
- Use shortcut relationships
- Exploring
- Neo4j Full Text search
- Apply Force Layout in 2d and 3d
- Find central nodes
- Use path finding
- Zoom in and read
- Expand using Cypher
- Question answering with the graph
- Find relevant documents
- Find relevant knowledge map
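The exploration steps above (find central nodes, use path finding) can be sketched on a toy in-memory graph. This is an illustrative sketch only: `degree_centrality` and `shortest_path` are my own stand-ins for what the talk does in Cypher against Neo4j (e.g. `shortestPath()`), not the talk's code.

```python
from collections import deque

# Toy in-memory knowledge map (node -> set of neighbours). In the talk this
# lives in Neo4j and the same questions are asked in Cypher.
graph = {
    "Lily": {"Village"},
    "Village": {"Lily", "Mountains", "Market"},
    "Mountains": {"Village"},
    "Market": {"Village", "Baker"},
    "Baker": {"Market"},
}

def degree_centrality(g):
    """Rank nodes by degree -- a cheap proxy for 'find central nodes'."""
    return sorted(g, key=lambda n: len(g[n]), reverse=True)

def shortest_path(g, start, goal):
    """Breadth-first search, the idea behind Cypher's shortestPath()."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in g[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(degree_centrality(graph)[0])           # the most connected entity
print(shortest_path(graph, "Lily", "Baker"))
```

Ranking by degree is the simplest centrality; Neo4j's Graph Data Science library offers richer ones (PageRank, betweenness) if the map grows.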
7. Knowledge map with the POLE model
Find relationships involving entities of types {labels} in the text provided.
A relationship has a Source, Target, Explanation as to why these are in relation, and a Short relationship.
One of the Source or Target can be of a type not in the list {labels}, but not both.
The definitions are {str(definitions)}. If there are no relationships, don't say anything.
Some examples of your output are below.
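The prompt above interpolates `{labels}` and `{str(definitions)}`; a minimal sketch of how it might be assembled as an f-string (the function name `build_extraction_prompt` and the trimmed definitions are mine, not the talk's):

```python
# Trimmed-down definitions for illustration; the deck's full dict is richer.
definitions = {
    "Person": "An individual human being.",
    "Location": "A specific place or position.",
}

def build_extraction_prompt(labels, definitions):
    """Fill the slide's prompt template with a label list and definitions."""
    return (
        f"Find relationships involving entities of types {labels} in the text "
        "provided. A relationship has a Source, Target, Explanation as to why "
        "these are in relation, and a Short relationship. "
        f"One of the Source or Target can be of a type not in the list {labels}, "
        "but not both. "
        f"The definitions are {str(definitions)}. "
        "If there are no relationships, don't say anything. "
        "Some examples of your output are below."
    )

prompt = build_extraction_prompt(["Person", "Location"], definitions)
print(prompt)
```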
8. Make the LLM explain itself
See: Chain of Thought reasoning
Explanation: Lily lived in the village nestled
in the mountains.
Short: LIVED_IN
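Once the model answers in the Source/Target/Explanation/Short format, the reply has to be parsed back into rows before it can be loaded into the graph. A hedged sketch (the regex and `parse_relationships` are my own; it assumes each field sits on its own `Key: value` line, and keeps the slides' `Name | Type` annotations as plain strings):

```python
import re

def parse_relationships(reply):
    """Parse 'Source/Target/Explanation/Short' blocks from an LLM reply."""
    pattern = re.compile(
        r"Source:\s*(?P<source>.+)\n"
        r"Target:\s*(?P<target>.+)\n"
        r"Explanation:\s*(?P<explanation>.+)\n"
        r"Short:\s*(?P<short>\S+)"
    )
    return [m.groupdict() for m in pattern.finditer(reply)]

reply = """\
Source: Bruno Pusterla | Person
Target: Italian Agricultural Confederation | Organization
Explanation: Bruno Pusterla is a top official of the Italian Agricultural Confederation.
Short: WORKS_FOR
"""
rows = parse_relationships(reply)
print(rows[0]["short"])  # WORKS_FOR
```

A real reply may wrap the Explanation over several lines, so production parsing would need a more forgiving pattern; this shows the shape of the round trip.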
9. Provide definitions to the LLM
definitions = {
    "Person": "An individual human being. This can include but is not limited to information about their name, age, gender, occupation, nationality, and relationships.",
    "Organization": "A structured body of people with a particular purpose, especially a business, society, association, etc. This can include elements such as its name, founders, founding date, purpose, key people, and locations.",
    "Location": "A specific place or position. This includes geopolitical places like countries, cities, and towns, or smaller, specific places like buildings or landmarks. Information can cover elements such as its name, geographical coordinates, population, and relevant features.",
    "Event": "An occurrence of interest happening at a particular place and time. It can be historical, current, or future. It usually involves people or organizations, and takes place at a specific location. Information can include elements such as its name, date, location, participants, purpose, and outcomes.",
}
10. Use examples to get desired output
Your task is to generate {example_count} few-shot examples to train an LLM to identify the relationships between entities of types {labels} in a text in order to create a Knowledge Graph.
The few-shot examples should have the following structure, but adapted for the entities and relationships in question.
The definitions of the types are {str(definitions)}.
Follow the example format below, where each relationship has a Source, Target, Explanation, and Short.
Source: Bruno Pusterla | Person
Target: Italian Agricultural Confederation | Organization
Explanation: Bruno Pusterla is a top official of the Italian Agricultural Confederation.
Short: WORKS_FOR
11. Allow human to change query
Find relationships matching the given query, in the text provided.
Follow the example format. Each relationship must have a Source, Target, Explanation, and Short.
If there are no matches, don't say anything.
12. It’s ok not to label everything
Focus on a few labels at a time, and label
everything else “Entity”
Not trying to be WikiData
Compromise between Knowledge Graph and Mind Map
More accurate results with GPT-4
“Chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps[27] before giving a final answer. Chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought.[28][17][29] It allows large language models to overcome difficulties with some reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions.[30][31][32]” https://en.wikipedia.org/wiki/Prompt_engineering
Observation nodes create separation in the graph layout
Observation nodes can be linked to the Source and Chunk
Observation nodes can be skipped later
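One possible Cypher shape for the observation pattern above: each extracted relationship becomes an Observation node linked to both entities and to the Chunk it was found in, and a shortcut relationship can later bypass it. The label, relationship-type, and property names here (and the sample `chunk_id`) are my assumptions, not the talk's exact schema:

```python
# Store the extracted relationship as an Observation *node*, not an edge.
OBSERVATION_CYPHER = """
MERGE (s:Entity {name: $source})
MERGE (t:Entity {name: $target})
MERGE (c:Chunk {id: $chunk_id})
CREATE (o:Observation {short: $short, explanation: $explanation})
CREATE (o)-[:ABOUT_SOURCE]->(s)
CREATE (o)-[:ABOUT_TARGET]->(t)
CREATE (o)-[:FROM_CHUNK]->(c)
"""

# Parameters for one observation from the Lily example.
params = {
    "source": "Lily",
    "target": "the village",
    "short": "LIVED_IN",
    "explanation": "Lily lived in the village nestled in the mountains.",
    "chunk_id": "doc-1#chunk-3",  # illustrative id
}

# Later, Observation nodes can be skipped by adding a shortcut edge
# directly between the two entities.
SHORTCUT_CYPHER = """
MATCH (o:Observation)-[:ABOUT_SOURCE]->(s), (o)-[:ABOUT_TARGET]->(t)
CREATE (s)-[:RELATED {short: o.short}]->(t)
"""
```

With the official `neo4j` Python driver, each string would be run as `session.run(OBSERVATION_CYPHER, params)`.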