EMPOWERING SALES WITH
INTELLIGENT BI AGENTS:
DWH, LLM, AND RAG
INTEGRATION
ŠIMUN ŠUNJIĆ
LOVRO MATOŠEVIĆ
Challenges
User-related | Technically related
Can a simple retry help?
BI + LLM: Advanced Database Chat System
• LLM-enhanced SQL generation
• Retrieval-Augmented Generation (RAG) for context understanding
• Multi-agent architecture for autonomous management of various processing aspects and state retention
• Graph databases for global schema context and understanding
• User-friendly UI for visualization and reporting
Technical Deep Dive
Core | Integration | Data
Schema Linking
Essential for handling complex,
multi-table queries
Why?
Schema Linking
How?
Embed the user query and schema for similarity search
Periodically update the schema
Only pick the relevant portion of the schema
Few-shot with Golden SQL queries and relevance scoring
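The schema-linking step above can be sketched in a few lines. This is a minimal illustration, not the production system: `embed` is a bag-of-words stand-in for a real embedding model, and the table descriptions are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    # In production this would call an embedding API instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_schema(question: str, schema: dict, top_k: int = 2) -> list:
    """Rank schema tables by similarity to the question; keep only top_k,
    so the LLM sees just the relevant portion of the schema."""
    q = embed(question)
    ranked = sorted(schema, key=lambda t: cosine(q, embed(schema[t])), reverse=True)
    return ranked[:top_k]

# Hypothetical table descriptions (periodically refreshed from the DWH):
schema = {
    "f_sales": "fact table sales order extended price bill customer",
    "d_date": "date dimension calendar quarter year",
    "d_customers": "customer dimension name region",
    "d_products": "product dimension category brand",
}
```

Calling `link_schema("total sales per customer last quarter", schema)` keeps the fact table and the customer dimension and drops the rest, which is the whole point: only the linked portion of the schema reaches the prompt.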
Schema Linking
Bridges the gap between natural
language and database structure
Conclusion?
Query Generation Process
Intent detection: properly detect
users' intent via confidence scoring and
recommendation utterances, using
natural language parsers / an LLM
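The confidence-scoring idea can be sketched as follows, assuming raw intent scores come from an upstream parser or LLM. The intent names and the threshold are illustrative, not the client system's.

```python
import math

def classify_intent(scores: dict, threshold: float = 0.6):
    """Softmax raw intent scores; commit to the top intent only when its
    probability clears the threshold, otherwise ask the user to clarify."""
    mx = max(scores.values())
    exps = {k: math.exp(v - mx) for k, v in scores.items()}  # stable softmax
    total = sum(exps.values())
    probs = {k: v / total for k, v in exps.items()}
    top = max(probs, key=probs.get)
    if probs[top] >= threshold:
        return ("run_query", top)
    return ("clarify", top)  # low confidence: emit a recommendation utterance
```

A confident score wins outright; two near-equal scores fall below the threshold and route to clarification instead of guessing.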
Query Generation Process
System prompts: guide the LLM
by injecting few-shot examples,
relevant values from high-cardinality
columns, and the relevant portion
of the DDL context
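Assembling that prompt is mostly string composition. A minimal sketch, with a hypothetical instruction line and hypothetical inputs:

```python
def build_system_prompt(ddl_snippets, examples, value_hints):
    """Compose the system prompt from the three ingredients the slides list:
    relevant DDL, few-shot (question, SQL) pairs, high-cardinality value hints."""
    parts = ["You translate business questions into SQL for our DWH."]
    parts.append("Relevant schema:\n" + "\n".join(ddl_snippets))
    for question, sql in examples:
        parts.append(f"Q: {question}\nSQL: {sql}")
    if value_hints:
        parts.append("Known column values: " + ", ".join(value_hints))
    return "\n\n".join(parts)
```

The key design point is that every section is *selected*, not dumped wholesale: only linked DDL, only the most similar golden queries, only the matched column values.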
Query Generation Process
Conversation history:
Cache and database records
Query Generation Process
Chatty agents: make sure
agents don't fall into
recursion
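One simple way to keep chatty agents out of infinite recursion is a handoff budget plus repeated-edge detection. This is our illustrative sketch, not the production mechanism; class and method names are invented.

```python
class RecursionGuard:
    """Caps agent-to-agent handoffs so chatty agents cannot loop forever."""

    def __init__(self, max_hops: int = 8):
        self.max_hops = max_hops
        self.hops = 0
        self.seen = set()  # (sender, receiver) pairs already used

    def check(self, sender: str, receiver: str) -> bool:
        """Return True if the handoff may proceed, False if it must stop."""
        self.hops += 1
        edge = (sender, receiver)
        if self.hops > self.max_hops or edge in self.seen:
            return False  # budget exhausted or a handoff is repeating
        self.seen.add(edge)
        return True
```

When `check` returns False the orchestrator forces a final answer instead of another delegation.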
Query Generation Process
Context manager: ensure
agents share common
context storage
Query Generation Process
Query optimizer: built into the
generation process, with SQL
validation and fixing
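The validation half of that loop can be sketched with an in-memory SQLite database: compile the query against an empty copy of the schema and return the error text, which the fixing step would feed back to the LLM. This assumes a SQLite-compatible dialect for illustration; the client DWH would need its own dry-run mechanism.

```python
import sqlite3

def validate_sql(sql: str, ddl: str):
    """Compile a generated query against the schema; return the error
    message (for the LLM to fix) or None when the query is valid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)          # build an empty schema to plan against
        conn.execute(f"EXPLAIN {sql}")   # compiles the query without running it
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()

ddl = "CREATE TABLE f_sales (bill_customer_sid INT, extended_price REAL);"
```

A bad column name comes back as a concrete error string, which is far more useful to the fixing prompt than a bare failure flag.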
High-cardinality columns
• Columns with millions of unique values, such as product IDs
• Helps convert vague user terms into specific database values
• Supports decomposing a user query into sub-queries
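The "vague term into specific database value" conversion can be illustrated with simple fuzzy matching over a column's value list. A real system would search an index over the high-cardinality column rather than a Python list; the customer names here are invented.

```python
import difflib

def resolve_value(vague: str, column_values: list, cutoff: float = 0.6):
    """Map a vague user term onto the closest real value from a
    high-cardinality column, so the WHERE clause filters on data that exists."""
    lowered = {v.lower(): v for v in column_values}
    matches = difflib.get_close_matches(vague.lower(), list(lowered),
                                        n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

customers = ["ACME Corp", "Globex Corporation", "Initech LLC"]
```

"initech" resolves to the stored `"Initech LLC"`; a term with no plausible match returns `None`, which routes back to the clarification path rather than generating a filter that matches nothing.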
Query Generation Process
• The system maintains
conversation context and
understands business terminology,
allowing users to ask follow-up
questions naturally.
• For example, after seeing sales
data, users can simply ask without
needing to specify all the details
again:
o Show this as a chart
o Compare with previous year
Agents
• Generate skeleton SQL using
graphs and similarity search
• Improve the WHERE clause with
high-cardinality data
• Sub-query agent breaks down the
query into different components
Graphs
• "What is the most effective sales strategy employed by a contemporary
of top sales leaders in the industry?"
• Reason about relationships to create DAG (directed acyclic graph)
• Extract non-local entities connected through multi-hop
• Identify root node -> Graph -> Sub-graph = Query -> Sub-query
Traditional efficient search methods, such as locality-sensitive hashing, which
are designed for similarity search, are not well-suited for extracting complex
structural patterns such as paths or subgraphs. Extracting structural
information must cover the critical evidence needed to answer the query
without exceeding the reasoning capacity of LLMs. Expanding the context
window increases computational complexity and can degrade RAG
performance by introducing irrelevant information.
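The root-node → graph → sub-graph step can be sketched as a bounded BFS over the schema's foreign-key graph. The adjacency map below is a toy stand-in for the real graph database; table names are illustrative.

```python
from collections import deque

# Toy schema graph: edges follow foreign-key relationships.
SCHEMA_GRAPH = {
    "f_sales": ["d_date", "d_customers", "d_products"],
    "d_customers": ["d_regions"],
    "d_products": ["d_categories"],
    "d_date": [], "d_regions": [], "d_categories": [],
}

def subgraph(root: str, max_hops: int) -> set:
    """BFS out to max_hops from the root table. The resulting sub-graph
    bounds which joins the sub-query generator may use, keeping the
    evidence small enough for the LLM's reasoning capacity."""
    seen, frontier = {root}, deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop budget
        for nxt in SCHEMA_GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen
```

One hop from `f_sales` yields the directly joined dimensions; two hops pulls in the multi-hop entities (`d_regions`, `d_categories`) that similarity search alone would miss.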
Fine-Tuning Llama 3.1
Enhancing Query Generation Accuracy
Fine-tuning adapts Llama 3.1
to our client's specific needs
Improves understanding of
complex queries
Fine-Tuning Llama 3.1
Customization to Business Context
Incorporates specific
terminology and data schemas
Aligns model with industry-
specific language
Fine-Tuning Llama 3.1
Handling Domain-Specific Terms
"Customer churn" in telecom
"Inventory turnover" in retail
Examples:
Crafting a Custom Synthetic Dataset
Utilizing LLMs for Dataset Generation
Employed models like ChatGPT and Anthropic's Claude
Generated tailored question-query pairs for the
client's DWH
Crafting a Custom Synthetic Dataset
Building a Robust Training Set
Created and validated 100 extremely complex queries
Added 200 less complex queries for
comprehensive coverage
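Validating generated question-query pairs can be done mechanically before any human review: keep only pairs whose SQL actually executes against the target schema. A minimal sketch using an in-memory SQLite stand-in for the DWH:

```python
import sqlite3

def keep_valid_pairs(pairs, ddl):
    """Filter LLM-generated (question, sql) pairs down to those that
    execute against the schema; discard the rest before training."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    valid = []
    for question, sql in pairs:
        try:
            conn.execute(sql)
            valid.append((question, sql))
        except sqlite3.Error:
            pass  # hallucinated columns/tables never reach the training set
    conn.close()
    return valid

ddl = "CREATE TABLE f_sales (extended_price REAL);"
pairs = [
    ("total sales", "SELECT SUM(extended_price) FROM f_sales"),
    ("bad pair", "SELECT x FROM missing_table"),
]
```

Execution-based filtering catches hallucinated schema references cheaply; semantic correctness (does the SQL answer the question?) still needs the manual validation pass described above.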
Crafting a Custom Synthetic Dataset
Example: Complex Query
WITH CustomerOrderValues AS (
SELECT
"f_sales"."BILL_CUSTOMER_SID",
DATE_TRUNC('quarter', "d_date"."DATE") AS "quarter",
AVG("f_sales"."EXTENDED_PRICE") AS "avg_order_value",
COUNT(DISTINCT "f_sales"."SALES_DOCUMENT_SID") AS "order_count"
FROM "f_sales"
JOIN "d_date" ON "f_sales"."ORDER_DATE_SID" = "d_date".date_sid
WHERE "d_date"."DATE" >= DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '3 months'
AND "d_date"."DATE" < DATE_TRUNC('quarter', CURRENT_DATE) + INTERVAL '3 months'
GROUP BY "f_sales"."BILL_CUSTOMER_SID", DATE_TRUNC('quarter', "d_date"."DATE")
),
CustomerGrowth AS (
SELECT
"c"."CUSTOMER_SID",
"c"."CUSTOMER_NAME",
"cov_current"."avg_order_value" AS "current_avg_order_value",
"cov_previous"."avg_order_value" AS "previous_avg_order_value",
("cov_current"."avg_order_value" - "cov_previous"."avg_order_value") / "cov_previous"."avg_order_value" AS "growth_rate"
FROM "d_customers" "c"
JOIN CustomerOrderValues "cov_current" ON "c"."CUSTOMER_SID" = "cov_current"."BILL_CUSTOMER_SID"
JOIN CustomerOrderValues "cov_previous" ON "c"."CUSTOMER_SID" = "cov_previous"."BILL_CUSTOMER_SID"
WHERE "cov_current"."quarter" = DATE_TRUNC('quarter', CURRENT_DATE)
AND "cov_previous"."quarter" = DATE_TRUNC('quarter', CURRENT_DATE) - INTERVAL '3 months'
AND "cov_current"."order_count" >= 5
AND "cov_previous"."order_count" >= 5
),
CompanyAverage AS (
SELECT SUM("f_sales"."EXTENDED_PRICE") / COUNT("f_sales"."SALES_DOCUMENT_SID") AS "company_avg_order_value"
FROM "f_sales"
JOIN "d_date" ON "f_sales"."ORDER_DATE_SID" = "d_date".date_sid
WHERE "d_date"."DATE" >= DATE_TRUNC('quarter', CURRENT_DATE)
AND "d_date"."DATE" < DATE_TRUNC('quarter', CURRENT_DATE) + INTERVAL '3 months'
)
SELECT
"cg"."CUSTOMER_NAME",
ROUND("cg"."current_avg_order_value", 2) AS "current_avg_order_value",
ROUND("cg"."previous_avg_order_value", 2) AS "previous_avg_order_value",
ROUND("cg"."growth_rate" * 100, 2) AS "growth_percentage",
ROUND(("cg"."current_avg_order_value" - "ca"."company_avg_order_value") / "ca"."company_avg_order_value" * 100, 2) AS "percent_di
FROM CustomerGrowth "cg"
CROSS JOIN CompanyAverage "ca"
WHERE "cg"."growth_rate" > 0
ORDER BY "cg"."growth_rate" DESC
LIMIT 10;
"Which customers have
shown the highest increase
in average order value from
last quarter to this quarter,
and how does their current
performance compare to
the overall company
average?"
Crafting a Custom Synthetic Dataset
Expanding Through Paraphrasing
Addressed natural language ambiguities
Added paraphrased variants of existing
queries for broader coverage
Crafting a Custom Synthetic Dataset
Impact on Model Performance
Improved accuracy and relevance in query results
Enhanced ability to handle varied expressions
and complex queries
System in Action
Expanding Capabilities and Tackling Challenges
• Current Challenges:
o Ongoing model tuning for diverse datasets
o Context management for complex queries
o SQL accuracy and injection prevention
• Future Directions:
o Reduced latency
o Higher precision during query generation
o Better intent detection
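For the injection-prevention point, one defense-in-depth layer is to run every generated query on a connection that simply cannot write. A sketch using SQLite's authorizer hook (the real DWH would use its own read-only role or equivalent):

```python
import sqlite3

# Authorizer actions a read-only chat query legitimately needs.
READ_ONLY = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}

def harden(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Deny every operation except reads, so even a malicious generated
    query cannot modify data or alter the schema."""
    conn.set_authorizer(
        lambda action, *args: sqlite3.SQLITE_OK if action in READ_ONLY
        else sqlite3.SQLITE_DENY)
    return conn

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f_sales (extended_price REAL)")
conn.execute("INSERT INTO f_sales VALUES (10.0)")
harden(conn)  # from here on, the connection is read-only
```

SELECTs still work; a smuggled `DROP TABLE` raises a database error instead of executing, regardless of how it got past prompt-level checks.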
Thank you!
