2. Context
I'm neither a data scientist nor an AI specialist, just a Java dev and software architect wondering how to leverage LLMs' impressive capabilities in our apps.
4. How to
• Basic REST/HTTP
• Specific SDK: OpenAI
• Framework: langchain
• Low/No Code: FlowiseAI
• Orchestration tool: RAGNA
5. LangChain
• A popular framework for developing applications powered by language models
• Assemblages of components for accomplishing higher-level tasks
• Connects various building blocks: large language models, document loaders, text splitters, output parsers, vector stores for text embeddings, tools, and prompts
• Supports Python and JavaScript
• Launched end of 2022 (around the time of the ChatGPT release)
6. langchain4j
• The “Java version” of langchain
• Simplifies the integration of AI/LLM capabilities into your Java application
• Launched in 2023
• Last release: 0.27.1 (6 March 2024)
7. Quarkus-langchain4j
• Seamless integration between Quarkus and LangChain4j
• Easy incorporation of LLMs into your Quarkus applications
• Launched end of 2023
• Last release: 0.9.0 (6 March 2024), based on langchain4j 0.27.1
8. A fast pace of change
2017: Transformer
2018: GPT-1
2022: langchain
2022: ChatGPT
2023: langchain4j
2023: quarkus-langchain4j
10. Defining an AI interface
@RegisterAiService
public interface CustomerSupportAgent {

    // Free chat method, unstructured user message
    @SystemMessage("You are a customer support agent of a car rental company …")
    String chat(String userMessage);

    // Structured fraud detection method with parameters
    @SystemMessage("You are a car booking fraud detection AI …")
    @UserMessage("Your task is to detect if a fraud was committed for the customer {{name}} {{surname}} …")
    String detectFraudForCustomer(String name, String surname);
}
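The {{name}} and {{surname}} placeholders are filled in from the method parameters before the prompt is sent. Conceptually the expansion is a simple string substitution, as in this plain-Java sketch (the render helper is illustrative only, not langchain4j's actual template engine):

```java
import java.util.Map;

public class PromptTemplate {
    // Minimal stand-in for the template expansion the framework performs:
    // each {{key}} in the template is replaced by the matching parameter value.
    static String render(String template, Map<String, String> vars) {
        String result = template;
        for (var e : vars.entrySet()) {
            result = result.replace("{{" + e.getKey() + "}}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String msg = render(
            "Your task is to detect if a fraud was committed for the customer {{name}} {{surname}}",
            Map.of("name", "James", "surname", "Bond"));
        System.out.println(msg);
    }
}
```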
11. LLM configuration
# Connection configuration to Azure OpenAI instance
quarkus.langchain4j.azure-openai.api-key=…
quarkus.langchain4j.azure-openai.resource-name=…
quarkus.langchain4j.azure-openai.deployment-name=…
quarkus.langchain4j.azure-openai.endpoint=…
# Warning: function calls support depends on the api-version
quarkus.langchain4j.azure-openai.api-version=2023-12-01-preview
quarkus.langchain4j.azure-openai.max-retries=2
quarkus.langchain4j.azure-openai.timeout=60s
# Set the model temperature for deterministic (non-creative) behavior (between 0 and 2)
quarkus.langchain4j.azure-openai.chat-model.temperature=0.1
# An alternative (or a complement?) to temperature: 0.1 means only tokens within the top 10% of probability mass are considered
quarkus.langchain4j.azure-openai.chat-model.top-p=0.1
# Logging requests and responses in dev mode
%dev.quarkus.langchain4j.azure-openai.log-requests=true
%dev.quarkus.langchain4j.azure-openai.log-responses=true
13. Principles
• Augment the LLM with specific knowledge
• From different data sources and formats: text, PDF, CSV …
• First, the input text is turned into a vector representation (embedding)
• Each request is then completed with relevant selected data
• Vector databases: InMemory, PgVector, Redis, Chroma …
• In-process embedding models: all-minilm-l6-v2-q, bge-small-en, bge-small-zh …
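Before anything can be retrieved, documents must first be ingested: split into segments, embedded, and stored. A minimal sketch using langchain4j's EmbeddingStoreIngestor (segment sizes are illustrative):

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;

public class DocIngestor {
    // Split the document into segments, embed each segment, store the vectors
    void ingest(Document document, EmbeddingModel model, EmbeddingStore<TextSegment> store) {
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(300, 30)) // max 300 chars, 30 overlap
                .embeddingModel(model)
                .embeddingStore(store)
                .build();
        ingestor.ingest(document);
    }
}
```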
15. Retrieving relevant contents
public class DocRetriever implements ContentRetriever {
    …
    // From 0 (low selectivity) to 1 (high selectivity)
    private static final double MIN_SCORE = 0.7;

    @Inject
    public DocRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel model) {
        this.retriever = EmbeddingStoreContentRetriever
                .builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(MAX_RESULTS)
                .minScore(MIN_SCORE)
                .build();
    }

    @Override
    public List<Content> retrieve(Query query) {
        return retriever.retrieve(query);
    }
}
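What minScore does can be illustrated with a plain-Java sketch: rank segments by cosine similarity between the query embedding and each segment embedding, and keep only those above the threshold (langchain4j actually normalizes cosine similarity into a 0..1 relevance score; raw cosine is used here for simplicity):

```java
import java.util.List;

public class MinScoreFilter {
    // Cosine similarity between two embedding vectors, in [-1, 1]
    // (1 means same direction, i.e. semantically closest).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Keep only the indices of segments whose similarity reaches minScore.
    static List<Integer> retrieve(double[] query, List<double[]> segments, double minScore) {
        return java.util.stream.IntStream.range(0, segments.size())
                .filter(i -> cosine(query, segments.get(i)) >= minScore)
                .boxed()
                .toList();
    }

    public static void main(String[] args) {
        double[] query = {1, 0};
        List<double[]> segments = List.of(
                new double[]{0.9, 0.1},   // close to the query
                new double[]{0, 1});      // unrelated
        System.out.println(retrieve(query, segments, 0.7)); // only the first segment passes
    }
}
```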
16. Binding an AI service to a document retriever
// Binding is defined with the RegisterAiService annotation
@RegisterAiService(retrievalAugmentor = DocRagAugmentor.class)
public interface CustomerSupportAgent { … }

// DocRagAugmentor is an intermediate class supplying the retriever
public class DocRagAugmentor implements Supplier<RetrievalAugmentor> {
    @Override
    public RetrievalAugmentor get() { … }
}
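A minimal get() implementation could wrap the DocRetriever in langchain4j's DefaultRetrievalAugmentor, which appends the retrieved segments to the user message (a sketch, assuming the retriever bean is available for CDI injection):

```java
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import java.util.function.Supplier;

@ApplicationScoped
public class DocRagAugmentor implements Supplier<RetrievalAugmentor> {

    @Inject
    ContentRetriever retriever; // the DocRetriever bean from the previous slide

    @Override
    public RetrievalAugmentor get() {
        return DefaultRetrievalAugmentor.builder()
                .contentRetriever(retriever)
                .build();
    }
}
```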
17. RAG configuration
# Local Embedding Model for RAG
quarkus.langchain4j.embedding-model.provider=dev.langchain4j…AllMiniLmL6V2EmbeddingModel
# Local directory for RAG documents
app.local-data-for-rag.dir=data-for-rag
19. Stephan Pirson, 2023
Basic principles
1. Instruct the LLM to call App functions
2. A function is a Java method annotated with @Tool
3. Function descriptors are sent with each request
4. The LLM decides whether it’s relevant to call a function
5. A description of the function call is provided in the response
6. quarkus-langchain4j automatically calls the @Tool method
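The loop behind steps 4-6 can be sketched in plain Java (the LlmResponse type and the simulated LLM below are hypothetical, for illustration only; quarkus-langchain4j implements this loop for you under the cover):

```java
import java.util.Map;
import java.util.function.Function;

public class FunctionCallLoop {
    // Hypothetical response shape: the LLM either returns final text,
    // or asks the application to execute a named function.
    record LlmResponse(String text, String toolName, String toolArgs) {
        boolean isToolCall() { return toolName != null; }
    }

    // Simulated LLM: first asks for a tool call, then finishes.
    static LlmResponse callLlm(String conversation) {
        if (!conversation.contains("result:")) {
            return new LlmResponse(null, "getBookingDetails", "456-789");
        }
        return new LlmResponse("Your booking 456-789 exists.", null, null);
    }

    // The @Tool methods, keyed by the function name the LLM chooses.
    static final Map<String, Function<String, String>> TOOLS =
            Map.of("getBookingDetails", bookingNumber -> "booking " + bookingNumber + " found");

    // Keep re-posting tool results until the LLM returns
    // a final answer (finish_reason=stop).
    static String chat(String userMessage) {
        String conversation = userMessage;
        LlmResponse response = callLlm(conversation);
        while (response.isToolCall()) {
            String result = TOOLS.get(response.toolName()).apply(response.toolArgs());
            conversation += "\nresult: " + result;
            response = callLlm(conversation);
        }
        return response.text();
    }

    public static void main(String[] args) {
        System.out.println(chat("Does booking 456-789 exist?"));
    }
}
```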
20. Perspective
Use the LLM as a “workflow
engine”
The LLM is entrusted with the
decision to call business logic
Both powerful and dangerous
Trustable? Reliable?
21. Defining a function
@Tool("Get booking details for booking number {bookingNumber} and customer {name} {surname}")
public Booking getBookingDetails(String bookingNumber, String name, String surname) {
    Log.info("DEMO: Calling Tool-getBookingDetails: " + bookingNumber
            + " and customer: " + name + " " + surname);
    return checkBookingExists(bookingNumber, name, surname);
}
22. Binding the functions to an AI interface
@RegisterAiService(tools = BookingService.class)
public interface CustomerSupportAgent { … }
26. Example of a booking cancelation
User ↔ Application ↔ LLM (stateless request processing on the LLM side)
Prompt: “I'm James Bond, can you cancel my booking 456-789”
1. Initial request: POST the prompt. LLM response: call getBookingDetails
2. Local execution of getBookingDetails
3. Second request: POST the getBookingDetails result. LLM response: call cancelBooking
4. Local execution of cancelBooking
5. Third request: POST the cancelBooking result. LLM final response (finish_reason=stop)
Response: “Your booking 456-789 has been successfully cancelled, Mr. Bond.”
28. Lessons learned
• Overall interesting results:
• quarkus-langchain4j makes GenAI really easy!
• Even a generic LLM such as GPT proves to be helpful regarding a specific domain context
• GPT4 is more precise but significantly slower in this example:
• GPT 4 >=5 sec
• GPT 3.5 >=2 sec
• RAG:
• Be selective: set min_score appropriately in your context when retrieving text segments
• Request message can be verbose: selected text segments are added to the user message
• Function calls:
• Not supported by all LLMs
• Powerful and dangerous
• Hard to debug
• Potentially verbose: 1 round-trip per function call
• Many requests under the cover, similar to the JPA N+1 queries problem
• Non-deterministic behavior but acceptable with temperature and seed set to minimum
• To be used with care on critical functions: payment, cancelation
29. Next steps
• Testability
• Auditability
• Observability
• Security
• Production readiness
• Real use cases beyond the fun