JF James, 2024
When Java
meets GenAI
at JChateau
Context
I'm neither a data scientist
nor an AI specialist
Just a Java Dev and
Software Architect
Wondering how to leverage
LLMs impressive
capabilities in our Apps
Experimentation
LET’S
EXPERIMENT
QUARKUS-
LANGCHAIN4J
EXTENSION
WITH A SIMPLE
CAR BOOKING
APP
FOCUS ON
RAG AND
FUNCTION CALLS
USING AZURE GPT 3.5 & 4
How to
• Basic REST/HTTP
• Specific SDK: OpenAI
• Framework: langchain
• Low/No Code: Flowise
• Orchestration tool: RAGNA
LangChain
• A popular framework for developing applications powered
by language models
• Assemblages of components for accomplishing higher-level tasks
• Connect various building blocks: large language models,
document loaders, text splitters, output parsers, vector
stores to store text embeddings, tools, and prompts
• Supports Python and JavaScript
• Launched end of 2022 (just after the ChatGPT release)
langchain4j
• The “Java version” of langchain
• Simplifies the integration of AI/LLM capabilities into your Java application
• Launched in 2023
• Latest release: 0.27.1 (6 March 2024)
Quarkus-langchain4j
• Seamless integration between Quarkus and LangChain4j
• Easy incorporation of LLMs into your Quarkus applications
• Launched end of 2023
• Latest release: 0.9.0 (6 March 2024), based on langchain4j 0.27.1
A fast pace of change
• 2017: Transformer
• 2018: GPT-1
• 2022: langchain
• 2022: ChatGPT
• 2023: langchain4j
• 2023: quarkus-langchain4j
Defining an AI service
Defining an AI interface
@RegisterAiService
public interface CustomerSupportAgent {
// Free chat method, unstructured user message
@SystemMessage("You are a customer support agent of a car rental company …")
String chat(String userMessage);
// Structured fraud detection method with parameters
@SystemMessage("You are a car booking fraud detection AI…")
@UserMessage("Your task is to detect if a fraud was committed for the customer {{name}} {{surname}} …")
String detectFraudForCustomer(String name, String surname);
}
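Once registered, the AI service can be injected like any other CDI bean. A minimal sketch of a REST resource delegating to it (the endpoint path and class name are illustrative, not taken from the demo app):

```java
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

// Hypothetical REST endpoint exposing the free-chat method of the AI service
@Path("/chat")
public class ChatResource {

    // quarkus-langchain4j provides a CDI-managed implementation of the interface
    @Inject
    CustomerSupportAgent agent;

    @POST
    public String chat(String userMessage) {
        // Each call sends the @SystemMessage plus the user message to the LLM
        return agent.chat(userMessage);
    }
}
```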
LLM configuration
# Connection configuration to Azure OpenAI instance
quarkus.langchain4j.azure-openai.api-key=…
quarkus.langchain4j.azure-openai.resource-name=…
quarkus.langchain4j.azure-openai.deployment-name=…
quarkus.langchain4j.azure-openai.endpoint=…
# Warning: function calls support depends on the api-version
quarkus.langchain4j.azure-openai.api-version=2023-12-01-preview
quarkus.langchain4j.azure-openai.max-retries=2
quarkus.langchain4j.azure-openai.timeout=60S
# Set the model temperature for deterministic (non-creative) behavior (between 0 and 2)
quarkus.langchain4j.azure-openai.chat-model.temperature=0.1
# An alternative (or a complement?) to temperature: 0.1 means only top 10% probable tokens are considered
quarkus.langchain4j.azure-openai.chat-model.top-p=0.1
# Logging requests and responses in dev mode
%dev.quarkus.langchain4j.azure-openai.log-requests=true
%dev.quarkus.langchain4j.azure-openai.log-responses=true
Retrieval
Augmented
Generation
Principles
• Augment the LLM with specific knowledge
• From different data sources and formats: text, PDF, CSV …
• First, the input text is turned into a vector representation (an embedding)
• Each request is then completed with relevant selected data
• Vector databases: InMemory, PgVector, Redis, Chroma …
• In-process embedding models: all-minilm-l6-v2-q, bge-small-en, bge-small-zh …
Ingesting documents
public void ingest(@Observes StartupEvent evt) throws Exception {
DocumentSplitter splitter = DocumentSplitters.recursive(500, 0);
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor
.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.documentSplitter(splitter)
.build();
List<Document> docs = loadDocs();
ingestor.ingest(docs);
}
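The loadDocs() helper is not shown on the slide. A possible implementation, assuming langchain4j's FileSystemDocumentLoader and TextDocumentParser (exact overloads may vary between versions), reading from the directory configured later as app.local-data-for-rag.dir:

```java
import java.nio.file.Path;
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;

// Hypothetical loadDocs(): parses every file in the RAG data directory
private List<Document> loadDocs() {
    // Each file becomes one Document, later split into 500-character segments
    return FileSystemDocumentLoader.loadDocuments(
            Path.of("data-for-rag"), new TextDocumentParser());
}
```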
Retrieving relevant contents
public class DocRetriever implements ContentRetriever {
…
// From 0 (low selectivity) to 1 (high selectivity)
private static final double MIN_SCORE = 0.7;
@Inject
public DocRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel model) {
this.retriever = EmbeddingStoreContentRetriever
.builder()
.embeddingModel(model)
.embeddingStore(store)
.maxResults(MAX_RESULTS)
.minScore(MIN_SCORE)
.build();
}
@Override
public List<Content> retrieve(Query query) {
return retriever.retrieve(query);
}
}
Binding an AI service to a document retriever
// Binding is defined with the RegisterAiService annotation
@RegisterAiService(retrievalAugmentor = DocRagAugmentor.class)
public interface CustomerSupportAgent { … }
// DocRagAugmentor is an intermediate class supplying the retriever
public class DocRagAugmentor implements Supplier<RetrievalAugmentor> {
@Override
public RetrievalAugmentor get() { … }
}
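One way to fill in the supplier, assuming langchain4j's DefaultRetrievalAugmentor and CDI injection of the DocRetriever shown earlier (a sketch of the wiring, not the demo's exact code):

```java
import java.util.function.Supplier;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;

@ApplicationScoped
public class DocRagAugmentor implements Supplier<RetrievalAugmentor> {

    @Inject
    DocRetriever docRetriever;

    @Override
    public RetrievalAugmentor get() {
        // The augmentor adds the retrieved text segments to each user message
        return DefaultRetrievalAugmentor.builder()
                .contentRetriever(docRetriever)
                .build();
    }
}
```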
RAG configuration
# Local Embedding Model for RAG
quarkus.langchain4j.embedding-model.provider=dev.langchain4j…AllMiniLmL6V2EmbeddingModel
# Local directory for RAG documents
app.local-data-for-rag.dir=data-for-rag
Function calls
Stephan Pirson, 2023
Basic principles
1. Instruct the LLM to call App functions
2. A function is a Java method annotated with @Tool
3. Function descriptors are sent with each request
4. The LLM decides whether it’s relevant to call a function
5. A description of the function call is provided in the response
6. quarkus-langchain4j automatically calls the @Tool method
Perspective
Use the LLM as a “workflow
engine”
The LLM is entrusted with the
decision to call business logic
Both powerful and dangerous
Trustable? Reliable?
Defining a function
@Tool("Get booking details for booking number {bookingNumber} and customer {name} {surname}")
public Booking getBookingDetails(String bookingNumber, String name, String surname) {
Log.info("DEMO: Calling Tool-getBookingDetails: " + bookingNumber + " and customer: "
+ name + " " + surname);
return checkBookingExists(bookingNumber, name, surname);
}
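The cancellation flow later in the deck also calls a cancelBooking tool. A hypothetical companion method (the cancelled field and business checks are illustrative):

```java
// Hypothetical cancellation tool matching the cancelBooking call in the
// sequence later in the deck; validation is only sketched
@Tool("Cancel booking for booking number {bookingNumber} and customer {name} {surname}")
public void cancelBooking(String bookingNumber, String name, String surname) {
    Log.info("DEMO: Calling Tool-cancelBooking: " + bookingNumber + " for "
            + name + " " + surname);
    Booking booking = checkBookingExists(bookingNumber, name, surname);
    // Real code would enforce a cancellation policy (dates, fees …) here
    booking.cancelled = true;
}
```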
Binding the functions to an AI interface
@RegisterAiService(tools = BookingService.class)
public interface CustomerSupportAgent { … }
LLM initial request
"functions":[
{
"name":"getBookingDetails",
"description":"Get booking details for {bookingNumber} and customer {firstName} {lastName}",
"parameters":{
"type":"object",
"properties":{
"firstName":{
"type":"string"
},
"lastName":{
"type":"string"
},
"bookingNumber":{
"type":"string"
}
},
"required":[
"bookingNumber",
"firstName",
"lastName"
]
}
}, …]
LLM intermediate response
"choices":[
{
"finish_reason":"function_call",
"index":0,
"message":{
"role":"assistant",
"function_call":{
"name":"getBookingDetails",
"arguments":"{\"firstName\":\"James\",\"lastName\":\"Bond\",\"bookingNumber\":\"456-789\"}"
}
},
…
}
]
LLM intermediate request
{
"role":"function",
"name":"getBookingDetails",
"content":"{\"bookingNumber\" : \"456-789\",
\"customer\" : { \"firstName\" : \"James\", \"lastName\" : \"Bond\" },
\"startDate\" : \"2024-03-01\",
\"endDate\" : \"2024-03-09\",
\"carModel\" : \"Volvo\",
\"cancelled\" : false}"
}
Example of a booking cancellation

Prompt: "I'm James Bond, can you cancel my booking 456-789"

Exchanges between User, Application, and LLM (each request is stateless):
1. Initial request: POST the prompt; the LLM answers: call getBookingDetails
2. Local execution of getBookingDetails
3. Second request: POST the getBookingDetails result; the LLM answers: call cancelBooking
4. Local execution of cancelBooking
5. Third request: POST the cancelBooking result; final response (finish_reason=stop)

Response: "Your booking 456-789 has been successfully cancelled, Mr. Bond."
Lessons learnt
• Overall interesting results:
• quarkus-langchain4j makes GenAI really easy!
• Even a generic LLM such as GPT proves to be helpful regarding a specific domain context
• GPT-4 is more precise but significantly slower in this example:
• GPT-4: ≥ 5 s per request
• GPT-3.5: ≥ 2 s per request
• RAG:
• Be selective: set min_score appropriately in your context when retrieving text segments
• Request message can be verbose: selected text segments are added to the user message
• Function calls:
• Not supported by all LLMs
• Powerful and dangerous
• Hard to debug
• Potentially verbose: 1 round-trip per function call
• Many requests under the covers, similar to the JPA N+1 queries problem
• Non-deterministic behavior but acceptable with temperature and seed set to minimum
• To be used with care on critical functions: payment, cancellation
Next steps
• Testability
• Auditability
• Observability
• Security
• Production readiness
• Real use cases beyond the fun
Code available on GitHub
https://github.com/jefrajames/car-booking
