The Databricks Generative AI Engineer Associate certification validates your foundational skills in building and deploying generative AI solutions. Get accurate prep materials at certifiedumps.
Questions & Answers
(Demo Version - Limited Content)
DATABRICKS
Databricks-Generative-AI-Engineer-
Associate Exam
Databricks Certified Generative AI
Engineer Associate Exam
https://www.certifiedumps.com/databricks/databricks-generative-ai-engineer-associate-dumps.html
Thank you for Downloading Databricks-Generative-AI-Engineer-Associate Exam PDF Demo
Get Full File:
Which of the following considerations is most important when creating and querying a Vector Search
index for use in a Generative AI application in Databricks?
A. Choose a vector indexing method optimized for high-dimensional data and ensure it supports
efficient similarity search operations.
B. Use a SQL-based search engine to ensure the embeddings can be queried using standard SQL
queries.
C. Store the embeddings in a CSV format for easier querying and storage within Databricks.
D. Ensure the document corpus is indexed in a relational database before creating vector embeddings.
When working with Generative AI applications in Databricks that require vector search, it is crucial to use
an indexing method that is optimized for high-dimensional data. Embeddings used in such models are
typically high-dimensional vectors, and the search needs to be efficient in terms of both speed and
accuracy. Using a vector indexing method such as FAISS or Annoy, which are specifically designed for
similarity search in high-dimensional spaces, ensures that the application can perform efficiently. Other
methods like relational databases or CSV formats would not be optimized for this purpose and would
result in slower and less efficient querying.
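For illustration only (not part of the original question), a minimal sketch of high-dimensional similarity search using FAISS; the embedding array, dimensionality, and document count are placeholder values, and in practice the vectors would come from your embedding model:

import faiss
import numpy as np

# Assume doc_embeddings is an (n_docs x 768) float32 array produced by an embedding model
doc_embeddings = np.random.rand(1000, 768).astype("float32")

# Build an index designed for similarity search over high-dimensional vectors
index = faiss.IndexFlatL2(768)   # exact L2 search; IVF/HNSW variants trade accuracy for speed
index.add(doc_embeddings)

# Embed the query with the same model, then retrieve the 5 most similar documents
query_vector = np.random.rand(1, 768).astype("float32")
distances, doc_ids = index.search(query_vector, k=5)
print(doc_ids)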
You have successfully trained a machine learning model in Databricks using MLflow. Your next task is to
register the model to Unity Catalog for easy discovery and management. What are the correct steps you
should take to ensure the model is properly registered? (Select two)
A. Tag the model with a Unity Catalog-specific tag using mlflow.set_tag() before registering it.
B. Use the Databricks Model Registry to register the model and select "Unity Catalog" as the
destination.
C. Register the model manually by navigating to the Unity Catalog tab in the Databricks workspace.
D. Set the environment variable MLFLOW_MODEL_REGISTRY_URI to the Unity Catalog URI before
running your MLflow script.
E. Use the MLflow mlflow.register_model() function with the Unity Catalog URI.
To properly register a machine learning model in Unity Catalog after training it with MLflow in Databricks,
the correct steps involve using the Databricks Model Registry and MLflow functions that interface with
Unity Catalog:
• B. The Databricks Model Registry allows you to manage, discover, and version your models. When
registering a model, you can select Unity Catalog as the destination, which ensures that the model is
Question: 1
Question: 2
Answer: A
Explanation:
Answer: B, E
Explanation:
You are tasked with building a text generation model that will generate marketing content for various
products. The generated text should have coherence, relevance to the product descriptions, and a
controlled length. The primary requirements are scalability, low latency during inference, and the ability
to fine-tune the model with domain-specific data. Which architecture would be the most appropriate for
your task?
You are preparing a large legal document to be used in a generative AI model for text summarization.
The document has many chapters, and each chapter contains multiple sections with varying lengths.
The model you're using has a token limit of 2048 tokens for processing. Which of the following chunking
strategies would best ensure efficient processing of the document without exceeding the token limit?
A. Chunk the document into sections, further splitting large sections into smaller chunks that respect
sentence boundaries while staying within the 2048-token limit.
B. Chunk the document into chapters, ensuring each chapter fits within the model’s token limit.
C. Chunk the entire document into sections, where each section is treated as one chunk regardless of
length.
D. Dynamically chunk the document based on token count, ensuring that each chunk contains no more
than 2048 tokens, even if it cuts off in the middle of a sentence.
Answer: A
Explanation:
available within the Unity Catalog framework for easy discovery and management.
• E. You can also use the MLflow mlflow.register_model() function with the Unity Catalog URI to
programmatically register the model, ensuring the model is linked to the Unity Catalog for future
management.
Options like tagging the model or setting an environment variable are not necessary for registration to
Unity Catalog. Instead, using the Model Registry or MLflow functions with the correct Unity Catalog URI is
the most direct and appropriate method.
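As a hedged illustration of the programmatic path described above, a minimal sketch of registering an MLflow-logged model to Unity Catalog; the model, catalog, schema, and model name are placeholders:

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Placeholder model so the sketch is self-contained
X, y = make_classification(n_samples=100, n_features=4)
model = LogisticRegression().fit(X, y)

# Point the MLflow registry at Unity Catalog instead of the workspace registry
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register under a three-level Unity Catalog name: <catalog>.<schema>.<model>
mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",
    name="main.default.my_classifier",
)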
When preparing a large legal document for use in a generative AI model that has a token limit of 2048, the
most efficient way to chunk the text is to split the document into smaller, manageable sections that respect
natural language boundaries, such as sentences. This ensures that each chunk is coherent and meaningful
without exceeding the model's token limit.
• Option A is the best strategy because it preserves the logical flow of the content while ensuring each
chunk stays within the model's constraints, avoiding fragmented or incomplete sentences.
• Option B may result in chapters that are too large to fit within the token limit, while Option C doesn't
account for section lengths, which could also exceed the limit.
• Option D, although it ensures the token limit is respected, could cut off sentences, leading to
incomplete or less meaningful chunks for the model to process.
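For illustration (not from the original material), a rough sketch of the strategy in Option A, using whitespace token counts and a naive sentence split as stand-ins for the model's real tokenizer and a proper sentence segmenter:

def chunk_section(section_text, max_tokens=2048):
    """Split one section into chunks that respect sentence boundaries and a token budget."""
    chunks, current, current_len = [], [], 0
    for sentence in section_text.split(". "):      # naive sentence split for illustration
        sent_len = len(sentence.split())            # approximate token count
        if current and current_len + sent_len > max_tokens:
            chunks.append(". ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += sent_len
    if current:
        chunks.append(". ".join(current))
    return chunks

# Placeholder sections standing in for the legal document's chapters and sections
sections = ["Section 1. The parties agree to the following terms. Payment is due in 30 days.",
            "Section 2. Either party may terminate with written notice."]
document_chunks = [c for section in sections for c in chunk_section(section)]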
Question: 3
Question: 4
A. GPT-3 (Generative Pre-trained Transformer)
B. LSTM (Long Short-Term Memory) Network
C. Transformer Encoder-Only Model
D. BERT (Bidirectional Encoder Representations from Transformers)
You are tasked with writing a large, chunked text dataset into Delta Lake tables within Unity Catalog. The
data needs to be prepared efficiently for querying and analysis. Which of the following is the correct
sequence of operations to write the chunked text data into a Delta Lake table?
A. Combine chunks →Convert to DataFrame →Define Delta Table schema →Write to Delta Lake in
Merge mode
B. Combine chunks →Convert to DataFrame →Write to Delta Lake in Overwrite mode
C. Convert to DataFrame →Combine chunks →Write to Delta Lake in Append mode
D. Combine chunks →Create Delta Table schema →Write to Delta Lake in Append mode
The correct process to efficiently write a large chunked text dataset into Delta Lake tables involves:
1. Combine chunks: Since the dataset is chunked, it must first be combined into a cohesive structure
that can be processed effectively.
2. Convert to DataFrame: Delta Lake operates on Spark DataFrames, so the combined chunks must
be converted into a DataFrame.
For generating marketing content that requires coherence, relevance to product descriptions, controlled
length, and the ability to fine-tune the model with domain-specific data, GPT-3 is the most appropriate
architecture. It is specifically designed for text generation tasks and has the following advantages:
• Scalability: GPT-3 is highly scalable and can handle large volumes of text data.
• Low latency during inference: Its architecture is optimized for generating text quickly.
• Fine-tuning: GPT-3 can be fine-tuned with domain-specific data to produce content that is highly
relevant to specific products or contexts.
Other models like LSTM and BERT are not as optimized for generative tasks. LSTM struggles with long-
range dependencies, while BERT, being an encoder-only model, is more suited for tasks like classification
and token prediction rather than text generation. Transformer encoder-only models focus on
understanding input sequences but lack the generative capabilities of GPT-3.
Question: 5
Answer: A
Explanation:
Answer: B
Explanation:
• C. Using Databricks' Optimized Writes to Minimize Performance Impact of Masking: Optimized writes help to reduce the overhead involved in writing data, ensuring that the performance impact of applying data masking is minimized while still securing sensitive information.
• E. Creating Materialized Views with Masking Logic Pre-applied: By creating materialized views that have masking logic applied, you can ensure sensitive information is protected while maintaining optimal query performance. The views precompute the masked data, making reporting queries faster.
You are working with a large-scale dataset on Databricks that includes personal information. You are
required to mask sensitive information while ensuring query performance remains optimal for reporting.
Which of the following techniques should you consider to meet both security and performance
objectives? (Select two)
A. Implementing Fine-Grained Access Control and avoiding masking to improve performance
B. Leveraging Databricks' Z-Order Indexing to Speed Up Queries with Masking
C. Using Databricks' Optimized Writes to Minimize Performance Impact of Masking
D. Using Pass-Through Authentication to Ensure Performance with Masked Data
E. Creating Materialized Views with Masking Logic Pre-applied
• A. Implementing Fine-Grained Access Control and avoiding masking to improve performance: While fine-grained access control helps secure data, it does not mask sensitive information. Masking is still needed to ensure data security in scenarios where personal information is exposed.
• B. Leveraging Databricks' Z-Order Indexing to Speed Up Queries with Masking: Z-Order indexing helps optimize query performance but does not directly address masking or security. It is not sufficient by itself for handling sensitive data.
• D. Using Pass-Through Authentication to Ensure Performance with Masked Data: Pass-through authentication controls user access but does not involve data masking. It ensures identity management but does not protect sensitive information during queries.
3. Write to Delta Lake in Overwrite mode: The Overwrite mode ensures that existing data in the table is replaced by the new data being written, which is suitable when working with large datasets that need to be refreshed or replaced.
Append mode is not ideal in this case because it would add data without replacing the existing table
contents, and schema definitions are typically handled during the DataFrame creation phase rather than
as a separate step. Therefore, Option B is the most efficient sequence of operations for this task.
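For illustration, a minimal PySpark sketch of the sequence in Option B; it assumes a Databricks/Spark session (the `spark` object), and the chunk list and table name are placeholders:

from pyspark.sql import Row

# 1. Combine chunks into records, 2. convert to a DataFrame, 3. write to Delta in Overwrite mode
chunks = ["first chunk of text", "second chunk of text"]   # placeholder chunked text
rows = [Row(doc_id=i, text=chunk) for i, chunk in enumerate(chunks)]
df = spark.createDataFrame(rows)

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("main.docs.chunked_text"))   # Unity Catalog table name is a placeholder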
Question: 7
Question: 6
Answer: C, E
Explanation:
Why not the others:
This context length strikes a balance between being large enough to capture meaningful product
descriptions, customer reviews, and features, while also being optimized for fast, accurate
responses. With customer queries being short (1-2 sentences), 512 tokens provide sufficient
context for embedding product descriptions and reviews without using excessive memory.
You are designing a recommendation engine for an e-commerce platform. The source documents
consist of product descriptions, customer reviews, and seller guidelines, ranging from 50 to 1000 words.
Customer queries are typically short (1-2 sentences) and focus on finding specific products or features.
You want to optimize the system for fast, accurate responses to queries while minimizing unnecessary
memory usage. Which context length for the embedding model would be most appropriate for your use
case?
A. 2048 tokens
B. 128 tokens
C. 256 tokens
D. 512 tokens
• A. 2048 tokens would be overkill for short queries and relatively small source documents, leading to unnecessary memory usage without significant gains in accuracy.
• B. 128 tokens would be too short to capture sufficient context from product descriptions and customer reviews, potentially affecting the quality of the recommendations.
• C. 256 tokens, while better than 128, might still fall short in handling longer product descriptions or reviews, leading to incomplete context.
Choosing 512 tokens ensures that enough context is captured for accurate recommendations while
keeping memory usage efficient.
You are preparing to deploy a Retrieval-Augmented Generation (RAG) model on Databricks. Which two
of the following elements are critical to ensure that your deployment functions correctly and can process
queries as expected? (Select two)
A. A dependency management tool such as conda to ensure compatibility of all components.
B. An embedding model to convert query text into a vector representation for document retrieval.
C. A configuration for distributed training to ensure efficient parallelism.
D. A model signature that specifies the input and output format of the deployed model.
E. A dataset of labeled training examples to fine-tune the generative model.
Question: 8
Answer: D
Explanation:
Answer: B, D
Explanation:
• D. 512 tokens
Why not the others:
A.
BLEU and ROUGE are common metrics for evaluating the quality of generated text. Regularly
tracking these metrics on a fixed set of evaluation queries allows you to assess whether the
model's response generation quality is degrading over time, indicating model drift.
Monitoring the accuracy of the retrieval step is useful, but it does not capture the quality of the
generated responses, which is crucial for identifying model drift in a RAG model.
• While a dependency management tool like conda is helpful for managing environments, it is
not critical to the core functioning of the RAG model in query processing.
• C. Distributed training is beneficial for scaling and performance but not directly required for
deploying a RAG model, especially if the model is already trained.
• E. A labeled training dataset is important for fine-tuning, but if you are deploying an already fine-
tuned RAG model, it is not necessary for deployment and query processing.
The embedding model (B) and model signature (D) are critical to ensure that the RAG system can
retrieve relevant documents and process queries correctly.
B. An embedding model is essential to convert query text into a vector representation, which is
then used for document retrieval in the Retrieval-Augmented Generation (RAG) system. This step
is critical for retrieving relevant documents based on user queries.
D. A model signature specifies the input and output format of the deployed model, ensuring
that the system can correctly process queries and return responses in the expected format.
This is critical for a properly functioning deployment.
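As an illustration of what a model signature captures, a small sketch using MLflow's signature inference; the example query and answer frames are placeholders:

import pandas as pd
from mlflow.models.signature import infer_signature

# Example input a caller would send to the deployed RAG model, and the expected output shape
example_query = pd.DataFrame({"query": ["How do I reset my password?"]})
example_answer = pd.DataFrame({"answer": ["You can reset it from the account settings page."]})

signature = infer_signature(example_query, example_answer)
print(signature)   # shows the input and output schema the serving endpoint will expect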
You have deployed a RAG model for document retrieval and response generation in a customer service
application. Over time, you want to monitor if the performance of your model degrades, particularly in
terms of its ability to generate useful and accurate responses. Which of the following approaches would
be most appropriate for using MLflow to monitor model drift over time?
A. Monitor the accuracy of the retrieval step over time
B. Track the number of queries processed by the model daily
C. Monitor the change in the learning rate and number of training epochs used in fine-tuning the model
D. Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and compare them over time
Answer: D
Explanation:
Question: 9
•
•
Why not the others:
A.
Why not the others:
• A.
• D. Regularly log BLEU and ROUGE scores on a fixed set of evaluation queries and
compare them over time
This is a key consideration for efficient deployment and scalability. Fine-tuning the LLM on
Databricks ensures that the model is adapted to your specific use case. Registering the model
with MLflow allows for version control and tracking, and serving the model through a REST API
endpoint ensures it can be scaled efficiently as part of the Databricks infrastructure.
• B. Tracking the number of queries processed is more related to monitoring usage patterns rather
than detecting performance degradation.
• C. Changes in learning rate or training epochs would be relevant during fine-tuning but are not
directly helpful for monitoring model performance in production after deployment.
Tracking BLEU and ROUGE scores regularly helps detect performance issues related to response
quality and ensures that the model maintains its accuracy and usefulness over time.
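For illustration, a sketch of the monitoring loop described above; generate_response and compute_rouge_l are placeholder stubs standing in for the deployed RAG endpoint and a real ROUGE implementation (e.g., the rouge-score package):

import time
import mlflow

def generate_response(query):
    # Placeholder: call the deployed RAG endpoint here
    return "model answer for: " + query

def compute_rouge_l(candidate, reference):
    # Placeholder metric: use a proper ROUGE/BLEU library in practice
    overlap = len(set(candidate.split()) & set(reference.split()))
    return overlap / max(len(reference.split()), 1)

eval_queries = ["How do I reset my password?", "What are your support hours?"]
reference_answers = ["Reset it from account settings.", "Support is available 9am-5pm on weekdays."]

with mlflow.start_run(run_name="rag-drift-check"):
    scores = [compute_rouge_l(generate_response(q), ref)
              for q, ref in zip(eval_queries, reference_answers)]
    # Log the averaged score with a timestamp-based step so runs can be compared over time
    mlflow.log_metric("rouge_l_avg", sum(scores) / len(scores), step=int(time.time()))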
When serving an LLM application using Foundation Model APIs on Databricks, which of the following is
a key consideration for ensuring efficient deployment and scalability?
A. Fine-tune the LLM on Databricks and register the model with MLflow for version control, then use a
Databricks REST API endpoint to serve the model.
B. Ensure the LLM is fully retrained on your specific dataset before deploying it to Databricks, as pre-
trained models are not suitable for Foundation Model APIs.
C. Store the LLM as a Delta table in Unity Catalog and query it in real-time using SQL endpoints.
D. The LLM should be downloaded locally and deployed on a custom virtual machine for scalability.
Pre-trained models can be used with Foundation Model APIs on Databricks. Fine-tuning may
be necessary for specific use cases, but full retraining is not required.
C. Storing the LLM as a Delta table is not an appropriate method for serving or querying an LLM in
real-time. Delta tables are used for structured data, not for serving machine learning models.
D. Deploying the LLM on a custom virtual machine is not scalable compared to the cloud
infrastructure provided by Databricks, which allows for easy scaling and management.
By fine-tuning the LLM, registering it with MLflow, and serving it through a Databricks REST API endpoint, you ensure efficient deployment and scalability of the model.
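As a hedged illustration, one way a client might call such a Databricks model serving endpoint once it is deployed; the workspace URL, endpoint name, and token are placeholders, and the exact JSON payload shape depends on the model's signature:

import os
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
ENDPOINT_NAME = "my-llm-endpoint"                                    # placeholder endpoint name

response = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={"inputs": ["Summarize our refund policy in two sentences."]},  # payload format varies by signature
)
print(response.json())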
Question: 10
Question: 11
•
•
Why not the others:
• B.
Explanation:
• A. Fine-tune the LLM on Databricks and register the model with MLflow for version control,
then use a Databricks REST API endpoint to serve the model.
Answer: A
This combination is ideal for building a system that retrieves relevant information from a document repository and generates natural language answers. The dense passage retriever uses dense embeddings to search and retrieve the most relevant documents, and the generative model (e.g., GPT-based) produces coherent, natural language answers based on the retrieved documents, making it highly effective for answering user queries.
A company wants to build a system where users can input natural language questions, and the system retrieves relevant information from a document repository, then generates a natural language answer.
The system should use a retriever component to search the document repository and a generator
component to produce answers in natural language based on the retrieved documents. Which
combination of components would best fit this requirement?
A. Named Entity Recognition (NER) model followed by a text summarization model
B. Dense passage retriever followed by a generative model (e.g., GPT-based)
C.
D.
Text classification model followed by a text generation model
Keyword-based search engine followed by a text summarization model
• Named Entity Recognition (NER) is used to extract specific entities, not for document retrieval
or generating answers.
• C. A text classification model categorizes inputs but doesn’t retrieve or generate responses.
• D. A keyword-based search engine is less effective than dense retrievers for finding relevant
documents in a natural language context, and text summarization models do not generate new
answers but only summarize content.
Using a dense passage retriever and a generative model provides both accurate retrieval and natural
language generation, making it the best fit for this requirement.
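For illustration only, a toy sketch of the retrieve-then-generate flow; the embed function is a random stand-in for a real dense embedding model, the document list is a placeholder corpus, and the final generative call is left as a prompt only:

import numpy as np

def embed(text):
    # Placeholder: use a real dense embedding model (e.g., a sentence-transformer) in practice
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

documents = ["Return policy: items can be returned within 30 days.",
             "Shipping: standard delivery takes 3-5 business days."]
doc_vectors = np.stack([embed(d) for d in documents])

query = "How long do I have to return an item?"
q_vec = embed(query)

# Dense retrieval: rank documents by cosine similarity to the query vector
scores = doc_vectors @ q_vec / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec))
top_doc = documents[int(np.argmax(scores))]

# Generation step: the retrieved context and question would be passed to a generative model
prompt = f"Answer the question using the context.\nContext: {top_doc}\nQuestion: {query}\nAnswer:"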
You are working on a project to build a Generative AI application using Langchain in Databricks. You
need to implement a simple chain that takes user input (a paragraph of text) and returns a summary of
that text. Choose the correct implementation that creates and uses a chain to achieve this task. Which of
the following code snippets correctly implements a simple summarization chain using Langchain?
A.
Question: 12
Answer: B
Explanation:
Why not the others:
A.
• B. Dense passage retriever followed by a generative model (e.g., GPT-based)
• B is the correct implementation. It uses the LLMChain to create a simple summarization chain.
The chain uses a PromptTemplate to structure the summarization request, and the LLM
processes the input to return the summary.
You are developing an AI-powered sentiment analysis application in Databricks using a large language
model (LLM). The task is to classify customer reviews as either positive or negative. You notice
inconsistent results when the input prompt is written in various formats. Which of the following prompt
formats is most likely to generate the most accurate result when requesting the model to classify the
sentiment of a review?
A. Prompt: "The customer says: '[REVIEW TEXT]'. Is the sentiment positive?"
B. Prompt: "Analyze the sentiment of this text: [REVIEW TEXT]."
C. Prompt: "[REVIEW TEXT]. What is the mood of the speaker?"
D. Prompt: "Classify the following review as either Positive or Negative: [REVIEW TEXT]."
• A uses SummarizationPrompt, which is not a valid class in Langchain. Also, the input structure does not align with Langchain's typical workflow.
• C uses SummarizationChain, which does not exist in Langchain.
• D incorrectly uses InputExample and sets up the chain improperly. The correct approach involves using PromptTemplate, as seen in option B.
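The option snippets themselves are not reproduced in this extract, but a correct LLMChain-based summarization chain, in the classic LangChain style the explanation describes, might look roughly like this (API key handling omitted; the prompt text is illustrative):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["text"],
    template="Summarize the following paragraph in 2-3 sentences:\n\n{text}\n\nSummary:",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

summary = chain.run(text="<paragraph of text to summarize>")
print(summary)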
Answer: B
Explanation:
Answer: D
Explanation:
• B
Why not the others:
• A
• D. "Classify the following review as either Positive or Negative: [REVIEW TEXT]."
This prompt format provides clear instructions to the model, explicitly asking for a classification of the sentiment with defined categories ("Positive" or "Negative"). It sets up the task in a way that aligns with how sentiment analysis is typically framed and removes ambiguity, leading to more accurate and consistent results.
Question: 13
B.
You are building a chatbot application using Langchain in Databricks that takes a user’s query and
provides a response from a language model. You want to deploy a simple conversational chain to
respond to user queries. Choose the correct implementation for this chain. Which of the following code
snippets correctly implements a conversational chain for chatbot interaction using Langchain?
A.
• "Is the sentiment positive?" might bias the model toward a positive response by framing the
question as a yes/no query, rather than asking for a neutral classification.
• B. "Analyze the sentiment of this text" is too general and may result in an answer that’s less
specific, potentially describing the mood instead of explicitly classifying it as positive or negative.
• C. "What is the mood of the speaker?" may prompt the model to infer emotions or attitudes that
are more complex than simple positive/negative sentiment, which can lead to inconsistent results.
By clearly asking for a classification between positive and negative, option D creates the most
direct and effective prompt for a sentiment analysis task.
Question: 14
Why not the others:
A.
C.
D.
This option correctly uses the LLMChain to create a conversational chain with a prompt template.
The PromptTemplate takes a user’s question and passes it to the language model (in this case,
OpenAI), generating a response based on the provided input.
Answer: B
Explanation:
A.
Why not the others:
B.
C.
1. Summarize the following research paper.
2. - Abstract: {abstract}
3. - Title: {title}
4. Keywords: Automatically generate from the abstract.
5. Functions available: Extract keywords, Generate outline.
6. Summary: {summary}
You are tasked with creating a prompt template for a generative AI system that will help users generate
summaries of research papers. The system should allow the user to input the abstract and a few key
details (e.g., title, keywords) while generating a concise, well-structured summary. Additionally, the
template should expose available functions like extracting keywords and generating content outlines.
The goal is to design a template that minimizes user interaction errors and maximizes the prompt's
effectiveness. Which of the following prompt templates best accomplishes this task, considering both
structure and function exposure?
A.
• A. The SimpleChain does not exist in Langchain for conversational purposes; it's not suitable for
chatbot interactions.
• C. ConversationalRetrievalChain is used when combining retrieval-based systems with language
models, but it’s not appropriate for simple chatbot queries without retrieval functionality.
• D. The ConversationalChain does not exist in Langchain; it's not valid for creating a chatbot
interaction.
Thus, B provides the correct implementation of a conversational chain using LLMChain and a prompt
template.
Explanation:
Option A provides a well-structured and user-friendly prompt template that allows for both flexibility and functionality. It ensures the user inputs critical information (abstract, title) while offering optional keyword generation. It also clearly exposes functions like extract_keywords() and generate_outline() to enhance user interaction and minimize errors, making the template effective for generating concise and structured summaries.
1. Provide the necessary information to generate a research summary:
2. Abstract: {abstract}
3. Title: {title}
4. Keywords: Optional (generated from abstract if left blank).
5. Available functions: [extract_keywords(), generate_outline()]
6. Summary: {summary}
1. Research Summary Generator:
2. - Abstract: {abstract}
3. - Title: {title}
4. - Keywords: {keywords}
5. - Generate Outline: Yes/No
6.
7. Please summarize the content based on the provided abstract and keywords.
Answer: A
Question: 15
• B. Clearly stating in the metaprompt that sensitive information like phone numbers or addresses should not be generated ensures that the model avoids outputting private data.
• D. Instructing the model to avoid generating or referencing data not explicitly provided helps minimize hallucinations and ensures the model only uses the provided input data.
• B. The structure is somewhat redundant and less clear on the functionality of available functions
like keyword extraction and outline generation. It may cause confusion by blending multiple
operations in one prompt.
• C. This option lacks detail and clarity regarding the available functions, making it harder for users
to understand how to interact with the system effectively.
Thus, A provides the optimal structure and function exposure to create a concise, well-structured
summary with minimal user interaction errors.
• A. Avoiding limitations in the metaprompt could lead to more flexible responses but increases the risk of generating inappropriate or sensitive data.
• C. Repeating input data is unnecessary and does not help minimize hallucinations or protect sensitive information.
You are designing a metaprompt for a generative AI application in Databricks that will handle sensitive
customer information, such as phone numbers and addresses. Your primary objective is to minimize
hallucinations and prevent the leakage of private data. Which two approaches should you include in your
metaprompt to achieve this? (Select two)
A. Avoid specifying limitations in the metaprompt to allow for more flexible responses.
B. Clearly state in the metaprompt that any sensitive information, such as phone numbers or
addresses, must not be generated or included in the response.
C.
D.
Instruct the model to repeat all the input data to ensure accuracy in its output.
Specify that the model should refrain from generating or referencing data that was not explicitly
provided in the input.
E. Use a temperature setting of 1.5 to encourage more creative and diverse outputs from the model.
Question: 16
Explanation:
• B.
Why not the others:
• B.
Why not the others:
• A.
•
•
Answer: B, D
• C: Autoscaling helps manage compute resources, but it doesn't address the core issue of reducing expensive LLM inference costs.
• A: Model checkpointing is useful for training but doesn't directly reduce inference costs in a production RAG system.
• D: Reducing the number of tokens may compromise the quality of responses, which isn't ideal if maintaining response quality is critical.
Employing prompt optimization techniques and caching common query results (Option B) allows for significant cost savings. Optimizing prompts can reduce unnecessary token usage, and caching common results prevents the need to rerun inferences for frequently asked queries, thereby controlling costs while maintaining the quality of responses.
• E. Increasing the temperature encourages more creative outputs but can lead to less control over
the responses, increasing the likelihood of hallucinations.
You are working with a Retrieval-Augmented Generation (RAG) application that uses a large language
model (LLM) to generate responses. The cost of running this application is increasing due to high usage
of the LLM for inference. What is the most effective way to use Databricks features to control costs
without compromising the quality of responses?
A. Use model checkpointing to avoid retraining the LLM from scratch for each query
B. Employ prompt optimization techniques and cache common query results in Databricks
C. Use the Databricks autoscaling feature to scale compute clusters based on LLM load
D. Decrease the number of tokens used for generation by reducing the max tokens parameter in the LLM
Answer: B
Explanation:
A developer is working with a large language model (LLM) to generate summaries of long technical
reports. However, the initial summaries are too detailed. The developer wants to create a prompt that
adjusts the LLM's response to provide more concise summaries. Which of the following prompt
modifications would most effectively adjust the LLM’s output from a detailed summary to a concise one?
(Select two)
A. “Give an in-depth analysis of the report, including all technical aspects and details.”
B. “Summarize the report in 500 words or less, focusing on the technical details.”
C. “Provide a bullet-point summary of the key highlights from the report.”
Question: 17
Question: 18
•
•
•
Why not the others:
• A
D. “Summarize the report, making sure to include the abstract, conclusion, and every major section in full.”
E. “Generate a concise executive summary focusing on only the most important findings.”
You are deploying a Retrieval-Augmented Generation (RAG) application on Databricks. This application
must allow users to submit queries that are embedded into vector space, retrieve the most relevant
documents using a retriever, and then pass them to a generative model for response generation. In
order to deploy this application, you must ensure that all necessary elements, including dependencies
and model signature, are properly specified for a seamless integration into Databricks and for future use
by other teams. Which of the following lists the essential components required to deploy this RAG
application?
A. Pre-trained language model, document retriever, tokenizer, SQL query generator, dependencies,
and input pipeline.
B. Language model, input format parser, retriever, output formatter, embedding index, and model
signature.
C. Embedding model, retriever, generative model, dependencies, model signature, and input examples.
To adjust the LLM's output from a detailed summary to a more concise one, the prompt needs to clearly
instruct the model to focus on brevity and essential information:
• C. “Provide a bullet-point summary of the key highlights from the report.”: Bullet points
naturally encourage brevity and a focus on the most important points, which leads to a more concise
output.
• E. “Generate a concise executive summary focusing on only the most important findings.”: This explicitly asks for a "concise" summary and narrows the focus to only the "most important findings," ensuring the summary is brief and to the point.
The other options would not achieve the desired outcome:
• A. “Give an in-depth analysis of the report, including all technical aspects and details.”:
This explicitly asks for a detailed analysis, which is the opposite of what is required.
• B. “Summarize the report in 500 words or less, focusing on the technical details.”: While it
sets a word limit, the focus on technical details could still result in a longer, more detailed
summary than intended.
• D. “Summarize the report, making sure to include the abstract, conclusion, and every major section in full.”: This would lead to a comprehensive summary rather than a concise one, as it asks for the inclusion of every major section.
By specifying the use of key highlights or focusing on only the most important findings, C and E best
guide the LLM toward a concise output.
Question: 19
Answer: C, E
Explanation:
D. Retriever, vectorizer, generative model, dataset schema, hyperparameter configuration, and API gateway.
Answer: C
Explanation:
You are working on a Retrieval-Augmented Generation (RAG) application using a large language model
(LLM) on Databricks. The cost of inference has increased significantly due to high traffic. You want to
This option includes all the essential components required for deploying a Retrieval-Augmented
Generation (RAG) application effectively:
1. Embedding Model: This is necessary for converting user queries and documents into vector
representations, enabling semantic search.
2. Retriever: This component retrieves the most relevant documents based on the embedded
queries, critical for the RAG architecture.
3. Generative Model: After retrieving the relevant documents, this model generates responses
based on the retrieved information.
4. Dependencies: This includes all necessary libraries and packages required for the application to
function correctly.
5. Model Signature: Specifies the expected inputs and outputs of the model, facilitating integration
and ensuring compatibility with other systems.
6. Input Examples: Providing example inputs helps with testing and validating the application during
deployment and future usage.
Other Options:
• A. Pre-trained language model, document retriever, tokenizer, SQL query generator, dependencies, and input pipeline: This option mixes components that may not be relevant or necessary for the specific RAG deployment context.
• B. Language model, input format parser, retriever, output formatter, embedding index, and model signature: While it mentions several components, it lacks the necessary focus on embedding and generative models specific to RAG.
• D. Retriever, vectorizer, generative model, dataset schema, hyperparameter configuration, and API gateway: While some components are relevant, it includes elements like dataset schema and hyperparameter configuration that are not essential for the deployment process.
Thus, option C comprehensively captures all the necessary components for deploying the RAG
application on Databricks.
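For illustration, a compressed sketch of how these pieces might be bundled when logging the RAG chain with MLflow; the wrapper class, example input, and package list are assumptions, not the exam's reference implementation:

import pandas as pd
import mlflow
from mlflow.models.signature import infer_signature

class RagChain(mlflow.pyfunc.PythonModel):
    # Stand-in for a wrapper holding the embedding model, retriever, and generative model
    def predict(self, context, model_input):
        return pd.DataFrame({"answer": ["(placeholder answer)"] * len(model_input)})

example_in = pd.DataFrame({"question": ["What is the travel expense limit?"]})
example_out = pd.DataFrame({"answer": ["See the travel policy document."]})

mlflow.pyfunc.log_model(
    artifact_path="rag_app",
    python_model=RagChain(),
    signature=infer_signature(example_in, example_out),        # model signature
    input_example=example_in,                                   # input examples
    pip_requirements=["sentence-transformers", "faiss-cpu"],    # dependencies (illustrative list)
)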
Question: 20
use Databricks features to control the costs associated with running the LLM while maintaining
reasonable performance for end-users. Which of the following methods would be the BEST way to
control LLM costs in your RAG application on Databricks?
A. Use Databricks Auto-Scaling clusters to dynamically adjust the number of nodes in your cluster
based on workload, reducing costs during periods of low traffic.
B. Use MLflow to log all LLM responses and track usage, but do not change the underlying
infrastructure as Databricks optimizes costs automatically.
C. Cache all LLM-generated responses in Databricks to avoid repeated queries to the model.
D. Utilize Databricks Serverless endpoints, which automatically adjust based on the number of incoming requests, to optimize cost-per-query for LLM inference.
Answer: D
Explanation:
Databricks Serverless endpoints are highly efficient for handling variable traffic, as they dynamically
scale based on incoming request volume. This ensures that you're only paying for the compute
resources you use, reducing costs when there are fewer requests and scaling up to maintain
performance when traffic increases. This is ideal for managing costs in high-traffic scenarios while
maintaining good user experience.
• Option A (Auto-Scaling clusters) is beneficial but may not scale as efficiently for inference
workloads as Serverless endpoints, and you still pay for idle cluster time.
• Option B (Using MLflow to log responses) helps with tracking but doesn't directly control
infrastructure costs.
• Option C (Caching responses) can reduce repeated queries but doesn’t fully address the cost of
handling new queries or high traffic.
Serverless endpoints provide a more targeted approach to cost and performance optimization in this
scenario.
You are tasked with developing an AI-powered application using Databricks to summarize long-form legal
documents. The documents can be thousands of words long, and the large language model (LLM) has a
token limit of 4096 tokens. You need to decide on the optimal chunking strategy to ensure that the
summarization captures the essential legal clauses accurately without missing important context. Which
chunking strategy is most appropriate to generate an accurate and coherent summary, considering the
token limit and the document structure?
A. Chunk by paragraphs, overlapping the last sentence of each chunk with the next chunk.
B. Chunk by arbitrary 400-token segments without overlapping content.
C. Chunk by sentences, with no overlap.
D. Chunk based on logical sections of the document, with no overlap.
Question: 21
Answer: A
Explanation:
Chunking by paragraphs with overlap helps maintain continuity and context across chunks.
Overlapping the last sentence ensures that the context flows smoothly between chunks, reducing
the risk of losing important information at chunk boundaries. This approach is particularly useful
for summarizing long-form legal documents, where clauses and context may span across multiple
paragraphs.
• A. Including a section that describes the available functions (e.g., get_order_status, cancel_order,
return_order) ensures that the AI agent knows what actions it can take, making it more efficient in
selecting the appropriate function based on the user’s query. This helps guide the model and
reduces ambiguity in handling specific requests.
• B. Using placeholders in the prompt template to dynamically inject user input allows the system to adapt the prompt in real-time based on the specific information provided by the user. This ensures flexibility and customization, improving the agent's ability to generate relevant and context-aware responses.
You are developing an AI agent using Databricks for a customer support chatbot. To enhance its
flexibility and efficiency, you decide to build prompt templates to expose available functions that the
agent can call. The prompt must dynamically adjust based on user input and provide access to multiple
functions like get_order_status, cancel_order, and return_order. Which of the following are correct
practices when designing and using prompt templates to expose available functions in Databricks
Generative AI agent development? (Select two)
A. Ensure the prompt template includes a section that describes the available functions for the agent to
choose from.
B. Use placeholders in the prompt template to dynamically inject user input at runtime.
C. Hard-code all possible function options directly into the prompt template for consistency and
security.
D. Construct the prompt template so that it always exposes every available function to the user,
regardless of the context.
E. Avoid specifying functions in the prompt template to reduce complexity, letting the model infer which
function to use.
Answer: A, B
Explanation:
• B. Arbitrary 400-token segments without overlap can break the flow of important legal clauses,
leading to incomplete or disjointed summaries.
• C. Chunking by sentences with no overlap may result in chunks that are too small and disconnected, losing the broader context necessary for accurately summarizing legal documents.
• D. Chunking by logical sections without overlap might work for very well-structured
documents, but legal documents often require continuity between sections, so overlapping is
crucial for maintaining context.
Thus, chunking by paragraphs with overlap ensures both coherence and context retention, making it
the best approach for summarizing long legal documents effectively.
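For illustration (not part of the original explanation), a rough sketch of paragraph chunking with a one-sentence overlap; the sentence split is naive, the input text is a placeholder, and token-limit enforcement is omitted for brevity:

def chunk_paragraphs_with_overlap(paragraphs):
    """Group paragraphs into chunks, repeating the last sentence of each chunk
    at the start of the next one to preserve context across boundaries."""
    chunks = []
    carry = ""                                   # sentence carried over from the previous chunk
    for paragraph in paragraphs:
        chunk = (carry + " " + paragraph).strip() if carry else paragraph
        chunks.append(chunk)
        sentences = [s for s in paragraph.split(". ") if s]
        carry = sentences[-1] if sentences else ""
    return chunks

document_text = "Clause 1. The supplier shall deliver goods on time.\n\nClause 2. Late delivery incurs penalties."
legal_chunks = chunk_paragraphs_with_overlap(document_text.split("\n\n"))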
Question: 22
•
•
• A.
This modified prompt explicitly directs the model to respond in a friendly and empathetic tone,
ensuring that the chatbot not only provides a solution but also acknowledges the customer's
emotions.
This approach is key to handling complaints effectively, as it emphasizes understanding and
empathy, which are important when dealing with customer service interactions.
• A focuses solely on technical accuracy and detail, missing the empathetic and
conversational aspect.
• B encourages repeating the concern and giving a brief technical fix, but it lacks the empathy required for complaints.
• C. Hard-coding all possible function options directly into the prompt template limits flexibility
and can make the system less scalable or adaptable to new functions.
• D. Exposing every available function regardless of context could overwhelm the user and
increase the complexity of the model’s decision-making process, leading to suboptimal
results.
E. Avoiding the specification of functions in the prompt can lead to confusion and reduce
the model’s ability to execute specific tasks efficiently.
•
Thus, A and B are the best practices for building dynamic and efficient prompt templates for an AI
agent in customer support.
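As a hedged illustration of practices A and B, a minimal prompt template that lists the available functions and fills a placeholder with the user's message at runtime; the function names come from the question, while the template wording is assumed:

SUPPORT_PROMPT = """You are a customer support assistant.

Available functions:
- get_order_status(order_id)
- cancel_order(order_id)
- return_order(order_id, reason)

Customer message: {user_input}

Decide which function (if any) should be called and draft a reply."""

def build_prompt(user_input):
    # The placeholder is filled dynamically with the actual user message at runtime
    return SUPPORT_PROMPT.format(user_input=user_input)

print(build_prompt("Where is my order 1234?"))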
You are developing a customer service chatbot using a large language model (LLM) in Databricks.
The baseline model generates formal, fact-based responses, but you want to adjust the prompt so
the chatbot’s responses are more empathetic and conversational when handling complaints. How
should you modify the prompt to best adjust the LLM’s tone?
Your baseline prompt is:
“Analyze the customer’s complaint and provide a solution.”
Which of the following prompt modifications will most effectively adjust the response tone to be
empathetic and conversational?
A. “Give a technical solution to the customer’s issue in detail, ensuring accuracy.”
B. “Respond to the customer by repeating their concern and providing a brief technical fix.”
C. “Analyze the customer’s complaint and offer a solution in a friendly and empathetic
tone, acknowledging their feelings.”
D. “Provide a concise solution to the customer’s issue, focusing only on facts.”
Question: 23
Explanation:
Why not the other options?
C.
Answer: C
• D emphasizes providing facts only, which would keep the tone formal and impersonal,
not addressing the need for a conversational, empathetic approach.
Thus, C provides the best guidance for generating responses that are both solution-oriented and
empathetic, improving customer satisfaction in complaint handling.
Your team is tasked with ensuring data governance while maintaining query performance for an
application that involves real-time analytics on sensitive user data. Which of the following strategies
best implements data masking techniques to optimize both governance and performance in
Databricks?
A. Use a combination of column-level encryption and static masking to ensure sensitive
information is always hidden, reducing governance overhead.
B. Implement dynamic masking at the view level and cache frequently queried results to
avoid unnecessary masking operations for each query.
C. Mask data at the storage layer and configure the system to remove sensitive information
before loading into Databricks.
D. Use query-level dynamic masking to ensure that data is masked every time a user issues a
query on the sensitive dataset.
Answer: B
Explanation:
• Dynamic masking at the view level ensures that sensitive information is masked based on user permissions, providing real-time governance without altering the underlying data. This allows for flexibility in data access while still enforcing security policies.
• Caching frequently queried results helps optimize performance by reducing the need to reapply masking operations every time a query is executed. This approach ensures that data governance is maintained while also improving query performance.
• A. Column-level encryption and static masking ensure data security but might negatively impact performance, especially in real-time analytics, due to the overhead of encryption and decryption.
• C. Masking data at the storage layer can be effective, but it lacks flexibility and might not
provide the real-time dynamic masking required for different user roles and permissions.
• D. Query-level dynamic masking can ensure governance but could lead to performance
bottlenecks if masking operations are repeated for every query, particularly in a high-
frequency, real-time analytics environment.
Thus, B provides the best balance between data governance and query performance in Databricks, leveraging dynamic masking and caching.
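For illustration only, a rough sketch of the pattern in option B using a dynamic view plus result caching in a Databricks notebook; the table, view, group name, and columns are placeholders, and the available masking functions depend on your workspace configuration:

# Dynamic masking at the view level: only members of the pii_readers group see raw emails
spark.sql("""
CREATE OR REPLACE VIEW reporting.users_masked AS
SELECT
  user_id,
  CASE WHEN is_member('pii_readers') THEN email ELSE '***MASKED***' END AS email,
  purchase_amount
FROM prod.users
""")

# Cache the frequently queried (already masked) result to avoid re-running the masking logic
spark.sql("CACHE TABLE reporting.users_masked")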
Question: 24
•
Why not the other options?
You are developing an enterprise-grade application that requires generating highly technical reports
from structured data. The application must accurately interpret the domain-specific terminology used
in the aerospace industry. Given the following LLMs, which one would be the best choice based on
the requirements? Which LLM is best suited for this application?
A. GPT-3.5 fine-tuned on aerospace data
B. GPT-Neo
C. GPT-3
D. T5 (Text-to-Text Transfer Transformer)
You need to generate a structured table of customer feedback data using a generative AI model.
Each feedback entry should include columns: “Customer ID,” “Rating,” “Feedback,” and “Timestamp.”
Which of the following prompts is most likely to elicit a table format with correctly labeled columns
and corresponding rows of data?
A. "Output a table with customer details, including the feedback, rating, and time."
B. "Generate a table of customer feedback with rows for each entry and columns for Customer
ID, Rating, Feedback, and Timestamp."
C. "List customer feedback in CSV format with columns: Customer ID, Rating, Feedback,
and Timestamp."
• A. GPT-3.5 fine-tuned on aerospace data is the most suitable model because it has been specifically fine-tuned on domain-specific aerospace terminology and data. Fine-tuning a large language model (LLM) on aerospace-specific datasets ensures it can accurately interpret and generate reports that require deep technical understanding of the field.
• B. GPT-Neo is an open-source alternative to GPT models but is less powerful and may not
have the aerospace-specific knowledge required unless fine-tuned, making it less ideal for
this enterprise-grade application.
• C. GPT-3 is a powerful model but lacks the fine-tuning on aerospace-specific data, making it
less accurate for interpreting specialized terminology and generating industry-specific reports.
• D. T5 (Text-to-Text Transfer Transformer) is a versatile model, but it is not as well-suited
for specialized tasks unless fine-tuned for the specific domain, which isn't indicated here.
Thus, GPT-3.5 fine-tuned on aerospace data is the best option because it combines the power of a
large model with domain-specific fine-tuning, ensuring accurate and contextually appropriate technical
report generation.
Question: 25
Question: 26
Answer: A
Explanation:
D. "Provide a summary of customer feedback, mentioning the customer’s ID, rating, and feedback
they provided."
Answer: C
Explanation:
A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The
platform provides real-time updates and LLM-generated analyses for any users who would like to
have live summaries, rather than reading a series of potentially outdated news articles.
Which tool below will give the platform access to real-time data for generating game analyses based
on the latest game scores?
A. DatabrickslQ
B. Foundation Model APIs
C. Feature Serving
D. AutoML
Problem Context: The engineer is developing an LLM-powered live sports commentary platform that
needs to provide real-time updates and analyses based on the latest game scores. The critical
requirement here is the capability to access and integrate real-time data efficiently with the platform for
immediate analysis and reporting.
Explanation of Options:
Option A: DatabricksIQ: While DatabricksIQ offers integration and data processing capabilities, it is
more aligned with data analytics rather than real-time feature serving, which is crucial for immediate
updates necessary in a live sports commentary context.
Option B: Foundation Model APIs: These APIs facilitate interactions with pre-trained models and
could be part of the solution, but on their own, they do not provide mechanisms to access real-time game scores.
• Prompt C is explicit in requesting a CSV format, which is a well-understood, structured format
for tabular data. By specifying the column names (Customer ID, Rating, Feedback, Timestamp)
and the format (CSV), the model is more likely to output the data in a structured, table-like format
with properly labeled columns.
• A is less specific about the structure of the output and does not mention the exact column names.
• B requests a table with rows and columns but lacks the specificity of a format like CSV that models often handle better for structured data output.
• D asks for a summary of feedback, which is not likely to result in a structured table format.
Thus, C provides the best clarity for generating structured tabular data with properly labeled columns.
Question: 27
Answer: C
Explanation:
A Generative Al Engineer is responsible for developing a chatbot to enable their company’s internal
HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating
the GenAI application work breakdown tasks for this project, they realize they need to start planning
which data sources (either Unity Catalog volume or Delta table) they could choose for this
application. They have collected several candidate data sources for consideration:
call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives’ call resolution from fields call_duration and call_start_time.
transcript Volume: a Unity Catalog Volume of all recordings as *.wav files, but also a text transcript as *.txt files.
call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk to make sure that the charge back model is consistent with actual service use.
call_detail: a Delta table that includes a snapshot of all call details updated hourly. It includes
root_cause and resolution fields, but those fields may be empty for calls that are still active.
maintenance_schedule – a Delta table that includes a listing of both HelpDesk application outages
as well as planned upcoming maintenance downtimes.
They need sources that could add context to best identify ticket root cause and resolution. Which
TWO sources do that? (Choose two.)
A. call_cust_history
B. maintenance_schedule
C. call_rep_history
D. call_detail
E. transcript Volume
Option C: Feature Serving: This is the correct answer as feature serving specifically refers to the real-time provision of data (features) to models for prediction. This would be essential for an LLM that generates analyses based on live game data, ensuring that the commentary is current and based on the latest events in the sport.
Option D: AutoML: This tool automates the process of applying machine learning models to real-world problems, but it does not directly provide real-time data access, which is a critical requirement for the platform.
Thus, Option C (Feature Serving) is the most suitable tool for the platform as it directly supports the real-time data needs of an LLM-powered sports commentary system, ensuring that the analyses and updates are based on the latest available information.
Question: 28
Answer: D, E
A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but
is getting an error.
In the context of developing a chatbot for a company's internal HelpDesk Call Center, the key is to
select data sources that provide the most contextual and detailed information about the issues being
addressed. This includes identifying the root cause and suggesting resolutions. The two most
appropriate sources from the list are:
Call Detail (Option D):
Contents: This Delta table includes a snapshot of all call details updated hourly, featuring essential
fields like root_cause and resolution.
Relevance: The inclusion of root_cause and resolution fields makes this source particularly valuable,
as it directly contains the information necessary to understand and resolve the issues discussed in the
calls.
Even if some records are incomplete, the data provided is crucial for a chatbot aimed at speeding up
resolution identification.
Transcript Volume (Option E):
Contents: This Unity Catalog Volume contains recordings in .wav format and text transcripts in .txt files.
Relevance: The text transcripts of call recordings can provide in-depth context that the chatbot can
analyze to understand the nuances of each issue. The chatbot can use natural language processing
techniques to extract themes, identify problems, and suggest resolutions based on previous similar
interactions documented in the transcripts.
Why Other Options Are Less Suitable:
A (Call Cust History): While it provides insights into customer interactions with the HelpDesk, it
focuses more on the usage metrics rather than the content of the calls or the issues discussed.
B (Maintenance Schedule): This data is useful for understanding when services may not be available
but does not contribute directly to resolving user issues or identifying root causes.
C (Call Rep History): Though it offers data on call durations and start times, which could help in
assessing performance, it lacks direct information on the issues being resolved.
Therefore, Call Detail and Transcript Volume are the most relevant data sources for a chatbot
designed to assist with identifying and resolving issues in a HelpDesk Call Center setting, as they
provide direct and contextual information related to customer issues.
Question: 29
Explanation:
B)
C)
Assuming the API key was properly defined, what change does the Generative AI Engineer need to
make to fix their chain?
A)
D)
A.
B.
C.
D.
Answer: C
Explanation:
Option A
Option B
Option C
Option D
To fix the error in the LangChain code provided for using a simple prompt template, the correct
approach is Option C. Here's a detailed breakdown of why Option C is the right choice and how it
addresses the issue:
Proper Initialization: In Option C, the LLMChain is correctly initialized with the LLM instance specified
as OpenAI(), which likely represents a language model (like GPT) from OpenAI. This is crucial as it
specifies which model to use for generating responses.
Correct Use of Classes and Methods:
The PromptTemplate is defined with the correct format, specifying that adjective is a variable within
the template. This allows dynamic insertion of values into the template when generating text.
The prompt variable is properly linked with the PromptTemplate, and the final template string is
passed correctly.
The LLMChain correctly references the prompt and the initialized OpenAI() instance, ensuring that the
template and the model are properly linked for generating output.
Why Other Options Are Incorrect:
Option A: Misuses the parameter passing in the generate method by incorrectly structuring the dictionary.
Option B: Incorrectly uses a prompt.format call that does not fit the LLMChain and PromptTemplate configuration, resulting in potential errors.
Option D: Uses an incorrect order and setup of the initialization parameters for LLMChain, which would likely lead to a failure in recognizing the correct configuration for the prompt and the LLM.
Thus, Option C is correct because it ensures that the LangChain components are correctly set up and integrated, adhering to the proper syntax and logical flow required by LangChain's architecture. This setup avoids common pitfalls, such as type errors or method misuses, that are evident in the other options.
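For reference, a minimal working version of the pattern described above might look like the sketch below. It uses the classic LangChain API (PromptTemplate plus LLMChain with an OpenAI LLM); the template text and variable name are illustrative, not the exact snippet from the exam image, and an OPENAI_API_KEY is assumed to be configured.

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Define a template with one input variable ("adjective" is illustrative).
prompt = PromptTemplate(
    input_variables=["adjective"],
    template="Tell me a {adjective} joke about data engineering.",
)

# Link the prompt and the LLM in a chain; assumes OPENAI_API_KEY is set.
llm = OpenAI(temperature=0.7)
chain = LLMChain(llm=llm, prompt=prompt)

# Pass template variables as keyword arguments when running the chain.
print(chain.run(adjective="lighthearted"))
```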
Question: 29
A Generative AI Engineer is developing an LLM application that users can use to generate personalized birthday poems based on their names.
Which technique would be most effective in safeguarding the application, given the potential for malicious user inputs?
A. Implement a safety filter that detects any harmful inputs and ask the LLM to respond that it is unable to assist
B. Reduce the time that the users can interact with the LLM
C. Ask the LLM to remind the user that the input is malicious but continue the conversation with the user
D. Increase the amount of compute that powers the LLM to process input faster
Answer: A
Explanation:
In this case, the Generative AI Engineer is developing an application to generate personalized
birthday poems, but there’s a need to safeguard against malicious user inputs. The best solution is to
implement a safety filter (option A) to detect harmful or inappropriate inputs.
Safety Filter Implementation:
Safety filters are essential for screening user input and preventing inappropriate content from being
processed by the LLM. These filters can scan inputs for harmful language, offensive terms, or
malicious content and intervene before the prompt is passed to the LLM.
Graceful Handling of Harmful Inputs:
Once the safety filter detects harmful content, the system can provide a message to the user, such
as "I'm unable to assist with this request," instead of processing or responding to malicious input.
This protects the system from generating harmful content and ensures a controlled interaction
environment.
Why Other Options Are Less Suitable:
B (Reduce Interaction Time): Reducing the interaction time won't prevent malicious inputs from being entered.
C (Continue the Conversation): While it's possible to acknowledge malicious input, it is not safe to continue the conversation with harmful content. This could lead to legal or reputational risks.
D (Increase Compute Power): Adding more compute doesn't address the issue of harmful content and would only speed up processing without resolving safety concerns.
Therefore, implementing a safety filter that blocks harmful inputs is the most effective technique for safeguarding the application.
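As a rough illustration of the recommended approach, the sketch below wraps the poem-generation call with a simple input filter. The filter logic and the generate_poem helper are hypothetical placeholders; production systems would typically use a dedicated moderation model or guardrail service rather than a keyword list.

```python
import re

# Hypothetical deny-list; real systems would call a moderation model instead.
BLOCKED_PATTERNS = [r"ignore (all|previous) instructions", r"\bhack\b", r"\bexploit\b"]

def is_malicious(user_input: str) -> bool:
    """Very small stand-in for a safety filter."""
    return any(re.search(p, user_input, flags=re.IGNORECASE) for p in BLOCKED_PATTERNS)

def handle_request(name: str, user_input: str, generate_poem) -> str:
    # Screen the input before it ever reaches the LLM.
    if is_malicious(user_input):
        return "I'm unable to assist with this request."
    return generate_poem(name, user_input)  # generate_poem is a placeholder for the LLM call
```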
Question: 30
A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not sufficiently high to justify creating their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application.
What strategy should the Generative AI Engineer use?
A. Switch to using External Models instead
B. Deploy the model using pay-per-token throughput as it comes with cost guarantees
C. Change to a model with fewer parameters in order to reduce hardware constraint issues
D. Throttle the incoming batch of requests manually to avoid rate limiting issues
Answer: B
Explanation:
Problem Context: The engineer needs a cost-effective deployment strategy for an LLM application
with relatively low request volume.
Explanation of Options:
Option A: Switching to external models may not provide the required control or integration necessary
for specific application needs.
Option B: Using a pay-per-token model is cost-effective, especially for applications with variable or low
request volumes, as it aligns costs directly with usage.
Option C: Changing to a model with fewer parameters could reduce costs, but might also impact the
performance and capabilities of the application.
Option D: Manually throttling requests is a less efficient and potentially error-prone strategy for
managing costs.
Option B is ideal, offering flexibility and cost control, aligning expenses directly with the application's usage patterns.
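As a concrete illustration of option B, the snippet below queries a Databricks pay-per-token Foundation Model endpoint through the MLflow deployments client, so billing follows token usage rather than reserved capacity. The endpoint name is only an example; substitute whichever pay-per-token model your workspace exposes.

```python
from mlflow.deployments import get_deploy_client

# Uses ambient Databricks authentication (e.g., inside a notebook, or with
# DATABRICKS_HOST / DATABRICKS_TOKEN configured).
client = get_deploy_client("databricks")

# "databricks-meta-llama-3-1-70b-instruct" is an example pay-per-token endpoint name.
response = client.predict(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    inputs={
        "messages": [{"role": "user", "content": "Summarize today's support tickets."}],
        "max_tokens": 200,
    },
)
print(response)
```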
Question: 31
A company has a typical RAG-enabled, customer-facing chatbot on its website.
Select the correct sequence of components a user's questions will go through before the final output
is returned. Use the diagram above for reference.
A. 1.embedding model, 2.vector search, 3.context-augmented prompt, 4.response-generating LLM
B. 1.context-augmented prompt, 2.vector search, 3.embedding model, 4.response-generating LLM
C. 1.response-generating LLM, 2.vector search, 3.context-augmented prompt, 4.embedding model
D. 1.response-generating LLM, 2.context-augmented prompt, 3.vector search, 4.embedding model
Answer: A
Explanation:
To understand how a typical RAG-enabled customer-facing chatbot processes a user's question, let’s
go through the correct sequence as depicted in the diagram and explained in option A:
Embedding Model (1):
The first step involves the user's question being processed through an embedding model. This model
converts the text into a vector format that numerically represents the text. This step is essential for
allowing the subsequent vector search to operate effectively.
Vector Search (2):
The vectors generated by the embedding model are then used in a vector search mechanism.
This search identifies the most relevant documents or previously answered questions that are
stored in a vector format in a database.
Context-Augmented Prompt (3):
The information retrieved from the vector search is used to create a context-augmented prompt. This
step involves enhancing the basic user query with additional relevant information gathered to ensure
the generated response is as accurate and informative as possible.
Response-Generating LLM (4):
Finally, the context-augmented prompt is fed into a response-generating large language model
(LLM).
This LLM uses the prompt to generate a coherent and contextually appropriate answer, which is
then delivered as the final output to the user.
Why Other Options Are Less Suitable:
B, C, D: These options suggest incorrect sequences that do not align with how a RAG system typically
processes queries. They misplace the role of embedding models, vector search, and response
generation in an order that would not facilitate effective information retrieval and response generation.
Thus, the correct sequence is embedding model, vector search, context-augmented prompt, response-generating LLM, which is option A.
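To make the sequence concrete, here is a minimal, framework-agnostic sketch of the four stages. The embed_text, vector_index, and llm objects are hypothetical placeholders for whatever embedding model, vector store, and chat model the application actually uses.

```python
def answer_question(question: str, embed_text, vector_index, llm, k: int = 4) -> str:
    # 1. Embedding model: turn the user's question into a dense vector.
    query_vector = embed_text(question)

    # 2. Vector search: retrieve the k most similar documents from the index.
    documents = vector_index.similarity_search(query_vector, k=k)

    # 3. Context-augmented prompt: combine retrieved context with the question.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 4. Response-generating LLM: produce the final answer from the prompt.
    return llm.generate(prompt)
```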
Question: 32
You are working for an e-commerce company that wants to analyze customer reviews and determine the
overall sentiment (positive, negative, or neutral) of each review. The company also wants to understand
the reasons behind the sentiment, such as mentions of specific product features. Which type of
generative AI model would be most effective in accomplishing this task? (Select two)
A. Named Entity Recognition (NER) Model
B. Text Classification Model focused on Sentiment Analysis
C. Text Classification Model with Aspect-Based Sentiment Analysis (ABSA)
D. Sequence-to-Sequence (Seq2Seq) Model for Text Generation
E. Topic Modeling for Latent Semantic Analysis
Answer: B, C
Explanation:
1. Text Classification Model focused on Sentiment Analysis (B):
This type of model is specifically designed to determine the overall sentiment of a text (positive, negative, or neutral). It is highly effective for classifying customer reviews at a high level, providing a clear understanding of the sentiment expressed in each review.
2. Text Classification Model with Aspect-Based Sentiment Analysis (ABSA) (C):
ABSA extends sentiment analysis by identifying sentiments associated with specific aspects or
features mentioned in the text. For example, it can analyze mentions of product features (e.g.,
"battery life," "design") and determine whether the sentiment about those features is positive,
negative, or neutral. This capability makes ABSA ideal for understanding the reasons behind the
sentiment, aligning perfectly with the e-commerce company's requirements.
Why not the others:
• A. Named Entity Recognition (NER) Model: While NER is effective for identifying specific entities (e.g., product names, brands), it does not perform sentiment analysis or extract the reasons for sentiment in text.
• D. Sequence-to-Sequence (Seq2Seq) Model for Text Generation: Seq2Seq models are typically used for tasks like language translation, text summarization, or text generation. They are not optimized for sentiment analysis or feature-based sentiment extraction.
• E. Topic Modeling for Latent Semantic Analysis: Topic modeling identifies overarching themes or topics in text but does not evaluate sentiment or reasons for sentiment. It is less precise for this use case compared to sentiment-specific models.
B and C together address both aspects of the problem: determining overall sentiment and identifying the reasons behind it, making them the most effective choices.
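As an illustration of the two selected model types, the sketch below runs an off-the-shelf sentiment classifier and then frames the aspect-based step as a prompt to an LLM. The Hugging Face pipeline call is standard; the aspect-based portion is only a sketch, and the chat_model object is a hypothetical placeholder for whatever LLM is deployed.

```python
from transformers import pipeline

review = "The battery life is fantastic, but the design feels cheap."

# Overall sentiment (Text Classification focused on Sentiment Analysis).
sentiment = pipeline("sentiment-analysis")
print(sentiment(review))  # e.g. [{'label': 'POSITIVE', 'score': ...}]

# Aspect-Based Sentiment Analysis, sketched here as an instruction to an LLM.
# `chat_model` is a hypothetical callable wrapping the deployed model.
absa_prompt = (
    "For each product aspect mentioned in the review, return the aspect and its "
    "sentiment (positive/negative/neutral) with the reason.\n\nReview: " + review
)
# print(chat_model(absa_prompt))  # e.g. battery life -> positive; design -> negative
```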
Question: 33
You have deployed a machine learning model on Databricks for serving through a REST API. To ensure
that only authorized users can access the model serving endpoint, you decide to implement token-based
authentication. Which of the following is the best approach to control access to the model serving
endpoint using token-based authentication?
A. Configure a Databricks Personal Access Token (PAT) for each user and validate it within the
serving endpoint.
B. Use an OAuth 2.0 access token issued by an external identity provider and verify it in a custom
validation layer before accessing the model endpoint.
C. Use Databricks' built-in role-based access control (RBAC) and assign specific users access to the
model serving endpoint via the workspace UI.
D. Set up API keys in Databricks Workspace and authenticate API requests by checking for the
presence of a valid API key in each request.
Answer: B
Explanation:
The best approach is to use an OAuth 2.0 access token issued by an external identity provider and verify it in a custom validation layer before accessing the model endpoint. This method is widely used for secure, scalable, and flexible access control in enterprise environments.
• OAuth 2.0 Benefits:
o Secure and Standardized Authentication: OAuth 2.0 is a well-established standard for token-based authentication.
o Integration with Identity Providers: External identity providers (e.g., Azure AD, Okta, Google Identity) can issue tokens, enabling single sign-on (SSO) and centralized access management.
o Granular Access Control: Tokens can carry claims or scopes, allowing fine-grained authorization policies for different user roles or permissions.
o Scalability: OAuth 2.0 supports dynamic token issuance and validation, making it suitable for environments with multiple users or applications.
• Custom Validation Layer:
Before processing a request, a custom validation layer can verify the token by checking its signature, expiration, and claims, ensuring that only authorized users can access the endpoint.
Why not the others:
• A. Configure a Databricks Personal Access Token (PAT) for each user and validate it within the serving endpoint:
o PATs are designed for individual user access to the Databricks workspace and are less scalable for shared or multi-user environments. They also don't offer the flexibility of claims-based authorization.
• C. Use Databricks' built-in role-based access control (RBAC):
o While RBAC can control access to the workspace and resources, it does not directly apply to securing REST API requests for model serving.
• D. Set up API keys in Databricks Workspace and authenticate API requests by checking for the presence of a valid API key:
o API keys provide basic authentication but lack the flexibility and scalability of OAuth 2.0. They do not support claims-based authorization or integrate with enterprise identity providers.
B is the best approach as it leverages a secure, standardized, and scalable authentication mechanism while supporting enterprise-grade access control.
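A custom validation layer of the kind described above is often just a small piece of middleware that checks the token before forwarding the request. The sketch below uses PyJWT; the JWKS URL, audience, issuer, and scope name are illustrative assumptions and would come from the actual identity provider's configuration.

```python
import jwt  # PyJWT
from jwt import PyJWKClient

# Illustrative identity-provider settings; real values come from your IdP config.
JWKS_URL = "https://login.example.com/.well-known/jwks.json"
EXPECTED_AUDIENCE = "model-serving-api"
EXPECTED_ISSUER = "https://login.example.com/"

jwks_client = PyJWKClient(JWKS_URL)

def validate_token(bearer_token: str) -> dict:
    """Verify signature, expiration, audience, and issuer; return the claims."""
    signing_key = jwks_client.get_signing_key_from_jwt(bearer_token)
    return jwt.decode(
        bearer_token,
        signing_key.key,
        algorithms=["RS256"],
        audience=EXPECTED_AUDIENCE,
        issuer=EXPECTED_ISSUER,
    )

# Usage inside the serving gateway (pseudo-flow):
# claims = validate_token(request.headers["Authorization"].removeprefix("Bearer "))
# if "model:predict" in claims.get("scope", ""): forward the request to the endpoint
```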
Question: 34
You are developing an AI-powered knowledge base application for a global research organization. The
application will generate detailed technical reports based on user queries. Evaluation metrics include
perplexity (response quality), throughput (tokens generated per second), and memory usage. The LLM
must deliver highly accurate, contextually relevant information, while minimizing resource consumption.
Which of the following LLM configurations would best meet the application's requirements for high
accuracy, moderate throughput, and efficient memory usage?
A. A 6-billion parameter model with moderate perplexity, low memory usage, and high throughput.
B. A 1-billion parameter model with high perplexity, low memory usage, and very high throughput.
C. A 13-billion parameter model with low perplexity, moderate memory usage, and moderate
throughput.
D. A 30-billion parameter model with very low perplexity but high memory usage and low throughput.
Answer: C
Explanation:
A 13-billion parameter model with low perplexity, moderate memory usage, and moderate throughput is the best choice for this use case because it provides a balance of high accuracy (low perplexity), sufficient performance (moderate throughput), and manageable resource consumption (moderate memory usage). This configuration ensures that the LLM can generate contextually relevant and accurate technical reports while operating efficiently in terms of memory and response speed.
Why not the others:
• A. 6-billion parameter model: While it has high throughput and low memory usage, its moderate perplexity indicates lower accuracy, which compromises the quality of technical reports.
• B. 1-billion parameter model: Although it offers very high throughput and low memory usage, its high perplexity suggests poor response quality, making it unsuitable for generating detailed and accurate reports.
• D. 30-billion parameter model: This model provides very low perplexity (high accuracy) but consumes significant memory and has low throughput, making it inefficient for practical use in a resource-constrained environment.
C strikes the right balance between accuracy, performance, and resource efficiency, meeting the requirements of the application effectively.
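Since the comparison above hinges on perplexity, it may help to recall how it is computed: perplexity is the exponential of the average negative log-likelihood the model assigns to the evaluation text, so lower values mean the model finds the text less "surprising". A tiny illustration with made-up per-token probabilities:

```python
import math

# Hypothetical per-token probabilities assigned by two models to the same text.
model_x = [0.40, 0.35, 0.50, 0.30]   # higher probabilities -> lower perplexity
model_y = [0.10, 0.08, 0.15, 0.05]

def perplexity(token_probs):
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_likelihood)

print(round(perplexity(model_x), 2))  # smaller value -> more confident, accurate model
print(round(perplexity(model_y), 2))
```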
Question: 35
You are working on a text summarization project and have tested several models. Below are the ROUGE-1, ROUGE-2, and ROUGE-L scores for different models:
Model A: ROUGE-1 = 0.55, ROUGE-2 = 0.43, ROUGE-L = 0.48
Model B: ROUGE-1 = 0.60, ROUGE-2 = 0.45, ROUGE-L = 0.52
Model C: ROUGE-1 = 0.62, ROUGE-2 = 0.46, ROUGE-L = 0.55
Model D: ROUGE-1 = 0.58, ROUGE-2 = 0.44, ROUGE-L = 0.50
Given that ROUGE-1 measures unigram overlap, ROUGE-2 measures bigram overlap, and ROUGE-L focuses on the longest common subsequence (LCS), which model should you select for this summarization task if your goal is to prioritize overall summary quality and coherence?
A. Model B
B. Model C
C. Model D
D. Model A
Answer: B
Explanation:
Model C has the highest scores across all ROUGE metrics (ROUGE-1: 0.62, ROUGE-2: 0.46, ROUGE-L: 0.55), indicating superior overall summary quality and coherence.
• ROUGE-1 (0.62): Measures unigram overlap, reflecting coverage of individual words in the summary.
• ROUGE-2 (0.46): Measures bigram overlap, indicating better fluency and local coherence.
• ROUGE-L (0.55): Evaluates the longest common subsequence, capturing structural similarity and overall summary coherence.
Why not the others:
• Model A: Lower scores in all metrics compared to Model C (ROUGE-1: 0.55, ROUGE-2: 0.43, ROUGE-L: 0.48).
• Model B: Performs slightly better than Model A but scores lower than Model C in all metrics (ROUGE-1: 0.60, ROUGE-2: 0.45, ROUGE-L: 0.52).
• Model D: While it is closer to Model B, it is also outperformed by Model C in all metrics (ROUGE-1: 0.58, ROUGE-2: 0.44, ROUGE-L: 0.50).
Model C consistently delivers the best performance across all evaluation metrics, making it the optimal choice for prioritizing summary quality and coherence.
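If you want to reproduce this kind of comparison yourself, the rouge-score package computes all three metrics directly; the reference and candidate summaries below are illustrative placeholders.

```python
from rouge_score import rouge_scorer

reference = "The quarterly report shows revenue grew 12% driven by cloud subscriptions."
candidate = "Revenue grew 12% this quarter, mainly due to cloud subscription sales."

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)

for metric, result in scores.items():
    # Each result has precision, recall, and F-measure; the F-measure is what is
    # usually reported when comparing summarization models.
    print(metric, round(result.fmeasure, 3))
```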
Thank you for trying the Databricks-Generative-AI-Engineer-Associate PDF Demo
https://www.certifiedumps.com/databricks/databricks-generative-ai-engineer-associate-dumps.html
[Limited Time Offer] Use coupon "certs20" for an extra 20% discount on the purchase of the PDF file. Test your Databricks-Generative-AI-Engineer-Associate preparation with actual exam questions.