Implementing Function Calling LLMs without Fear.pdf

Implementing
Function Calling
LLMs Without
Fear
Benjamin Bengfort @ C4AI 2025

Super controversial?
AI should do the repetitive, boring tasks that
lead to inconsistencies and errors when
humans do them.
An Agent must autonomously act do
something on behalf of a user – and to act
they need tools, skills, or functions.
What is an Agent?
The Landscape Of Emerging Ai Agent Architectures
For Reasoning, Planning, And Tool Calling: A Survey
https://arxiv.org/pdf/2404.11584

Function Calling
An LLM Compiler for Parallel Function Calling
https://openreview.net/forum?id=uQ2FUoFjnF
“The ability of an LLM to execute external
function calls to overcome their inherent
limitations, such as knowledge cutoffs, poor
arithmetic skills, or lack of access to private
data.”
In short: give the LLM access to additional
context in order to formulate better
responses.

Red Team Hackathon
Function calling with VertexAI tutorial:
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling
It’s sounds easy but …
Goal: try to manipulate the prompt input to
force a function calling LLM to use insecure
inputs and reveal private information.
Target LLM: gemini-2.0-flash-001
API access via GCP VertexAI

Architecture of Target Application
https://github.com/bbengfort/bank-agent
Chat UI
Django
Server
VertexAI
Banking
Agent
DRF
Banking
API
ChatDB BankDB

Step 3: Handle f(x) calls requested by agent

Step 4: Provide Context and Generate Final Response

Doesn’t take much to
make bad choices.

Context isn’t taken into
account without extra
work.
Guardrails are required!

Reveals backend API
details to the user.

The agent wants to use
the tools even if it’s not
necessary.

Moreover it will make calls
to the API even when it
doesn’t need to or use the
response.

When it works though …
it makes life that much
easier!

Standardizing Function Calling

MCP
The Model Context Protocol
https://modelcontextprotocol.io/introduction
Popularized by Claude and Cursor,
MCP has taken over as the defacto
mechanism for function calling.
- Client/Server functions
- Integrated with more models
- Prebuilt Integrations
Goal: translates any tool or data
source into an LLM usable utility
without requiring custom integration
between tool developer and AI
developer.

A2A
The Agent2Agent Protocol
https://developers.googleblog.com/en/a2a-a-new
-era-of-agent-interoperability/
Complements MCP but focuses on
Agent to Agent communication to
create multi-agent systems.
- Multi-modal: text, audio, visual
- Async for long running tasks
- No shared memory
Whereas tools respond
immediately, agents need to be
collaborative.

Jailbreaking
A jailbreaking function exploits alignment
discrepancies, user coercion, and the absence
of rigorous safety filters to bypass controls and
extract PII or create harmful content.
Vulnerability Chaining
Rug Pulling
Tool Poisoning
Security Threats
The Dark Side of Function Calling:
Pathways to Jailbreaking Large Language Models
https://arxiv.org/html/2407.17915v2

Jailbreaking
API might only be accessible to the LLM, but
you can exploit security flaws in the API via
the LLM. Many low/medium risk exploits lead
to larger exploitation.
Rug Pulling
Tool Poisoning
Security Threats
https://ieeexplore.ieee.org/document/10494639

Jailbreaking
Rug Pulling
If tools can mutate their own definitions, then
the tool can appear safe one day but then
mutate into an unsafe function the next. Often
used with shadowing, to redirect user to a
harmful API server.
Tool Poisoning
Security Threats
The “S” in MCP Stands for Security
https://elenacross7.medium.com/-the-s-in-mcp-stands-for-security-91407b33ed6b

Jailbreaking
Rug Pulling
Tool Poisoning
A prompt injection attack that hides extra
instructions in the tool definition (usually not
visible) that cause the LLM to execute harmful
actions. Used as a vulnerability chain with
Shadowing and Rug Pulling
Security Threats
MCP Security Notification: Tool Poisoning Attacks
https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks

Guardrails for Safer Function Calling
mTLS
User
Conﬁrmation
Defensive
Prompts
API Response
Sanitization
OpenAI
Schemas
Version
Pinned Tools

Bigger is not Better!
We tend to think that the 280B parameter
models with large context sizes are “better” –
but really they are just more general.
We have found that fine-tuned 3B-70B
parameter models with tool use are more
effective at specific tasks and easier to add
guardrails to.
Improving Small-Scale Large Language Models
Function Calling for Reasoning Tasks
https://arxiv.org/pdf/2410.18890

Final Thoughts
What is the security context for the Agent?
a. Service account for the Agent itself
b. Use the security context of the user
c. Some combination of the two above?
Is there a “proxy agent” security model?
Do the LLMs need to improve or do the tools
need to adapt to Agentic contexts?
https://github.com/bbengfort/bank-agent

Happy to take comments and questions
online or chat after the talk!
benjamin@rotational.io
https://rotational.io
Thanks!
Some images in this presentation were AI generated using Gemini Pro
@bbengfort

Implementing Function Calling LLMs without Fear.pdf

More Related Content

Similar to Implementing Function Calling LLMs without Fear.pdf

More from Benjamin Bengfort

Recently uploaded

Implementing Function Calling LLMs without Fear.pdf