Implementing
Function Calling
LLMs Without
Fear
Benjamin Bengfort @ C4AI 2025
Super controversial?
AI should do the repetitive, boring tasks that
lead to inconsistencies and errors when
humans do them.
An Agent must autonomously act do
something on behalf of a user – and to act
they need tools, skills, or functions.
What is an Agent?
The Landscape Of Emerging Ai Agent Architectures
For Reasoning, Planning, And Tool Calling: A Survey
https://arxiv.org/pdf/2404.11584
Function Calling
An LLM Compiler for Parallel Function Calling
https://openreview.net/forum?id=uQ2FUoFjnF
“The ability of an LLM to execute external
function calls to overcome their inherent
limitations, such as knowledge cutoffs, poor
arithmetic skills, or lack of access to private
data.”
In short: give the LLM access to additional
context in order to formulate better
responses.
Red Team Hackathon
Function calling with VertexAI tutorial:
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling
It’s sounds easy but …
Goal: try to manipulate the prompt input to
force a function calling LLM to use insecure
inputs and reveal private information.
Target LLM: gemini-2.0-flash-001
API access via GCP VertexAI
Function calling is hard …
Architecture of Target Application
https://github.com/bbengfort/bank-agent
Chat UI
Django
Server
VertexAI
Banking
Agent
DRF
Banking
API
ChatDB BankDB
Function Calling Basics
Step 0: Define f(x)
Step 1: Describe Tools
Step 2: Process Prompt
Step 3: Handle f(x) calls requested by agent
Step 4: Provide Context and Generate Final Response
Function Calling in Action
Doesn’t take much to
make bad choices.
Even less funny.
Context isn’t taken into
account without extra
work.
Guardrails are required!
Reveals backend API
details to the user.
The agent wants to use
the tools even if it’s not
necessary.
Moreover it will make calls
to the API even when it
doesn’t need to or use the
response.
When it works though …
it makes life that much
easier!
Standardizing Function Calling
MCP
The Model Context Protocol
https://modelcontextprotocol.io/introduction
Popularized by Claude and Cursor,
MCP has taken over as the defacto
mechanism for function calling.
- Client/Server functions
- Integrated with more models
- Prebuilt Integrations
Goal: translates any tool or data
source into an LLM usable utility
without requiring custom integration
between tool developer and AI
developer.
A2A
The Agent2Agent Protocol
https://developers.googleblog.com/en/a2a-a-new
-era-of-agent-interoperability/
Complements MCP but focuses on
Agent to Agent communication to
create multi-agent systems.
- Multi-modal: text, audio, visual
- Async for long running tasks
- No shared memory
Whereas tools respond
immediately, agents need to be
collaborative.
Security Forethought
Jailbreaking
A jailbreaking function exploits alignment
discrepancies, user coercion, and the absence
of rigorous safety filters to bypass controls and
extract PII or create harmful content.
Vulnerability Chaining
Rug Pulling
Tool Poisoning
Security Threats
The Dark Side of Function Calling:
Pathways to Jailbreaking Large Language Models
https://arxiv.org/html/2407.17915v2
Jailbreaking
Vulnerability Chaining
API might only be accessible to the LLM, but
you can exploit security flaws in the API via
the LLM. Many low/medium risk exploits lead
to larger exploitation.
Rug Pulling
Tool Poisoning
Security Threats
Vulnerability Chaining
https://ieeexplore.ieee.org/document/10494639
Jailbreaking
Vulnerability Chaining
Rug Pulling
If tools can mutate their own definitions, then
the tool can appear safe one day but then
mutate into an unsafe function the next. Often
used with shadowing, to redirect user to a
harmful API server.
Tool Poisoning
Security Threats
The “S” in MCP Stands for Security
https://elenacross7.medium.com/-the-s-in-mcp-stands-for-security-91407b33ed6b
Jailbreaking
Vulnerability Chaining
Rug Pulling
Tool Poisoning
A prompt injection attack that hides extra
instructions in the tool definition (usually not
visible) that cause the LLM to execute harmful
actions. Used as a vulnerability chain with
Shadowing and Rug Pulling
Security Threats
MCP Security Notification: Tool Poisoning Attacks
https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
Safety First
Guardrails for Safer Function Calling
mTLS
User
Confirmation
Defensive
Prompts
API Response
Sanitization
OpenAI
Schemas
Version
Pinned Tools
Bigger is not Better!
We tend to think that the 280B parameter
models with large context sizes are “better” –
but really they are just more general.
We have found that fine-tuned 3B-70B
parameter models with tool use are more
effective at specific tasks and easier to add
guardrails to.
Improving Small-Scale Large Language Models
Function Calling for Reasoning Tasks
https://arxiv.org/pdf/2410.18890
Final Thoughts
What is the security context for the Agent?
a. Service account for the Agent itself
b. Use the security context of the user
c. Some combination of the two above?
Is there a “proxy agent” security model?
Do the LLMs need to improve or do the tools
need to adapt to Agentic contexts?
https://github.com/bbengfort/bank-agent
Happy to take comments and questions
online or chat after the talk!
benjamin@rotational.io
https://rotational.io
Thanks!
Some images in this presentation were AI generated using Gemini Pro
@bbengfort

Implementing Function Calling LLMs without Fear.pdf

  • 1.
  • 2.
    Super controversial? AI shoulddo the repetitive, boring tasks that lead to inconsistencies and errors when humans do them. An Agent must autonomously act do something on behalf of a user – and to act they need tools, skills, or functions. What is an Agent? The Landscape Of Emerging Ai Agent Architectures For Reasoning, Planning, And Tool Calling: A Survey https://arxiv.org/pdf/2404.11584
  • 3.
    Function Calling An LLMCompiler for Parallel Function Calling https://openreview.net/forum?id=uQ2FUoFjnF “The ability of an LLM to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data.” In short: give the LLM access to additional context in order to formulate better responses.
  • 4.
    Red Team Hackathon Functioncalling with VertexAI tutorial: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling It’s sounds easy but … Goal: try to manipulate the prompt input to force a function calling LLM to use insecure inputs and reveal private information. Target LLM: gemini-2.0-flash-001 API access via GCP VertexAI
  • 5.
  • 6.
    Architecture of TargetApplication https://github.com/bbengfort/bank-agent Chat UI Django Server VertexAI Banking Agent DRF Banking API ChatDB BankDB
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
    Step 3: Handlef(x) calls requested by agent
  • 12.
    Step 4: ProvideContext and Generate Final Response
  • 13.
  • 14.
    Doesn’t take muchto make bad choices.
  • 15.
  • 16.
    Context isn’t takeninto account without extra work. Guardrails are required!
  • 17.
  • 18.
    The agent wantsto use the tools even if it’s not necessary.
  • 19.
    Moreover it willmake calls to the API even when it doesn’t need to or use the response.
  • 20.
    When it worksthough … it makes life that much easier!
  • 21.
  • 22.
    MCP The Model ContextProtocol https://modelcontextprotocol.io/introduction Popularized by Claude and Cursor, MCP has taken over as the defacto mechanism for function calling. - Client/Server functions - Integrated with more models - Prebuilt Integrations Goal: translates any tool or data source into an LLM usable utility without requiring custom integration between tool developer and AI developer.
  • 23.
    A2A The Agent2Agent Protocol https://developers.googleblog.com/en/a2a-a-new -era-of-agent-interoperability/ ComplementsMCP but focuses on Agent to Agent communication to create multi-agent systems. - Multi-modal: text, audio, visual - Async for long running tasks - No shared memory Whereas tools respond immediately, agents need to be collaborative.
  • 24.
  • 25.
    Jailbreaking A jailbreaking functionexploits alignment discrepancies, user coercion, and the absence of rigorous safety filters to bypass controls and extract PII or create harmful content. Vulnerability Chaining Rug Pulling Tool Poisoning Security Threats The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models https://arxiv.org/html/2407.17915v2
  • 26.
    Jailbreaking Vulnerability Chaining API mightonly be accessible to the LLM, but you can exploit security flaws in the API via the LLM. Many low/medium risk exploits lead to larger exploitation. Rug Pulling Tool Poisoning Security Threats Vulnerability Chaining https://ieeexplore.ieee.org/document/10494639
  • 27.
    Jailbreaking Vulnerability Chaining Rug Pulling Iftools can mutate their own definitions, then the tool can appear safe one day but then mutate into an unsafe function the next. Often used with shadowing, to redirect user to a harmful API server. Tool Poisoning Security Threats The “S” in MCP Stands for Security https://elenacross7.medium.com/-the-s-in-mcp-stands-for-security-91407b33ed6b
  • 28.
    Jailbreaking Vulnerability Chaining Rug Pulling ToolPoisoning A prompt injection attack that hides extra instructions in the tool definition (usually not visible) that cause the LLM to execute harmful actions. Used as a vulnerability chain with Shadowing and Rug Pulling Security Threats MCP Security Notification: Tool Poisoning Attacks https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
  • 29.
  • 30.
    Guardrails for SaferFunction Calling mTLS User Confirmation Defensive Prompts API Response Sanitization OpenAI Schemas Version Pinned Tools
  • 31.
    Bigger is notBetter! We tend to think that the 280B parameter models with large context sizes are “better” – but really they are just more general. We have found that fine-tuned 3B-70B parameter models with tool use are more effective at specific tasks and easier to add guardrails to. Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks https://arxiv.org/pdf/2410.18890
  • 32.
    Final Thoughts What isthe security context for the Agent? a. Service account for the Agent itself b. Use the security context of the user c. Some combination of the two above? Is there a “proxy agent” security model? Do the LLMs need to improve or do the tools need to adapt to Agentic contexts? https://github.com/bbengfort/bank-agent
  • 33.
    Happy to takecomments and questions online or chat after the talk! benjamin@rotational.io https://rotational.io Thanks! Some images in this presentation were AI generated using Gemini Pro @bbengfort