Streamline & Secure LLM Traffic Using APISIX AI Gateway
Yilia Lin, 3 July, 2025, API Days Munich
Agenda
01 Apache APISIX Overview
02 APISIX AI Gateway Overview
03 Proxy Multi-LLMs and Token-Based Rate Limiting
04 Q&A
About Speaker
• Apache APISIX Committer
• Technical Writer at API7.ai
• LinkedIn: linkedin.com/in/yilialin/
• GitHub: github.com/Yilialinn
Yilia Lin
Apache APISIX Overview
01
Apache APISIX Overview
Donated to Apache Software Foundation by API7.ai in 2019
Ultra High-Performance: > 23,000 single-core QPS
Low Latency: < 0.6 ms average delay
Lightweight Architecture: Decoupled control plane and data plane
High Scalability: >100 open-source plugins
Open-Source without Vendor Lock-in: Apache License 2.0
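
To illustrate the plugin model, here is a minimal route definition using the built-in `limit-count` plugin, as it would be sent to the APISIX Admin API. The upstream node is a placeholder; check the current route schema before using it.

```json
{
  "uri": "/anything/*",
  "plugins": {
    "limit-count": {
      "count": 100,
      "time_window": 60,
      "key": "remote_addr",
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": { "httpbin.org:80": 1 }
  }
}
```

Because every capability is a plugin attached to a route like this, the same mechanism later carries the AI-specific plugins.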
APISIX AI Gateway Overview
02
The Rise of AI and New Challenges
AI Application Characteristics
High-concurrency LLM Services
Token-based Pricing Model
Dynamic Scalability
Content Sensitivity
New Challenges
• Traffic Governance
• Cost Optimization
• Multi-Version Management
• Content Security
APISIX AI Gateway Features
AI plugins
APISIX AI Gateway Architecture
More Resources
• https://apisix.apache.org/docs/apisix/plugins
• https://docs.api7.ai/hub
APISIX AI Gateway Characteristics
Open-Source
Out-of-the-box
High Scalability
High Stability
High Security
Practical Application of APISIX AI Gateway
03
Proxy Multi-LLMs and Implement Token-Based Rate Limiting
Workflow
Configure Multi-LLMs and Implement Token-Based Rate Limiting
• demo: https://app.storylane.io/share/cjpfweudrq1n
• doc: https://docs.api7.ai/hub/ai-proxy-multi#configure-instance-priority-and-rate-limiting
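
Following the linked ai-proxy-multi documentation, a route combining both plugins might look roughly like the sketch below. Field names and values are illustrative and should be verified against the current plugin schemas; the API keys and models are placeholders.

```json
{
  "uri": "/v1/chat/completions",
  "plugins": {
    "ai-proxy-multi": {
      "instances": [
        {
          "name": "openai-instance",
          "provider": "openai",
          "priority": 1,
          "weight": 1,
          "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } },
          "options": { "model": "gpt-4o" }
        },
        {
          "name": "deepseek-instance",
          "provider": "deepseek",
          "priority": 0,
          "weight": 1,
          "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } },
          "options": { "model": "deepseek-chat" }
        }
      ]
    },
    "ai-rate-limiting": {
      "instances": [
        { "name": "openai-instance", "limit": 10000, "time_window": 60 }
      ],
      "limit_strategy": "total_tokens"
    }
  }
}
```

With this shape, the higher-priority OpenAI instance serves traffic until it consumes its token budget for the window, after which requests fall back to the DeepSeek instance.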
Thank You!
Yilia Lin
LinkedIn: yilialin
GitHub: Yilialinn


Editor's Notes

  • #1 Good day, everyone! I'm excited to talk about how we can streamline and secure LLM traffic using APISIX AI Gateway. Why should you care about this? AI applications are growing explosively. If you're building AI applications, you're probably dealing with multiple LLM providers, worrying about API costs spiraling out of control, and concerned about security. Today, I'll show you how to solve all these challenges with a single, open-source solution.
  • #2 Here's what we'll cover in the next 25 minutes: - Apache APISIX Overview - APISIX AI Gateway Overview - Demo - proxy multiple LLMs and token-based rate limiting - 5 minutes for your questions at the end Let's dive in!
  • #3 First, let me introduce myself. I'm Yilia Lin, Apache APISIX Committer and Technical Writer at API7.ai. I'm not an engineer but a language learner, and I'm working on content marketing. You can find me on LinkedIn and GitHub. I'm always happy to connect and discuss API gateway technologies.
  • #4 Let me start with the foundation - Apache APISIX
  • #5 - Apache APISIX was donated to the Apache Software Foundation by API7.ai in 2019, and it has become one of the fastest-growing API gateway projects in the cloud-native ecosystem. - It has ultra-high performance: over 23,000 single-core QPS with less than 0.6 ms average latency. - It is a lightweight API gateway with a decoupled control plane and data plane, which means you can scale your data processing independently from your configuration management. - APISIX is highly extensible, with over 100 open-source plugins covering authentication, monitoring, traffic management, security, and more. - It is completely open-source under the Apache License 2.0, with no vendor lock-in, so you can customize it for your specific needs without worrying about licensing restrictions. Here are some APISIX users, including Zoom, KFC, McDonald's, SHEIN, and NASA, covering e-commerce, catering, financial services, electronics, and aerospace.
  • #6 Now, let's see how we've extended APISIX specifically for AI use cases.
  • #7 We're witnessing an incredible rise in AI applications, but with it come unique challenges that traditional API gateways were not designed to handle. AI applications share some characteristics: - LLM services must handle high-concurrency requests with highly variable response times. - Pricing is based on tokens consumed, not the number of requests. - Systems must scale up or down quickly, as traffic can spike suddenly. - Content sensitivity: both input prompts and output responses require filtering for content safety. These characteristics bring new challenges: - Traffic governance gets tricky: traditional load balancers just distribute requests randomly, but LLM instances aren't interchangeable. - Cost optimization: probably the biggest pain point. Since LLM APIs charge by tokens, not requests, costs can vary wildly from cheap to sky-high. - Multi-version headaches: OpenAI alone has GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, and more. How do you smoothly shift traffic between them or A/B test different models? - Content security: how do you ensure prompts aren't malicious, and how do you prevent sensitive data from leaking in responses? Traditional API security doesn't cover prompt injection attacks or content moderation. This is exactly where APISIX AI Gateway comes in.
  • #8 Let me walk you through the features of APISIX AI Gateway: - Multi-LLM support: route between multiple LLM providers to avoid vendor lock-in. - Token-based rate limiting: crucial for preventing API abuse and optimizing cost management. - AI RAG (retrieval-augmented generation): combine an enterprise knowledge base to improve the generated output. - Token usage observability: track token consumption to prevent API abuse and excessive billing. - Retry and fallback to other LLM services, ensuring service stability and quality. - Security: prompt filtering and content moderation to keep AI applications compliant. This is crucial for enterprises, as malicious prompts can lead to reputational harm or data breaches.
  • #9 Now, let's take a look at the architecture of the APISIX AI gateway. APISIX AI Gateway builds on the solid foundation of Apache APISIX. All AI-specific features are implemented as plugins. This architecture can maintain APISIX's modular and extensible features. You can see all the plugin documentation on the official website of APISIX or the API7 plugin hub.
  • #10 In summary, APISIX AI Gateway is - Fully open-source, without vendor lock-in - All the AI features are out-of-the-box, and you can also combine them according to your requirements - High Scalability: Effortlessly scale up or down, ensuring optimal performance even during traffic surges. - High Stability: Robust and reliable, minimizing downtime and ensuring consistent service delivery for your AI applications. - High Security: Equipped with advanced security measures to protect against threats and ensure the safe deployment of AI applications.
  • #11 Now let's see an example. I want to show you how to realize token-based rate limiting using AI plugins.
  • #12 First, let's look at the workflow of this example. In this case, we have two LLM instances: an OpenAI instance and a DeepSeek instance. We want to route traffic between them intelligently and implement token-based rate limiting to control costs. The workflow is: 1. A client sends a request to the APISIX AI Gateway. 2. APISIX passes the request to the ai-proxy-multi plugin to evaluate routing logic. 3. The ai-rate-limiting plugin checks whether the high-priority OpenAI instance's rate limit is exceeded. - If not, the request is sent to the OpenAI instance. - If it is, the request is forwarded to the low-priority DeepSeek instance. 4. The response from either OpenAI or DeepSeek is forwarded back through the ai-proxy-multi plugin and APISIX to the client. This logic ensures that requests are handled efficiently, using the high-priority instance when available and falling back to the low-priority one when rate limits are reached.
  • #13 OK, after understanding the workflow, let's see this demo.
  • #14 OK, that's all for my talk. Thank you for your attention! Feel free to connect with me afterward or reach out on LinkedIn and GitHub. Now, I'd love to hear your questions!
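
The priority-and-fallback routing described in note #12 can be sketched as a small Python simulation. This is illustrative pseudologic only, not APISIX internals; the instance names and token budgets are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A simulated LLM backend with a per-window token budget."""
    name: str
    priority: int      # higher value = preferred instance
    token_limit: int   # tokens allowed per rate-limit window
    tokens_used: int = 0

    def has_budget(self, tokens: int) -> bool:
        return self.tokens_used + tokens <= self.token_limit


def route(instances, tokens_needed):
    """Pick the highest-priority instance whose token budget
    for the current window is not yet exhausted."""
    for inst in sorted(instances, key=lambda i: i.priority, reverse=True):
        if inst.has_budget(tokens_needed):
            inst.tokens_used += tokens_needed
            return inst.name
    return None  # every instance is rate-limited


openai = Instance("openai-instance", priority=1, token_limit=1000)
deepseek = Instance("deepseek-instance", priority=0, token_limit=5000)

print(route([openai, deepseek], 800))  # OpenAI still has budget
print(route([openai, deepseek], 800))  # OpenAI exhausted, falls back to DeepSeek
```

The first request lands on the OpenAI instance; the second exceeds its 1000-token window and falls back to DeepSeek, mirroring steps 3 and 4 of the workflow.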