A Tale of Two Case Studies: Using LLMs in Production

@nextgenamerica
VOTE-E: A Discord bot for Civic Engagement

Intro - Grant Wiles
● VP of Data, Research, and Polling at NextGen America
● Worked on campaigns for over a decade:
○ Campus organizing at UT Austin for Wendy Davis in 2014
○ HQ for Hillary Clinton’s 2016 primary; Ohio for the general election
● Experience at the major committees within the Democratic Party:
○ Democratic National Committee (DNC)
○ Democratic Congressional Campaign Committee (DCCC)

What is NextGen?
● Founded in 2013, NextGen is the largest youth vote organization in the
nation and has been building the youth vote infrastructure nationally
for 10 years
● Target audience: 18-35
● NextGen oversees the largest field, digital, and mail program
dedicated to youth organizing

How we reach young voters
● High Traffic Canvassing
○ Registering students to vote on college campuses
○ Collecting pledge-to-vote cards
● Distributed Organizing
○ Focused on peer-to-peer phone calls and texting
○ Thousands of digital volunteers
● Digital Organizing
○ Partnering with content creators on YouTube
○ Campus Athletes program
● Principal Campus Tours
○ Youth voter turnout tour with special guest Bernie Sanders in 2022
● Tried and true methods
○ Mail
○ Ballot Guides
○ Digital ads

What is Discord?
● Slack-like tool primarily used by gaming and internet communities
● Expected to reach 196.2 million monthly active users and 514 million registered users in 2023
● The largest group of users (43.9%) is aged 18 to 24
○ 18 to 24 (43.9%)
○ 25 to 34 (28.6%)
○ 35+ (27.5%)
● Two types of Discord “channels”: voice and text.

Why do people use Discord?
● Discord is more of a digital place than a social media platform
● Organizing friend groups
○ Voice chat is the main appeal
○ ‘Digital meeting place’ for people to hang out remotely
● Types of Communities
○ Twitch streamers
○ Colleges
○ Games (Official servers)
○ Subreddits
○ YouTube channels
○ Niche Interests
■ Game related tools
■ Software projects

What does VOTE-E do?
● Remind Voter of Elections
● Check status of reminders
● Registration Information
● Absentee Information
● Poll location information

How do people initiate interactions with the bot?
Two main ways:
1. The bot posts a message in a central channel asking people to react with a ballot box emoji to get help registering to vote and election reminders. The bot DMs everyone who reacts and holds a 1:1 conversation.
2. People @mention VOTE-E and get a DM.

Mark Pace
Who is that purple guy?
CTO, Red5, Real-time Streaming
Been nerding around the Internet since before it went commercial
Work focuses on video and emerging technologies
Developing with LLMs and their tooling for 1.5 years

VOTE-E – The Discord Bot
Python py-cord module in an AsyncIO loop
OpenAI queries launched using run_in_executor
Logs written to Discord using run_coroutine_threadsafe
Guards to prevent users from running multiple concurrent queries
Leaky bucket rate limiting to prevent overuse
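A minimal sketch of how those pieces can fit together, assuming a hypothetical `blocking_llm_call` standing in for the OpenAI query and a hypothetical `post_log` standing in for the Discord log channel (no py-cord calls are shown). `run_in_executor` keeps the event loop free while the LLM call blocks, `run_coroutine_threadsafe` lets the worker thread hand a log coroutine back to the loop, a per-user set acts as the concurrency guard, and a per-user leaky bucket throttles bursts. The capacity and leak rate are illustrative values, not VOTE-E's settings.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class LeakyBucket:
    """Each request pours one unit in; the bucket drains at leak_rate units per second."""

    def __init__(self, capacity: float, leak_rate: float) -> None:
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain whatever has leaked out since the previous request.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # bucket would overflow: reject the request
        self.level += 1
        return True


def blocking_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a blocking OpenAI/LangChain call."""
    time.sleep(0.05)
    return f"answer to: {prompt}"


class VoteBot:
    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        self.executor = ThreadPoolExecutor(max_workers=4)
        self.busy_users: set[str] = set()
        self.buckets: dict[str, LeakyBucket] = {}
        self.log: list[str] = []

    async def post_log(self, message: str) -> None:
        # Hypothetical stand-in for sending a message to a Discord log channel.
        self.log.append(message)

    def run_query(self, user_id: str, prompt: str) -> str:
        # Runs in a worker thread, so it cannot await; schedule the log
        # coroutine on the event loop instead.
        asyncio.run_coroutine_threadsafe(self.post_log(f"query from {user_id}"), self.loop)
        return blocking_llm_call(prompt)

    async def handle_message(self, user_id: str, prompt: str) -> str:
        bucket = self.buckets.setdefault(user_id, LeakyBucket(capacity=3, leak_rate=0.1))
        if not bucket.allow():
            return "Rate limited, please slow down."
        if user_id in self.busy_users:
            return "Still working on your last question."
        self.busy_users.add(user_id)
        try:
            # Off-load the blocking call so the event loop keeps serving Discord.
            return await self.loop.run_in_executor(self.executor, self.run_query, user_id, prompt)
        finally:
            self.busy_users.discard(user_id)
```

Checking the rate limit before the busy guard means a rejected duplicate still costs the user one bucket unit, which discourages spamming the bot while it is thinking.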

Pandas DataFrame and Storage
Using a Pandas DataFrame for internal user and session data
Writing to Google BigQuery using both the google-cloud-bigquery module and pandas-gbq
Caching in the DataFrame to minimize writes to BigQuery
Able to cull data in chunks with a single drop command on the DataFrame
Able to write data in chunks using pandas-gbq
Bugs in google-cloud-bigquery caused some fun
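The caching pattern can be sketched as below. This is an illustration under assumptions, not VOTE-E's code: `writer` is a hypothetical hook (for BigQuery it might wrap `pandas_gbq.to_gbq(df, destination_table, if_exists="append", chunksize=...)`), rows are assumed to carry a numeric `ts` timestamp column, and `batch_size` is arbitrary. Unwritten rows always sit at the tail of the DataFrame, so a flush writes `tail(n)`, and culling is one `drop()` over an index selection.

```python
from typing import Callable

import pandas as pd


class DataFrameCache:
    """Holds rows in a DataFrame and persists them in batches to cut down on writes."""

    def __init__(self, writer: Callable[[pd.DataFrame], None], batch_size: int = 500) -> None:
        # Hypothetical hook; for BigQuery it might wrap
        # pandas_gbq.to_gbq(df, destination_table, if_exists="append", chunksize=...).
        self.writer = writer
        self.batch_size = batch_size
        self.df = pd.DataFrame()
        self.unwritten = 0  # rows at the tail of self.df that are not persisted yet

    def append(self, rows: list[dict]) -> None:
        new = pd.DataFrame(rows)
        self.df = new if self.df.empty else pd.concat([self.df, new], ignore_index=True)
        self.unwritten += len(new)
        if self.unwritten >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write only the rows added since the last flush."""
        if self.unwritten:
            self.writer(self.df.tail(self.unwritten))
            self.unwritten = 0

    def cull(self, cutoff: float) -> None:
        """Drop every row with ts < cutoff using a single drop() call."""
        self.flush()  # never discard rows that were not written yet
        if self.df.empty:
            return
        stale = self.df.index[self.df["ts"] < cutoff]
        self.df = self.df.drop(index=stale).reset_index(drop=True)
```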

LLMs
GPT (OpenAI)
Bard (Google)
Claude (Anthropic)
Open Source? (Llama or Mistral)

Ludicrous Speed Development
We’ve gone plaid!
Breaking changes between versions
5 releases of the OpenAI library in a day, including a breaking change
LLM tooling keeping up as best it can
Dependency management tools
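One cheap defense against surprise breaking changes is to pin exact versions and fail fast at startup if the environment drifts. A minimal sketch using the standard library's `importlib.metadata.version`; the pinned versions are hypothetical placeholders, and in practice they would come from whatever lock file your dependency tool maintains.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

# Hypothetical pins for illustration only; real ones belong in a lock file.
PINNED = {"openai": "1.0.0", "langchain": "0.1.0"}


def pin_mismatches(pins: dict[str, str], lookup: Callable[[str], str] = version) -> list[str]:
    """Report packages whose installed version differs from the pinned one."""
    problems = []
    for package, wanted in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package}: installed {installed}, pinned {wanted}")
    return problems
```

At startup, a non-empty `pin_mismatches(PINNED)` can abort the bot before a silently upgraded library breaks it mid-conversation.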

No, not that ReAct Framework
REasoning and ACTing
Designed in 2022, updated in 2023 by Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao at Princeton University and Google Research
Input: Where do I go to vote in California?
Thought: The human is trying to locate a polling place
Action: get_voting_location
Action Input: voter_location=CA
Observation: [Polling Locations URL](http://..)
Final Result: You can find where to vote at this [URL]
Reference: https://arxiv.org/abs/2210.03629
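The loop behind that trace can be sketched in a few lines. This is a simplified illustration, not the paper's prompts or LangChain's `AgentExecutor`: `llm` is any function that continues the transcript, the `Action:`/`Action Input:`/`Final Result:` labels follow the example trace above, and each tool takes a single string argument.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(.*)")
FINAL_RE = re.compile(r"Final Result:\s*(.*)", re.DOTALL)


def react(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
          question: str, max_steps: int = 5) -> str:
    """Alternate LLM reasoning steps with tool calls until a final result appears."""
    transcript = f"Input: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model writes Thought / Action / Action Input
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError(f"Model went off the ReAct rails: {step!r}")
        name, arg = action.group(1), action.group(2).strip()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"  # fed back on the next step
    raise RuntimeError("No final result within max_steps")
```

With a scripted `llm` that first emits the Thought/Action lines above and then a `Final Result:` line, `react(...)` returns the final result text after one `get_voting_location` call.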

Chain of Thought + Self Consistency
… or how to reduce hallucinations …
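Self-consistency samples several chain-of-thought completions for the same prompt and keeps the final answer most of them agree on; disagreement is a useful warning sign. A minimal sketch, assuming a hypothetical `sample` function that calls the LLM with a nonzero temperature and a prompt that asks the model to end its reasoning with `Answer: <answer>`.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, float]:
    """Sample n chain-of-thought completions and majority-vote their final answers."""
    answers = []
    for _ in range(n):
        completion = sample(prompt)  # assumed to be sampled with temperature > 0
        # Assumed convention: the reasoning ends with "Answer: <answer>".
        answers.append(completion.rsplit("Answer:", 1)[-1].strip())
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n  # agreement rate doubles as a rough confidence signal
```

A low agreement rate is a cue to fall back to a safe reply instead of trusting any single sample.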

LangChain … to the rescue
Multiple LLM support
AgentExecutor designed for Agents that use ReAct and Tools
ConversationBufferMemory with summarization
Many other features we’re not using here, like document summarization for RAG, etc.
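The idea behind buffer memory with summarization is to keep recent turns verbatim and fold older turns into a running summary so the prompt stays bounded. A framework-free sketch of that idea (not LangChain's API), assuming a hypothetical `summarize(previous_summary, old_turns)` callable that would in practice be another LLM call:

```python
from typing import Callable


class SummaryBufferMemory:
    """Keeps the last max_turns turns verbatim and summarizes everything older."""

    def __init__(self, summarize: Callable[[str, list[str]], str], max_turns: int = 4) -> None:
        self.summarize = summarize  # hypothetical: usually an LLM call
        self.max_turns = max_turns
        self.summary = ""
        self.turns: list[str] = []

    def add(self, role: str, text: str) -> None:
        self.turns.append(f"{role}: {text}")
        if len(self.turns) > self.max_turns:
            # Move the oldest turns out of the buffer and into the summary.
            overflow = self.turns[:-self.max_turns]
            self.turns = self.turns[-self.max_turns:]
            self.summary = self.summarize(self.summary, overflow)

    def as_prompt(self) -> str:
        """Conversation history section for the next prompt."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(self.turns)
        return "\n".join(parts)
```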

VOTE-E’s Tools
Remind Voter of Elections
Check status of reminders
Registration Information
Absentee Information
Poll location information
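For an agent, each capability is exposed as a named tool whose description the model reads when choosing an action. A minimal sketch of such a registry; `get_voting_location` comes from the ReAct example earlier, while the other tool names and all the stub bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str  # the LLM picks tools based on this text
    func: Callable[[str], str]


# Hypothetical names and stubs; real tools would hit a reminder store and
# official election-information lookups.
TOOLS = [
    Tool("set_reminder", "Remind the voter of upcoming elections", lambda user: f"Reminder set for {user}"),
    Tool("reminder_status", "Check the status of the voter's reminders", lambda user: f"Reminder status for {user}"),
    Tool("registration_info", "Registration information for a state", lambda state: f"Registration info for {state}"),
    Tool("absentee_info", "Absentee voting information for a state", lambda state: f"Absentee info for {state}"),
    Tool("get_voting_location", "Polling location information for a state", lambda state: f"Polling locations for {state}"),
]


def render_tools(tools: list[Tool]) -> str:
    """Tool list in 'name: description' form for the Tools section of the prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def tool_table(tools: list[Tool]) -> dict[str, Callable[[str], str]]:
    """Name-to-function lookup used when the model picks an Action."""
    return {t.name: t.func for t in tools}
```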

VOTE-E’s Prompt
Context
Tools
ReAct
Formatting Rules
User data in Prompt
Conversation History
ReAct-CoT in Prompt
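One way to assemble those sections into a single prompt string is sketched below. The section order mirrors the slide; the `##` headings, parameter names, and the trailing `Thought:` cue are assumptions for illustration, not VOTE-E's actual prompt.

```python
def build_prompt(context: str, tools_text: str, react_rules: str, formatting_rules: str,
                 user_data: dict[str, str], history: str, question: str) -> str:
    """Assemble the prompt sections in the order listed on the slide."""
    user_lines = "\n".join(f"{key}: {value}" for key, value in user_data.items())
    sections = [
        ("Context", context),
        ("Tools", tools_text),
        ("ReAct", react_rules),
        ("Formatting Rules", formatting_rules),
        ("User Data", user_lines),
        ("Conversation History", history),
    ]
    body = "\n\n".join(f"## {title}\n{text}" for title, text in sections)
    # End with the ReAct chain-of-thought cue so the model starts reasoning.
    return f"{body}\n\nInput: {question}\nThought:"
```

Keeping user data as its own section makes it easy to measure how much of the prompt it occupies, which matters for the question of reducing reliance on it.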

GPT 3.5 vs GPT 4
Consistency
GPT 3.5 struggles with multiple instructions at the same time
Off the ReAct Rails
Duplicate Use of Tools
Adjusting the prompt for 3.5
GPT 4 is 60X the price!
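Because a ReAct agent typically resends the whole prompt (context, tools, user data, history) on every step, prompt tokens dominate the bill, and a price multiple passes straight through to the monthly cost. A back-of-the-envelope sketch; the conversation volume, token counts, and per-1K-token prices are placeholders, not actual OpenAI pricing.

```python
def monthly_cost(conversations: int, prompt_tokens: int, completion_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Dollar cost of a month of conversations at per-1K-token prices."""
    per_conversation = (prompt_tokens / 1000 * input_price_per_1k
                        + completion_tokens / 1000 * output_price_per_1k)
    return conversations * per_conversation


# Placeholder prices; the second model is priced at 60X the first.
cheap = monthly_cost(10_000, 3_000, 300, 0.001, 0.002)
pricey = monthly_cost(10_000, 3_000, 300, 0.06, 0.12)
```

Under these placeholder numbers the cheaper model costs about $36 a month and the pricier one about $2,160, which is why tuning the prompt for 3.5 can be worth the effort.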

VOTE-E Demo
Will it break?
Demo of LangSmith
Demo of LangSmith Playground

VOTE-E’s Future?
LangGraph: Access to the Chain’s Loop.
Can we reduce the reliance on user data in the Prompt?
Different LLMs

Follow us on social media!
We’re even cooler over there.
@nextgenamerica

Zefr Channel LLMs
High quality, explainable channel categorizations

Zefr’s core business plan is solid gold
Step 1:
Categorize channels on YouTube
Step 2:
???
Step 3:
Profit
How does Zefr make money with channel categorizations?

Zefr Inc. is a technology company known for providing
video advertising services. The company specializes in
contextual targeting technology for YouTube and other
video platforms, enabling brands and advertisers to
place their ads alongside content that matches their
brand's message and values, without relying on
traditional cookie-based targeting methods.
Here's how Zefr Inc. typically makes money:
1. Contextual Advertising Solutions
2. Brand Safety and Suitability
3. Content Monetization
4. Platform Partnerships and Integrations
5. Subscription and Service Fees
- GPT 4 as of April 2023
Hey ChatGPT! What is Zefr and how does Zefr make money?
GPT gets a Precision of 0.6 and a non-embarrassing score of 80%

Adding information from the internet or your
private IP can massively improve quality
Zefr Examples:
● Real-time platform metadata
● Video-level categorization scores, including misinformation analysis
● Example sales email templates
● Marketing collateral
● Content policy
Retrieval-Augmented Generation (RAG) is the process of
optimizing the output of a large language model, so it
references an authoritative knowledge base outside of its
training data sources before generating a response.
- Google
2023 is so last year! RAG to the rescue!
Retrieval-Augmented Generation is a lot of syllables for ‘Google It’
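The retrieval step can be illustrated with a toy retriever. This sketch substitutes bag-of-words cosine similarity for the embedding model and vector store a production RAG system would normally use; the knowledge-base snippets you pass in (sales templates, content policy, metadata) are up to you, and the prompt wording is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_rag_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Ground the answer in retrieved snippets instead of the model's training data."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"
```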

Zefr generates revenue through several key offerings tailored
to the needs of advertisers and brands looking to optimize
their digital media strategies:
1. Managed Services
2. Software Solutions
3. Measurement and Analytics
4. Brand Partnerships and Integrations
5. Custom Solutions and Consulting
Let’s try again, with RAG
GPT with RAG gets a precision of 1.0 and a non-embarrassing score of 100%
By offering a suite of products and services that address the critical need for brand safety,
suitability, and effective contextual targeting in digital advertising, Zefr taps into a significant
market demand. Its revenue model capitalizes on the increasing importance of digital advertising
in the marketing mix and the growing need for sophisticated tools to ensure brand-safe and
contextually relevant ad placements.
- Zefr Sales Tool

Zefr could leverage Large Language Models (LLMs) like GPT to enhance its product in
several innovative ways, significantly improving its offerings in brand suitability,
contextual targeting, and measurement across digital platforms.
Here are some strategies:
● Content Analysis and Brand Suitability Scoring
● Enhanced Misinformation Detection
● Automated Content Categorization
● Customized Client Reporting
● Predictive Analytics for Campaign Performance
● Enhanced Query Understanding for Customer Service
● Content Trend Analysis
By integrating these strategies, Zefr could significantly enhance its product offerings,
delivering more value to its clients through advanced brand safety measures, more effective
targeting, and deeper insights into campaign performance and content trends.
How can we improve Zefr using LLMs?
Zefr Sales Email Bot writes more than just emails

Time to Market, Quality, Scale and Budget
No Code
● Human using ChatGPT or Bard
● Off-the-shelf app, e.g. NotebookLM
● Build your own custom GPT
Code
● API cost/quality tradeoffs
● Self hosting
Ensure you have enough cash and quota!
First a word from your CFO
Maybe nuclear fusion and asteroid mining will solve this
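The API cost/quality tradeoff can be framed as picking the cheapest model that clears a quality bar within budget. A minimal sketch; the model options, costs, and quality scores are hypothetical and would come from running each candidate on your own holdout set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOption:
    name: str
    cost_per_1k_channels: float  # measured on your own workload
    quality: float               # e.g. non-embarrassing rate on the holdout set


def pick_model(options: list[ModelOption], min_quality: float, budget: float,
               channels: int) -> Optional[ModelOption]:
    """Cheapest option that clears the quality bar and fits the budget, or None."""
    viable = [m for m in options
              if m.quality >= min_quality and m.cost_per_1k_channels * channels / 1000 <= budget]
    return min(viable, key=lambda m: m.cost_per_1k_channels, default=None)
```

A `None` result is the CFO's answer: either lower the bar, raise the budget, or score fewer channels.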

Task - Channel summarization and categorization
There are no wrong answers, only embarrassing ones
As an LLM your task is to upgrade our
Zefr channel classification providing
explainable, high quality results.
You will be judged by how embarrassed
the CTO of Zefr is of your response.
Use the following protocol exactly.
1. PLAN
2. SUMMARIZE
3. CATEGORIZE
4. ASSESS
5. SCALE

I would have chosen OpenAI but it refused to draw itself as Clippy 2.0
Plan: Choose carefully

For our task, we chose Gramajo, a Spanish-language channel that analyzes Dragon Ball Z, as our primary test case. It is a tough categorization case because Dragon Ball Z content is so diverse.
Task: Write a channel summary, classify for various categories including video games
Plan: Curate a small set of difficult test cases
Non-embarrassing is in the eye of the beholder

The channel "Gramajo" specializes in deep-diving into the Dragon Ball series, encompassing thorough analyses, recaps, and discussions on various sagas, characters, and theoretical
scenarios within the Dragon Ball universe. With a focus on the manga adaptations and speculative content derived from the anime series, the channel caters primarily to a dedicated
fanbase of the franchise. Given the data and content focus, the channel fits predominantly into the video game anime category due to its concentration on anime and manga analysis,
particularly Dragon Ball, rather than direct video game content or gameplay footage.
The classification of "Gramajo" leans heavily towards video game anime, reflected by its substantial
anime score and the nature of the content which revolves around anime series discussions and analyses. The absence of direct video game gameplay, livestreams, or hardware
reviews suggests that classifications such as video game content, video game console, video game livestream, video game footage, and video game hardware are less applicable. The
channel's engagement with anime-related discussions, character deep dives, and plot analysis aligns it more closely with video game anime, given the thematic overlap with anime
and comics centered on video game narratives. The categorization is further supported by the absence of explicit references to gaming hardware, engines, or soundtracks, indicating
a primary focus on narrative and character analysis within the anime and manga sphere.
However, this classification might be considered controversial due to
the broad and overlapping nature of the content related to the Dragon
Ball franchise, which spans across various media including video
games.
Some videos might touch upon topics that could be tangentially related to video gaming, given the franchise's extensive video game adaptations. Therefore, while the channel's core
content is firmly rooted in anime and manga analysis, the potential intersections with video game culture and content could lead to diverse interpretations of its primary classification
among different audiences and marketing professionals.
Summarize - Create a summary
Text Summarization is an Exercise Left to the Reader
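One common approach when a channel has more video metadata than fits in one prompt is a map-reduce summary: summarize chunks of videos, then summarize the summaries. A sketch under assumptions: `llm` is a hypothetical prompt-to-text function, chunking is by character count rather than tokens, and the prompt wording is illustrative.

```python
from typing import Callable


def chunk(items: list[str], max_chars: int) -> list[list[str]]:
    """Group items so each chunk stays within max_chars (an oversized item gets its own chunk)."""
    chunks, current, size = [], [], 0
    for item in items:
        if current and size + len(item) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(item)
        size += len(item)
    if current:
        chunks.append(current)
    return chunks


def summarize_channel(llm: Callable[[str], str], title: str, video_texts: list[str],
                      max_chars: int = 4000) -> str:
    # Map: summarize each chunk of video titles/descriptions.
    partials = [llm("Summarize these videos:\n" + "\n".join(c)) for c in chunk(video_texts, max_chars)]
    # Reduce: combine the partial summaries into one channel summary.
    return llm(f"Write a channel summary for {title} from these partial summaries:\n" + "\n".join(partials))
```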

Categorization - Video level protocol
I for one welcome our robotic toaster overlords
Video Categorization Loop
1. Define the categories
2. Do human reviews
3. Create multi-modal model
4. Assess quality against the holdout set
5. Argue with humans about edge cases
6. Convince humans to follow directions
7. Repeat

Categorization - Channel level protocol
It looks sorted to me - Anonymous LLM
Channel Categorization Loop
1. Utilize the categories from videos
2. Do human reviews
3. Create LLM prompt
4. Assess quality against the holdout set
5. Argue with LLM about edge cases
6. Convince LLM to follow directions
7. Repeat
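Step 1 can be made concrete by turning video-level category scores into channel-level shares that the channel prompt can cite. A minimal sketch; the 0.5 threshold, the category names, and the score format (one dict of category scores per video) are assumptions.

```python
from collections import defaultdict


def channel_category_shares(video_scores: list[dict[str, float]],
                            threshold: float = 0.5) -> dict[str, float]:
    """Share of a channel's videos whose score for each category meets the threshold."""
    if not video_scores:
        return {}
    hits: dict[str, int] = defaultdict(int)
    categories = set()
    for scores in video_scores:
        for category, score in scores.items():
            categories.add(category)
            if score >= threshold:
                hits[category] += 1
    return {c: hits[c] / len(video_scores) for c in sorted(categories)}


def channel_prompt_lines(shares: dict[str, float]) -> str:
    """Render the shares as percentages for the channel-level LLM prompt."""
    return "\n".join(f"{c}: {round(100 * s)}% of videos" for c, s in shares.items())
```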

Assess - Edge cases
Combat Sports is a matter of opinion
What’s the issue this time?
1. Too much detail?
2. Too little detail?
3. Conflicting instructions?
4. Typos?
5. Current events?
Fixes
1. Adjust the prompt?
2. Adjust the RAG?
3. Switch LLMs?
4. Agree to disagree

Assess - Following directions
What’s a JSON precious?
Prompt Engineering for Production
1. Provide a json template
2. Provide clear instructions
3. Tweak repeatedly on different LLMs
4. Repeat until universally bug free.
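Checking whether a response actually followed the template can be automated, which makes step 4 measurable across LLMs. A minimal validator sketch; the key names are copied from the example template on this slide, and treating each `##` as an integer from 0 to 100 is an assumption.

```python
import json

TEMPLATE_KEYS = {
    "channel_summary", "watched_by_kids_percentile", "watched_by_teens",
    "watched_by_adults", "video_gaming_percentile", "anime_percentile",
    "combat_sports_percentile",
}


def validate_response(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response followed the template."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    missing = TEMPLATE_KEYS - set(data)
    extra = set(data) - TEMPLATE_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "channel_summary" in data and not isinstance(data["channel_summary"], str):
        problems.append("channel_summary should be a string")
    for key in sorted(TEMPLATE_KEYS - {"channel_summary"}):
        value = data.get(key)
        # bool is a subclass of int in Python, so rule it out explicitly.
        if key in data and (isinstance(value, bool) or not isinstance(value, int) or not 0 <= value <= 100):
            problems.append(f"{key} should be an integer from 0 to 100")
    return problems
```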
Example json template
```json
{
  "channel_summary": "${title} is about …",
  "watched_by_kids_percentile": ##,
  "watched_by_teens": ##,
  "watched_by_adults": ##,
  "video_gaming_percentile": ##,
  "anime_percentile": ##,
  "combat_sports_percentile": ##
}
```

Assess - Collect metrics to judge quality
My English papers were usually non-embarrassing
Spanish Dragon Ball Z is too easy. Let’s try popular Japanese channels!
Full holdout set creation
1. Evaluate categorization of actual channel.
2. Create precise content policy
3. Achieve consensus on the maybes
4. Store results in a holdout set
5. Track Precision, Recall, F1 over time
Non-embarrassing prompt template
1. Evaluate categorization based on LLM summarization
2. Mark mistakes in red
3. Mark non-embarrassing in yellow
4. Iterate until mostly not Red
Patent pending
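Tracking precision, recall, and F1 against the holdout set (step 5) and a non-embarrassing rate from the color-coded review can be sketched as follows. Labels are assumed to be sets of category names per channel and metrics are micro-averaged; both choices are assumptions, not Zefr's implementation.

```python
def precision_recall_f1(predicted: dict[str, set[str]],
                        holdout: dict[str, set[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 of predicted category labels."""
    tp = fp = fn = 0
    for channel, truth in holdout.items():
        pred = predicted.get(channel, set())
        tp += len(pred & truth)  # categories both agree on
        fp += len(pred - truth)  # categories the model added
        fn += len(truth - pred)  # categories the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def non_embarrassing_rate(marks: list[str]) -> float:
    """Share of reviewed outputs not marked red (red = mistake, yellow = non-embarrassing)."""
    return 1 - marks.count("red") / len(marks) if marks else 0.0
```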

Navigating the Challenges
Hallucinations and Inappropriate Content in LLMs
Access to the Zefr Sales Email custom GPT should be enabled by default on your Zefr Google account.
● Open Zefr Sales Email Custom GPT
● Discussion of how it works
● Have it write an email for you
● Have the LLM help you tweak it
Make your own! Here are the custom GPT instructions

Scale - Navigating the challenges
Hallucinations and Inappropriate Content in LLMs
LLMs are not perfect! Do not assume
what they create will be perfect
Some of the issues you will need to
consider
● Inappropriate content
● Stale knowledge
● Cost
● Typos
● Extra fingers
● Technical issues

Protocol complete
Making the world a better place one prompt at a time
The LLM channel quality project was
developed in record time.
Feedback from content policy,
sales and most importantly clients
has been overwhelmingly positive.
Use cases for LLMs are popping up
everywhere we look.

Most LLMs have safety guards built in
● OpenAI - required
● Gemini - optional
● Open source - mostly none
Inappropriate Content
We need to detect it

Safeguards don’t work very well
● Jailbreaking and open source models
● No knowledge of current events
● Easily tricked
Inappropriate Content
Better at generating it than detecting it
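Given that built-in safeguards are unreliable, detection is often layered. A minimal sketch of that idea: a cheap blocklist check, then a classifier score with a middle band routed to human review. The `classifier` callable (which could wrap a hosted moderation endpoint or an in-house model) and both thresholds are hypothetical.

```python
from typing import Callable


def moderate(text: str, classifier: Callable[[str], float], blocklist: set[str],
             block_at: float = 0.8, review_at: float = 0.4) -> str:
    """Return "block", "review", or "allow" for a piece of generated or user text."""
    lowered = text.lower()
    # Blocklist terms are assumed to be stored in lowercase.
    if any(term in lowered for term in blocklist):
        return "block"
    score = classifier(text)  # hypothetical: probability the text is inappropriate
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"  # borderline: route to a human instead of trusting the model
    return "allow"
```

The review band is where current events and jailbreak-style phrasing tend to land, so it is the part worth staffing.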
