HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Yongjin Yang, Joonkee Kim, Yujin Kim, Namgyu Ho, James Thorne, Se-Young Yun
OSI LAB @ KAIST AI
2.
• Hate speech detection is a task in urgent need of automation due to the scale of online media.
• However, it is challenging because hate speech is often expressed implicitly rather than through explicit words.
• Prior work has annotated the meaning implied by hate speech and trained models on posts and annotations jointly.
Implicit Hate (ElSherief et al. 2021)
SBIC (Sap et al. 2021)
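The joint annotation-training setup above can be sketched as follows. This is a minimal illustration; the prompt wording, field names, and serialization are assumptions, not the datasets' official format.

```python
# Hypothetical sketch: pairing a post with its gold label and the
# human-annotated implied meaning (as in Implicit Hate / SBIC-style
# annotation) for joint training. The exact prompt text and
# serialization are illustrative assumptions.

def make_joint_example(post: str, label: str, implication: str) -> dict:
    """Serialize one training pair: the model is asked to output the
    classification label followed by the annotated implied meaning."""
    return {
        "input": f"Classify the post and state its implied meaning. Post: {post}",
        "target": f"Label: {label}. Implied meaning: {implication}",
    }

# Toy usage with made-up annotation text.
example = make_joint_example(
    "example post text",
    "offensive",
    "implies a stereotype about a group",
)
```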
3.
• Does training with annotations really help detection? -> No!
• Does zero-shot LLM inference, including Chain-of-Thought (CoT), solve it? -> No!
• However, we found that while using CoT for detection may lower accuracy, the reasoning steps it produces are satisfying:
✓ Bridging the reasoning gap between labels and implications, focusing on the conclusion process.
✓ Providing various perspectives regarding hate speech.
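As a concrete illustration, the zero-shot CoT setup might look like the sketch below. The prompt wording, label strings, and verdict parsing are hypothetical assumptions, not the exact prompts used in the paper.

```python
# Hypothetical zero-shot Chain-of-Thought setup for hate speech
# detection; prompt text and label strings are illustrative, not the
# authors' exact configuration.

def build_cot_prompt(post: str) -> str:
    """Ask the LLM to reason step by step before giving a verdict."""
    return (
        "Determine whether the following post is offensive.\n"
        f"Post: {post}\n"
        "Let's think step by step, then conclude with "
        "'Offensive' or 'Not offensive'."
    )

def parse_verdict(completion: str) -> str:
    """Read the final verdict, which follows the reasoning chain."""
    last_line = completion.strip().lower().splitlines()[-1]
    # Check the negated form first: "offensive" is a substring of it.
    return "not_offensive" if "not offensive" in last_line else "offensive"
```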
4.
• We aimed to have the language model learn from reasoning steps generated by LLMs, attempting to fill the reasoning gap.
• We propose two variants: one without human annotation information (Fr-HARE) and one with it (Co-HARE).
• We extract multiple rationales that correctly predict the label.
(Figure: the Fr-HARE and Co-HARE pipelines)
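The rationale-extraction step above can be sketched as follows. This is a minimal sketch under stated assumptions: `sample_rationales` is a hypothetical helper standing in for repeated LLM sampling, and the prompt strings are illustrative.

```python
# Minimal sketch of HARE-style rationale filtering: sample several CoT
# chains per post and keep only those whose final prediction matches the
# gold label, as fine-tuning targets. `sample_rationales` and the prompt
# strings are hypothetical, not the authors' exact implementation.

from typing import Callable

def filter_rationales(
    post: str,
    gold_label: str,
    sample_rationales: Callable[[str], list],
) -> list:
    """Keep rationales whose prediction matches the gold label and turn
    them into (input, target) pairs for fine-tuning a smaller model."""
    kept = []
    for rationale, predicted in sample_rationales(post):
        if predicted == gold_label:  # discard chains with a wrong verdict
            kept.append({
                "input": f"Is the following post offensive? {post}",
                "target": f"{rationale} Therefore, the post is {gold_label}.",
            })
    return kept

# Toy usage with a stubbed sampler returning two candidate chains.
def fake_sampler(post: str) -> list:
    return [
        ("The post demeans a group.", "offensive"),
        ("The post is a neutral statement.", "not offensive"),
    ]

pairs = filter_rationales("some post", "offensive", fake_sampler)
```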
5.
Do LLM-generated rationales improve detection performance?
• Fr-HARE and Co-HARE consistently
outperform other baseline methods,
regardless of the model size.
• Furthermore, the performance of our method consistently improves as the model size increases, in contrast to the baselines.
6.
Are HARE models more generalizable?
• Results on two other benchmarks indicate that our methods enhance generalizability by improving the models' understanding of hate speech.
Does HARE improve the quality of generated explanations?
• Yes; Fr-HARE exhibits slightly superior performance, suggesting that its flexibility leads to higher-quality explanations.
• The rationales generated by Co-HARE align more closely with human-written rationales than those generated by a model trained directly on human-written rationales.
• Fr-HARE and Co-HARE can therefore be utilized for different purposes.
7.
Case Study
• Our approach correctly identifies the underlying hateful context in statements that superficial models might classify as non-offensive.
• Our model also accurately recognizes the historical background of Anne Frank, discerning harassment against a Jewish victim, unlike baseline methods that miss this significance.
8.
Conclusion
• In this research, we present the HARE framework to improve a language model's ability to understand hate speech and to provide clearer explanations for its decisions.
• We propose utilizing CoT reasoning extracted from LLMs, in two variants, to overcome the logical gaps in human-annotated rationales.
• When fine-tuned on the SBIC and Implicit Hate datasets, our methods achieve superior detection performance and higher-quality explanations.