HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Yongjin Yang, Joonkee Kim, Yujin Kim, Namgyu Ho, James Thorne, Se-Young Yun
OSI LAB @ KAIST AI
2.
• Hate speech detection is a task in urgent need of automation due to the scale of online media.
• However, it is challenging because hate speech is often expressed implicitly rather than through explicit words.
• Prior work has annotated the meaning implied by hate speech and trained models on posts and annotations jointly.
Implicit Hate (ElSherief et al. 2021)
SBIC (Sap et al. 2021)
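The joint annotation-training setup above can be sketched as follows. This is a minimal illustration; the prompt wording, field names, and serialization are assumptions, not the datasets' official format.

```python
# Hypothetical sketch: pairing a post with its gold label and the
# human-annotated implied meaning (as in Implicit Hate / SBIC-style
# annotation) for joint training. The exact prompt text and
# serialization are illustrative assumptions.

def make_joint_example(post: str, label: str, implication: str) -> dict:
    """Serialize one training pair: the model is asked to output the
    classification label followed by the annotated implied meaning."""
    return {
        "input": f"Classify the post and state its implied meaning. Post: {post}",
        "target": f"Label: {label}. Implied meaning: {implication}",
    }

# Toy usage with made-up annotation text.
example = make_joint_example(
    "example post text",
    "offensive",
    "implies a stereotype about a group",
)
```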
3.
• Does training with annotations really help detection? -> No!
• Does zero-shot LLM inference, including Chain-of-Thought (CoT), solve it? -> No!
• However, we found that while using CoT for detection may lower accuracy, the reasoning steps it produces are satisfying:
✓ Bridging the reasoning gap between labels and implications, focusing on the conclusion process.
✓ Providing various perspectives regarding hate speech.
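As a concrete illustration, the zero-shot CoT setup might look like the sketch below. The prompt wording, label strings, and verdict parsing are hypothetical assumptions, not the exact prompts used in the paper.

```python
# Hypothetical zero-shot Chain-of-Thought setup for hate speech
# detection; prompt text and label strings are illustrative, not the
# authors' exact configuration.

def build_cot_prompt(post: str) -> str:
    """Ask the LLM to reason step by step before giving a verdict."""
    return (
        "Determine whether the following post is offensive.\n"
        f"Post: {post}\n"
        "Let's think step by step, then conclude with "
        "'Offensive' or 'Not offensive'."
    )

def parse_verdict(completion: str) -> str:
    """Read the final verdict, which follows the reasoning chain."""
    last_line = completion.strip().lower().splitlines()[-1]
    # Check the negated form first: "offensive" is a substring of it.
    return "not_offensive" if "not offensive" in last_line else "offensive"
```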
4.
• We aimed to have the language model learn from reasoning steps generated by LLMs, attempting to fill the reasoning gap.
• We propose two variants: one without human annotation information (Fr-HARE) and one with it (Co-HARE).
• We extract multiple rationales that correctly predict the label.
(Figure: the Fr-HARE and Co-HARE pipelines)
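The rationale-extraction step above can be sketched as follows. This is a minimal sketch under stated assumptions: `sample_rationales` is a hypothetical helper standing in for repeated LLM sampling, and the prompt strings are illustrative.

```python
# Minimal sketch of HARE-style rationale filtering: sample several CoT
# chains per post and keep only those whose final prediction matches the
# gold label, as fine-tuning targets. `sample_rationales` and the prompt
# strings are hypothetical, not the authors' exact implementation.

from typing import Callable

def filter_rationales(
    post: str,
    gold_label: str,
    sample_rationales: Callable[[str], list],
) -> list:
    """Keep rationales whose prediction matches the gold label and turn
    them into (input, target) pairs for fine-tuning a smaller model."""
    kept = []
    for rationale, predicted in sample_rationales(post):
        if predicted == gold_label:  # discard chains with a wrong verdict
            kept.append({
                "input": f"Is the following post offensive? {post}",
                "target": f"{rationale} Therefore, the post is {gold_label}.",
            })
    return kept

# Toy usage with a stubbed sampler returning two candidate chains.
def fake_sampler(post: str) -> list:
    return [
        ("The post demeans a group.", "offensive"),
        ("The post is a neutral statement.", "not offensive"),
    ]

pairs = filter_rationales("some post", "offensive", fake_sampler)
```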
5.
Do LLM-generated rationales improve detection performance?
• Fr-HARE and Co-HARE consistently
outperform other baseline methods,
regardless of the model size.
• Furthermore, the performance of our method consistently improves as the model size increases, in contrast to the baselines.
6.
Are HARE models more generalizable?
• Results on two other benchmarks indicate that our methods enhance generalizability by improving the models' understanding of hate speech.
Does HARE improve the quality of generated explanations?
• Yes; Fr-HARE exhibits slightly superior performance, suggesting that its flexibility leads to higher-quality explanations.
• The rationales generated by Co-HARE align more closely with human-written rationales than those generated by a model trained directly on human-written rationales.
• Fr-HARE and Co-HARE can therefore be utilized for different purposes.
7.
Case Study
• Our approach correctly identifies the underlying hateful context in statements that superficial models might classify as non-offensive.
• Our model also accurately recognizes the historical background of Anne Frank, discerning harassment against a Jewish victim, unlike baseline methods that miss this significance.
8.
Conclusion
• In this research, we present the HARE framework to improve a language model's ability to understand hate speech and to provide clearer explanations for its decisions.
• We propose utilizing CoT reasoning extracted from LLMs, in two variants, to overcome the logical gaps in human-annotated rationales.
• When fine-tuned on the SBIC and Implicit Hate datasets, our methods achieve superior detection performance and higher-quality explanations.