
Digging into Human Rights Documents


The objective of this research project is to develop a software toolset that mines a large set of unstructured text archives of human rights abuses. The software tool is designed to discover stories of hidden human rights victims and unidentified perpetrators. These stories do not exist in one document but as fragments of text embedded across multiple documents. Thus, these stories can be identified only when reading across a large number of related documents. The current approach of manually reading to identify such stories is incredibly tedious, time-consuming, unsystematic, and error-prone. Human readers find it difficult to correlate the identity of victims, perpetrators, and details of abuse that reside across multiple documents. Thus, the success of this project has significant implications for the human rights community, as currently there is a lack of adequate tool support for automatically reading and identifying stories from large-scale unstructured text document sets.

Published in: Data & Analytics

  1. Digging into Human Rights Violations
By Karthikeyan Umapathy, Associate Professor, School of Computing, University of North Florida.

Human rights corpora contain reports from survivors and witnesses of human rights violations, each describing a traumatic event. This is problematic because it creates multiple accounts of a single event: each person recalls the incidents and details differently, so the narrations do not always line up with one another. Despite these variations, scholars need to create a narrative history that makes up a collective memory and shapes cultural identity regarding the traumatic event. Currently, humanities scholars perform co-reference analysis manually using traditional methods such as qualitative coding and string matching. They need analytical tools to dig into text corpora and extract relevant information. Unidentified victims, perpetrators, and other details of human rights violations are camouflaged by the scale of the archival records of witness reports.

Human rights violations are acts typically deemed crimes against humanity. Examples of such acts include genocide, torture, slavery, rape, enforced sterilization or medical experimentation, and deliberate starvation. The most severe violations are committed by state (or non-state) actors when they abuse, ignore, or deny basic human rights through wars of aggression, war crimes, and crimes against humanity.

Human Rights Violation Data
  • Lord’s Resistance Army NGO reports and statements (~3,000 reports and documents)
  • Extraordinary Chambers in the Courts of Cambodia (2,672 court documents and transcripts)
  • South African Truth and Reconciliation Commission (2,004 court documents and transcripts)
  • World Trade Center Task Force Interviews (503 interviews)
  • Bosnian Historical Memories (84 life stories)

Project 1: Digging into Human Rights Documents

Context: The objective of this research is to develop a software toolset that mines a large set of unstructured text archives of human rights abuses. The software tool is designed to discover stories of hidden human rights victims and unidentified perpetrators. These stories do not exist in one document, but as fragments of text embedded across multiple documents. We developed a framework to mine human rights corpora and construct narratives of abuses by identifying cross-document co-reference of violation event patterns.

Team: This project was an international effort consisting of academic and industry team members: Ben Miller (Georgia State University, US), Karthikeyan Umapathy (University of North Florida, US), and Lu Xiao (Western University, Canada).

[Figure residue: poster panel labels "Problem" and "Solution", followed by pipeline diagram labels in extraction order: Human Rights Data Corpus; Preprocessing; Entity Recognizer and Anaphora Resolution; Time Tagger; Event Extractor; Person; Location; Time; Sequence of Events; Story Timeline; Fuzzy Cluster; Anaphora Resolution; Co-reference Logic Based on Event Types; Database; Similarity Scores; Visualizations; Constructed Storylines]

Team: Joshua Joiner (Master's thesis student) and Karthikeyan Umapathy. They developed a Natural Language Processing system that facilitates cross-document co-reference of traumatic events from a corpus of witness statements and interviews, and reconstructed stories of hidden victims and unidentified perpetrators from text fragments scattered across a large collection of related documents.
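The poster names similarity scores, fuzzy clustering, and co-reference logic based on event types, but does not describe how they were implemented. As a rough illustration only, the sketch below groups event mentions from different documents when they share an event type and have similar person, location, and time fields. The `EventMention` schema, the weights, the 0.8 threshold, the greedy clustering, and the sample records are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class EventMention:
    """One event mention extracted from one document (hypothetical schema)."""
    doc_id: str
    event_type: str
    person: str
    location: str
    year: int


def text_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mention_similarity(m1: EventMention, m2: EventMention) -> float:
    """Score two mentions; mentions of different event types never co-refer here."""
    if m1.event_type != m2.event_type:
        return 0.0
    time_score = 1.0 if m1.year == m2.year else 0.0
    # Assumed weights: names matter most, then place, then time.
    return (
        0.5 * text_similarity(m1.person, m2.person)
        + 0.3 * text_similarity(m1.location, m2.location)
        + 0.2 * time_score
    )


def cluster_mentions(mentions, threshold=0.8):
    """Greedy clustering: a mention joins the first cluster with a close member."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if any(mention_similarity(mention, other) >= threshold for other in cluster):
                cluster.append(mention)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([mention])
    return clusters


# Invented placeholder records, not drawn from any corpus.
mentions = [
    EventMention("doc-01", "detention", "John A. Doe", "Northfield", 1996),
    EventMention("doc-07", "detention", "John Doe", "Northfield", 1996),
    EventMention("doc-12", "detention", "Mary Roe", "Eastville", 2001),
]
clusters = cluster_mentions(mentions)
for cluster in clusters:
    print([m.doc_id for m in cluster])
```

With these placeholder records, the two "Doe" mentions end up in one cluster and the third mention stays on its own. A real system would likely need anaphora resolution and richer time handling before this step, as the diagram labels suggest.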
Project 2: Automating CIRI Ratings of Human Rights Reports Using GATE

This project involves parsing human rights reports produced by the U.S. Government and rating the human rights practices of various countries. The U.S. Human Rights Reports are annual reports that cover internationally recognized human rights practices with regard to individual, civil, political, and worker rights.

Project Objective: CIRI coders rely on a manual process of reading through the Human Rights Reports and then applying ratings to each human rights practice for each country. The objective of this project is to automate the process of scouring the human rights country reports.

Generating CIRI Rating using GATE Text Mining Tool: GATE is an open source text mining platform used for developing custom text processing solutions.

The CIRI (Cingranelli-Richards) Human Rights Data Project rates the human rights practices described in the U.S. Human Rights country reports. Students, scholars, policymakers, and analysts use the CIRI ratings for practical and research purposes.

[Table residue: CIRI Ratings Comparison, Denmark, Empowerment Rights; table contents not recoverable.] The F-measure scores in the table show how accurately the automated system assigned ratings.
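The poster reports F-measure scores but does not state how they were computed. Assuming the usual definition (the harmonic mean of precision and recall, computed per rating value against the manual ratings), a minimal sketch looks like this. The function name and the rating lists are hypothetical and are not results from the project.

```python
def f_measure(gold, predicted, label):
    """Return (precision, recall, F1) for one rating value."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == label and p == label)
    false_pos = sum(1 for g, p in pairs if g != label and p == label)
    false_neg = sum(1 for g, p in pairs if g == label and p != label)
    # Guard against division by zero when a label never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented ratings for illustration only.
manual_ratings = [2, 2, 1, 0, 2, 1]      # ratings assigned by human coders
automated_ratings = [2, 1, 1, 0, 2, 2]   # ratings assigned by the system
precision, recall, f1 = f_measure(manual_ratings, automated_ratings, label=2)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Computing the score per rating value keeps errors on rare ratings visible; averaging the per-value scores is one common way to summarize them.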