This document provides an overview of Anastasia Zhukova's research on automated identification of media bias through word choice and labeling in news articles. It includes her CV and background in natural language processing. Her motivation is that word choice and labeling can impact public perception and decision making. Her research question is how to identify instances of bias in concepts referring to groups of people across news articles without training data. The document outlines her multi-step merging approach to cluster related mentions and evaluate its performance on different concept types. It also discusses drawbacks of the initial approach and goals for improving the methodology.
This document discusses open data and data journalism. It provides definitions of open data, describes various open data portals from around the world, and organizations that support open data like the Open Data Institute. It also defines data journalism and discusses data expeditions that have been conducted in Russia to train journalists in open data skills. Case studies and outcomes of previous data expeditions in Russia are summarized.
Not-so-obvious Online Data Sources for Demographic Research - Ingmar Weber
Slides from ICWSM'17 workshop on Social Media for Demographic Research (https://sites.google.com/site/smdrworkshop/program). Data sets include Facebook's ad audience estimates, Google Correlate, online genealogy and much more. Contact Ingmar directly to learn more.
"Fake news" and disinformation management concerns organizations as well as individuals. How should we deal with fake news? Is fake news an opportunity for librarians to become a central node in society?
Digital Breadcrumbs- Investigating Internet Crime with Open Source Intellige... - Nicholas Tancredi
This document discusses how open source intelligence (OSINT) tools and techniques can help law enforcement investigate internet crimes. It provides examples of how social media analysis and dark web investigations have helped identify suspects and combat issues like child pornography. The document also references statistics and studies on topics like drug markets on the dark web and how social media is exploited for criminal activities. It advocates for law enforcement to make use of social media searches and data mining frameworks to facilitate cybercrime investigations and intelligence gathering.
The document summarizes a presentation given by Dr Ibrar Bhatt and Dr Alison MacKenzie at the SRHE Annual Conference in December 2017 on the topic of digital literacy and ignorance. It discusses how literacy relates to the production and cultivation of knowledge as well as ignorance. Examples provided include an 1829 Georgia law prohibiting the education of slaves and a quote about the spread of misinformation on social media. The document also outlines a current pilot study investigating undergraduate writing and digital literacy practices across different disciplines and how this relates to knowledge production and the sustenance of ignorance. It closes with some final points and questions about the implications of "post-truthism" and how universities can help students engage critically with information.
Presentation by Gabriela Jacomella at the 2019 CMPF Summer School for Journalists and Media Practitioners - Covering Political Campaigns in the Age of Data, Algorithms & Artificial Intelligence
The Rise of Data Journalism: The Making of Journalistic Knowledge through Qua... - Liliana Bounegru
The document discusses the rise of data journalism and its impact on the field of journalism. It provides three key points:
1) Data journalism is transforming how news is sourced, produced, and delivered through the use of data, quantitative methods, and computational techniques. This includes the rise of programmer-journalists and data-driven investigative reporting.
2) While data journalism has faced criticisms around objectivity and democratic representation, it also provides benefits like enhancing transparency, accountability, and efficiency. It allows for new forms of storytelling and knowledge production.
3) Data journalism is discussed and studied in academic literature around its implications for the democratic functions of media, computational culture, and reconfiguring traditional journalism epistem
Digital Breadcrums: Investigating Internet Crime with Open Source Intelligenc... - Nicholas Tancredi
Capstone project for a 12-week online course with the International Association of Crime Analysts. My topic was on how crime and intelligence analysts are using open source intelligence (OSINT) to investigate Internet crime.
The Caucasus School of Journalism and Media Management trains students in investigative journalism through hands-on skills-based programs. It discusses challenges in teaching journalism in the Caucasus region affected by conflicts. Topics discussed include the changing role of journalists due to technology, the definition of news, and crowdsourcing data. Initiatives at the school include the Friedman-GIPA investigative journalism prize and collaboration labs. It aims to support transparency, accuracy, and teaching reporting on sensitive issues impartially. The school also provides professional training and joins the European data journalism network to develop specialized courses focusing on data in journalism.
This research project analyzes how perceptions of immigration and immigrants have changed over time according to media portrayals. Newspapers were analyzed from 1985-2013 to determine how they frame discussions of immigrants and immigration policy. Articles from 24 papers discussing immigration were coded based on topics like economic impacts, origins of immigrants, and politics. Future directions include completing the coding, analyzing public opinion data, and exploring additional media like Spanish-language outlets to broaden perspectives.
Ramon van den Akker. Fairness of machine learning models an overview and prac... - Lviv Startup Club
This document discusses fairness in machine learning models. It begins with motivating examples of algorithms that were found to be biased, such as a recidivism prediction tool that was biased against black individuals. It then covers operationalizing fairness through frameworks like transparency and explainability. Finally, it discusses approaches for achieving fairness by design, such as preprocessing the data, adding randomness to predictions, or tailoring new algorithms with fairness constraints. The author notes there are inherent tradeoffs between performance and fairness that require difficult choices.
From Telling Stories with Data to Telling Stories with Data Infrastructures: ... - Liliana Bounegru
The document discusses reimagining data journalism through the lens of data infrastructures. It provides examples from digital methods research that investigate digital platforms and data creation. These include mapping right-wing groups in Europe using web analysis, examining counter-jihadist networks on Facebook, and analyzing climate change negotiations through transcripts and indicators. Examples from journalism that engage with data infrastructures include reverse engineering Netflix's film genres, mapping misinformation spread on Twitter, examining email targeting models, and making memory politics on social media visible. The document promotes accounting for socio-technical conditions of data and investigating how data infrastructures could be composed differently.
AI-generated news and misinformation during elections - Paige Morrow
This document discusses issues around AI-generated news and misinformation during elections. It notes that while AI is used for investigative journalism, automated reporting, content moderation and fact-checking, there are also human rights concerns around algorithmic bias, uncertainty in outputs, and the generation and spread of misinformation like deepfakes and bot accounts. Several studies cited find misinformation poses a major problem for democracy by eroding trust and enabling less informed decisions. The document explores challenges like reliance on ad revenue and social media gatekeeping, and reviews some regulatory responses in Germany, France and the UK to address these issues.
I gave this presentation at Deutsche Telekom AG's Digital Ethics Conference in Bonn on March 13 2019. It provides the background for how biases may occur in machine learning systems and what may go wrong if not corrected (or minimized).
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI... - ALexandruDaia1
Our primary goal was to detect clusters, via gensim libraries, in news data consisting of information regarding health and threats. We identified clusters for the following periods: i) January 2006 until the end of 2019, as December 2019 is considered the first month in which information about CORONAVIRUS COVID-19 was made public; ii) between the 1st of January 2019 and 31st December 2019; and iii) between the 31st of December 2019 and the 14th of April 2020. We conducted experiments using natural language processing on open source intelligence data generously offered by brica.de, a provider specialized in Business Risk Intelligence & Cyberthreat Awareness.
Each question should be done on a separate word document, with refer... - wildmandelorse
Each question should be done on a separate word document, with references.
Question 1
Revelations about the collection of vast amounts of data on telephone and computer use by the National Security Agency (NSA) have raised concerns about the threat to privacy. At the same time, many private companies collect extensive information on computer users. Read Craig Mundie (2014), "Privacy pragmatism: A focus on data use, not data collection," Foreign Affairs, Mar.-Apr. 2014. Retrieved from Columbia College online library, Global Issues in Context database. Look for a copy in Files.
Also, take a look at this one-page brief by the National Review on the unmasking of U.S. citizens. Does this review give you a reason for heartburn over not just the unmasking, but then leaking to the press? Click on the National Review link or see in course materials Unit 4. For background on the legal requirements of NSA unmasking and the 702 warrants see Hirsch and Maxey (March 24, 2017).
What are the arguments for changing the way we think about privacy in the modern world?
What concerns do you have about the extent to which government and corporations store and use information collected from citizens to spy on those same citizens? What benefits do you see?
References
Hirsch, S. and Maxey, L. (March 24, 2017). "Unmasking of U.S. Citizens in NSA Intercepts," The Cipher Brief. Retrieved from https://www.thecipherbrief.com/article/exclusive/north-america/unmasking-us-citizens-found-nsa-intercepts-1091
Question 2
Watch the Frontline episode “Top Secret America: 9/11 to the Boston Bombings” that recounts a history of American intelligence efforts since 9/11. The program touches on many highly controversial intelligence issues including the justification for the Iraq War, the use of rendition and torture to gather intelligence, the use of drone weapons, broad surveillance of telephone and computer communications, widespread use of license plate and facial recognition technology, and camera surveillance. There has been a substantial investment in intelligence capabilities to conduct the War on Terror and prevent terrorist attacks. Is this effort relevant today?
Please address the following questions with brief answers (about 150 to 200 words):
Write using third person perspective
Justify using authoritative sources (peer review journals, published sources, FBI, etc.)
What questions does “Top Secret America” raise about U.S. intelligence gathering?
Is the gathering of intelligence infringing on civil liberties? Justify your answer.
Is the intelligence effort making America safe? Justify your answer.
Are these efforts relevant today?
Video
Frontline (2013). ...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv... - g8briel
In light of new revelations about government warrantless wiretapping and electronic surveillance, what role do librarians have in educating our patrons about digital privacy and security issues? Given that digital privacy is further complicated by the services of for-profit Internet companies, such as Facebook and Google, are our users savvy enough to understand threats to their information in this increasingly complex digital landscape? This presentation will explore issues related to current events and information security with an eye towards the implications for information literacy standards, briefly examine tools used to enhance information privacy, and discuss how librarians might play a role in helping users become more information aware.
What Actor-Network Theory (ANT) and digital methods can do for data journalis... - Liliana Bounegru
Slides from a talk I gave at the University of Ghent on 21 October 2014 about how Actor-Network Theory (ANT) and digital methods can be used to study and inform data journalism.
The document is a study guide for the Human Rights Council that discusses two topics: the right to privacy in the digital age and addressing the increase in domestic violence. For topic A on the right to privacy, the summary provides background on worldwide surveillance programs like the Five Eyes alliance and how digital technology has impacted privacy. It outlines different bloc positions, with China and Russia expressing concerns about privacy violations and data collection, while the UK and US take different regulatory approaches. The timeline highlights key events in surveillance programs and social media privacy issues.
Attitudes Of Second Year Computer Science Undergraduates Toward Plagiarism - Sean Flores
This document summarizes a study that examined the attitudes of 85 second-year computer science students toward plagiarism at a university in the Caribbean. The study found that while 74% of students could define plagiarism, only 5% included self-plagiarism in their definition. Students had a moderate, positive attitude toward plagiarism, which is undesirable, and low to moderate subjective norms. There were statistically significant differences in positive attitudes and subjective norms between genders and ages. Many students believed self-plagiarism should not be punished. The author recommends expanding the plagiarism policy to include self-plagiarism and creating a systematic academic honesty education program tailored to gender and age.
1. Citizen journalism, where ordinary people play a role in news gathering and reporting, is growing through new technologies like mobile devices. Some predict that by 2021, half of all news will be produced through citizen journalism.
2. Citizen journalism provides both benefits like immediacy and multiple perspectives, as well as challenges around reliability and verifying information.
3. For citizen journalism to reach its potential, people need media literacy skills to be informed consumers and producers of news. Education has an important role in cultivating these skills.
Ethical Issues in Machine Learning Algorithms. (Part 3) - Vladimir Kanchev
The presentation deals with ethical issues in a few currently widely used machine learning (or AI) technologies and algorithms. The ML applications are described in detail, along with their current state of the art, their specific challenges, and their ethical problems. Current solutions from academic and industrial perspectives are given. A mixture of academic and applied sources is used for the presentation, which aims to be more interesting for students and practitioners.
In your responses, review at least one of the articles provided by y.docx - annettsparrow
In your responses, review at least one of the articles provided by your peer and expand on their description. Raise the level of discussion by considering these things: What open-ended questions can you ask? How can you point classmates to sources that could be of interest or use to them?
Minimum of 75 words.
1. Lorne
Agrawal, Gans, and Goldfarb (2019) wrote my scholarly source, which details the impact artificial intelligence has on prediction as it relates to the labor market. In their journal article, the authors describe how machine learning and AI use "prediction" based on data to replace specific career fields such as demand forecasting and human resources (Agrawal, Gans, & Goldfarb, 2019). The authors do not seem to have any biases present. The article relies on data and is backed up by references. It is also reliable, as it appears in a peer-reviewed journal. A strength of the article is that the authors provide excellent and precise detail of their findings. One limitation is the minimal use of graphs or charts to visualize parts of the data.
Florida (2019) wrote my popular source, and the bulk of the article summarizes a study by the Brookings Institution on artificial intelligence's impact on high-skilled jobs. The author details critical takeaways from the study, although there is little information about how the research was conducted or the methods used. The author's biases shine through somewhat, as the article suggests that AI will have a direct impact on highly skilled labor and asks open-ended questions, leaving the reader to decide based on the article itself. There is no evidence of the article being peer-reviewed, which calls its reliability into question. The strength of the material is its appeal to readers by making information and data "bite-sized" and easy to follow. However, the article lacks vital aspects, such as the methods used during the study.
2. Melissa
I chose to investigate two sources that deal with a lack of inclusive education for students with disabilities.
I used the Ashford University Library to find my scholarly source. The article “Missing the mark or scoring a goal? Achieving non-discrimination for students with disability in primary and secondary education in Australia: A scoping review” (2020) addresses a lack of inclusive education for students with disabilities, even though it is illegal in Australia to discriminate against a student because of their disability (Duncan et al., 2020). As this source is peer-reviewed and recently published, it has strong reliability and strength in the information presented. The review discusses 18 peer-reviewed published articles dealing with legislation and case law regarding the education of students with disabilities in Australia (Duncan et al., 2020). The authors cite the limitation of not including recent articles etc. pro.
Presentation version of the paper "Data is our Future, Welcome to the Age of Infomagination" by Matt Sadler (full paper available here: http://infomagination.typepad.com/files/matt-sadler---infomagination-1.pdf)
Era of Sociology News Rumors News Detection using Machine Learning - ijtsrd
In this paper we perform political fact checking and fake news detection using various technologies such as Python libraries and Anaconda, and algorithms such as Naïve Bayes. We present an analytical study on the language of news media. To find linguistic features of untrustworthy text, we compare the language of real news with that of satire, hoaxes, and propaganda. We also present a case study based on PolitiFact.com, using their factuality judgments on a 6-point scale to prove the feasibility of automatic political fact checking. Experiments show that while media fact checking remains an open research issue, stylistic indications can help determine the veracity of the text. Chandni Jain | S. Vignesh "Era of Sociology News Rumors News Detection using Machine Learning" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23534.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23534/era-of-sociology-news-rumors-news-detection-using-machine-learning/chandni-jain
What's in the News? Towards Identification of Bias by Commission, Omission, a... - Anastasia Zhukova
The document proposes a pipeline to identify media bias through the commission, omission, and source selection (COSS) of information. The pipeline would analyze a seed article against other event-related articles to identify original versus reused content. It would also analyze patterns of information flow across articles to explore how information is reused over time and how the framing may change. The goal is to leverage techniques from plagiarism detection and paraphrase identification to better understand source origins and how reused information is altered in polarity.
This document provides an overview of a workshop on using data for science journalism. It discusses several approaches for incorporating data into stories, including: mapping controversies on issues like climate change; using data to tell stories in science and technology; and analyzing networks to reveal connections. Specific techniques are illustrated, such as mapping the influence of climate change skeptics online and connections between counter-jihadist groups on Facebook. The document also reviews several tools and resources for data journalism.
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe... - Anastasia Zhukova
Submitted: 2017/03
Abstract: While following the news, one can notice that the same story can have a different impact depending on which news agent tells it. One reason for this is how the facts are framed. Framing is described by communication sciences as an instrument that influences how people perceive, interpret and convey information. It can be achieved through specific word choice and labeling that describe an event or problem from a particular perspective, e.g. positive or negative. In order to derive a frame, social sciences usually perform a manual qualitative analysis, but recently computer-assisted quantitative approaches have started to become an essential way of conducting framing analysis. This work provides a literature review of the existing frame derivation methods based on the problem of word choice and labeling.
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling... - Anastasia Zhukova
Submitted to University of Konstanz
Date: February 2019
Abstract: The term media bias denotes the differences in news coverage of the same event. Slanted news coverage occurs when journalists frame the information favorably, i.e., they report about the same concept with different word choice, thus distorting the readers' perception of the information. A word choice and labeling (WCL) analysis system was implemented to reveal biased language in news articles. In the area of Artificial Intelligence (AI), the WCL analysis system imitates well-established methodologies of content and framing analyses employed by the social sciences. The central thesis contribution is the development and implementation of the multistep merging approach (MSMA) that, unlike state-of-the-art natural language preprocessing (NLP) techniques, e.g., coreference resolution, identifies coreferential phrases of a broader sense, e.g., “undocumented immigrants” and “illegal aliens.” An evaluation of the approach on the extended NewsWCL50 dataset achieved a performance of F1 = 0.84, which is twice as high as the best-performing baseline. Finally, to enable visual exploration of the identified entities, a four-visualization usability prototype was proposed and implemented, which enables exploring the entity composition of the analyzed news articles and the phrasing diversity of the identified entities.
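For readers unfamiliar with the F1 measure reported in the thesis abstract above, the following is a minimal sketch of how F1 is commonly computed from precision and recall. The function name and the example counts are hypothetical and chosen only for illustration; the thesis's own evaluation on NewsWCL50 may aggregate scores differently.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if true_positives == 0:
        # Precision and recall are both zero (or undefined), so F1 is reported as 0.0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the thesis.
    print(round(f1_score(84, 16, 16), 2))
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is often preferred over accuracy for mention-clustering tasks of this kind.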
More Related Content
Similar to Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles
The Caucasus School of Journalism and Media Management trains students in investigative journalism through hands-on skills-based programs. It discusses challenges in teaching journalism in the Caucasus region affected by conflicts. Topics discussed include the changing role of journalists due to technology, the definition of news, and crowdsourcing data. Initiatives at the school include the Friedman-GIPA investigative journalism prize and collaboration labs. It aims to support transparency, accuracy, and teaching reporting on sensitive issues impartially. The school also provides professional training and joins the European data journalism network to develop specialized courses focusing on data in journalism.
This research project analyzes how perceptions of immigration and immigrants have changed over time according to media portrayals. Newspapers were analyzed from 1985-2013 to determine how they frame discussions of immigrants and immigration policy. Articles from 24 papers discussing immigration were coded based on topics like economic impacts, origins of immigrants, and politics. Future directions include completing the coding, analyzing public opinion data, and exploring additional media like Spanish-language outlets to broaden perspectives.
Ramon van den Akker. Fairness of machine learning models an overview and prac...Lviv Startup Club
This document discusses fairness in machine learning models. It begins with motivating examples of algorithms that were found to be biased, such as a recidivism prediction tool that was biased against black individuals. It then covers operationalizing fairness through frameworks like transparency and explainability. Finally, it discusses approaches for achieving fairness by design, such as preprocessing the data, adding randomness to predictions, or tailoring new algorithms with fairness constraints. The author notes there are inherent tradeoffs between performance and fairness that require difficult choices.
From Telling Stories with Data to Telling Stories with Data Infrastructures: ...Liliana Bounegru
The document discusses reimagining data journalism through the lens of data infrastructures. It provides examples from digital methods research that investigate digital platforms and data creation. These include mapping right-wing groups in Europe using web analysis, examining counter-jihadist networks on Facebook, and analyzing climate change negotiations through transcripts and indicators. Examples from journalism that engage with data infrastructures include reverse engineering Netflix's film genres, mapping misinformation spread on Twitter, examining email targeting models, and making memory politics on social media visible. The document promotes accounting for socio-technical conditions of data and investigating how data infrastructures could be composed differently.
AI-generated news and misinformation during electionsPaige Morrow
This document discusses issues around AI-generated news and misinformation during elections. It notes that while AI is used for investigative journalism, automated reporting, content moderation and fact-checking, there are also human rights concerns around algorithmic bias, uncertainty in outputs, and the generation and spread of misinformation like deepfakes and bot accounts. Several studies cited find misinformation poses a major problem for democracy by eroding trust and enabling less informed decisions. The document explores challenges like reliance on ad revenue and social media gatekeeping, and reviews some regulatory responses in Germany, France and the UK to address these issues.
I gave this presentation at Deutsche Telekom AG's Digital Ethics Conference in Bonn on March 13 2019. It provides the background for how biases may occur in machine learning systems and what may go wrong if not corrected (or minimized).
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...ALexandruDaia1
Our primary goal was to detect clusters via gensim libraries in news data consisting of information regarding health and threats. We identified clusters for the following periods: i) January 2006 until the end of 2019, as December 2019 is considered the first month in which information about CORONAVIRUS COVID-19 was made public; ii) between the 1st of January 2019 and the 31st of December 2019; and iii) between the 31st of December 2019 and the 14th of April 2020. We conducted experiments using natural language on open source intelligence data offered generously by brica.de, a provider specialized in Business Risk Intelligence & Cyberthreat Awareness.
Each question should be done on a separate word document, with referwildmandelorse
Each question should be done on a separate word document, with references.
Question 1
Revelations about the collection of vast amounts of data on telephone and computer use by the National Security Agency (NSA) have raised concerns about the threat to privacy. At the same time, many private companies collect extensive information on computer users. Read Craig Mundie (2014) "Privacy pragmatism: A focus on data use, not data collection." Foreign Affairs, Mar.-Apr. 2014. Retrieved from Columbia College online library, Global Issues in Context database. Look for a copy in Files.
Also, take a look at this one page brief by the National Review on the unmasking of U.S. Citizens. Does this review give you a reason for heartburn over not just the unmasking, but then leaking to the press? Click on the National Review link or see in course materials Unit 4. For background on the legal requirements of NSA unmasking and the 702 warrants see Hirsch and Maxey (March 24, 2017).
What are the arguments for changing the way we think about privacy in the modern world?
What concerns do you have about the extent to which government and corporations store and use information collected from citizens to spy on those same citizens? What benefits do you see?
References
Hirsch, S. and Maxey, L., (March 24, 2017). "Unmasking of U.S. Citizens in NSA Intercepts," The Cipher Brief. Retrieved from
https://www.thecipherbrief.com/article/exclusive/north-america/unmasking-us-citizens-found-nsa-intercepts-1091
Question 2
Watch the Frontline episode “Top Secret America: 9/11 to the Boston Bombings” that recounts a history of American intelligence efforts since 9/11. The program touches on many highly controversial intelligence issues including the justification for the Iraq War, the use of rendition and torture to gather intelligence, the use of drone weapons, broad surveillance of telephone and computer communications, widespread use of license plate and facial recognition technology, and camera surveillance. There has been a substantial investment in intelligence capabilities to conduct the War on Terror and prevent terrorist attacks. Is this effort relevant today?
Please address the following questions with brief answers (about 150 to 200 words):
Write using third person perspective
Justify using authoritative sources (peer review journals, published sources, FBI, etc.)
What questions does “Top Secret America” raise about U.S. intelligence gathering?
Is the gathering of intelligence infringing on civil liberties? Justify your answer.
Is the intelligence effort making America safe? Justify your answer.
Are these efforts relevant today?
Video
Frontline (2013). ...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...g8briel
In light of new revelations about government warrantless wiretapping and electronic surveillance what role do librarians have in educating our patrons about digital privacy and security issues? Given that digital privacy is further complicated by for-profit Internet companies services, such as those provided by Facebook and Google, are our users savvy enough to understand threats to their information in this increasingly complex digital landscape? This presentation will explore issues related to current events and information security with an eye towards the implications for information literacy standards; brief examination of tools used to enhance information privacy; and discuss how librarians might play a role in helping users become more information aware.
What Actor-Network Theory (ANT) and digital methods can do for data journalis...Liliana Bounegru
Slides from a talk I gave at the University of Ghent on 21 October 2014 about how Actor-Network Theory (ANT) and digital methods can be used to study and inform data journalism.
The document is a study guide for the Human Rights Council that discusses two topics: the right to privacy in the digital age and addressing the increase in domestic violence. For topic A on the right to privacy, the summary provides background on worldwide surveillance programs like the Five Eyes alliance and how digital technology has impacted privacy. It outlines different bloc positions, with China and Russia expressing concerns about privacy violations and data collection, while the UK and US take different regulatory approaches. The timeline highlights key events in surveillance programs and social media privacy issues.
Attitudes Of Second Year Computer Science Undergraduates Toward PlagiarismSean Flores
This document summarizes a study that examined the attitudes of 85 second-year computer science students toward plagiarism at a university in the Caribbean. The study found that while 74% of students could define plagiarism, only 5% included self-plagiarism in their definition. Students had a moderate, positive attitude toward plagiarism, which is undesirable, and low to moderate subjective norms. There were statistically significant differences in positive attitudes and subjective norms between genders and ages. Many students believed self-plagiarism should not be punished. The author recommends expanding the plagiarism policy to include self-plagiarism and creating a systematic academic honesty education program tailored to gender and age.
1. Citizen journalism, where ordinary people play a role in news gathering and reporting, is growing through new technologies like mobile devices. Some predict that by 2021, half of all news will be produced through citizen journalism.
2. Citizen journalism provides both benefits like immediacy and multiple perspectives, as well as challenges around reliability and verifying information.
3. For citizen journalism to reach its potential, people need media literacy skills to be informed consumers and producers of news. Education has an important role in cultivating these skills.
Ethical Issues in Machine Learning Algorithms. (Part 3)Vladimir Kanchev
The presentation deals with ethical issues in a few currently widely used machine learning (or AI) technologies and algorithms. The ML applications are described in detail, along with their current state of the art, their specific challenges, and their ethical problems. Current solutions from academic and industrial perspectives are given. A mixture of academic and applied sources is used for the presentation so that it is more interesting for students and practitioners.
In your responses, review at least one of the articles provided by y.docxannettsparrow
In your responses, review at least one of the articles provided by your peer and expand on their description. Raise the level of discussion by considering these things: What open-ended questions can you ask? How can you point classmates to sources that could be of interest or use to them?
Minimum of 75 words.
1.
Lorne
-Agrawal, Gans, and Goldfarb (2019) wrote my scholarly source and detailed the impact artificial intelligence has on prediction as it relates to the labor market. In their journal entry, the authors go into detail on how machine learning and AI are using "prediction" based on data to replace specific career fields such as demand forecasting and human resources (Agrawal, Gans, & Goldfarb, 2019). The authors do not seem to have any biases present. The article relies on facts and data and is backed up by references. The article also is reliable as it is in a peer-reviewed journal. A strength of the article is that the authors provide excellent and precise detail of their findings. One limitation of the article is the minimal use of graphs or charts that give visuals to certain parts of their data.
Florida (2019) wrote my popular source, and the bulk of the article summarizes a study conducted by the Brookings Institution to examine artificial intelligence's impact on high-skilled jobs. The author of the article details critical takeaways from the study, although there is little information about how the research was conducted or the methods used. The author's biases shine through somewhat, as the article suggests that AI will have a direct impact on highly skilled labor and asks open-ended questions, leaving the reader to decide based on the item itself. There is no evidence of the article being peer-reviewed, which calls its reliability into question. The strength of the material is its appeal to readers by making information and data "bite-sized" and easy to follow. However, there are vital aspects the article is lacking, such as the methods used during the study.
2.
Melissa
-I chose to investigate two sources that deal with a lack of inclusive education for students with disabilities.
I used the Ashford University Library to find my scholarly source. The article “Missing the mark or scoring a goal? Achieving non-discrimination for students with disability in primary and secondary education in Australia: A scoping review” (2020), addresses a lack of inclusive education for students with disabilities, even though it is illegal to discriminate against a student because of their disability in Australia (Duncan et al., 2020). As this source is peer-reviewed and recently published, it has strong reliability and strength in the information presented. The review discusses 18 peer-reviewed published articles dealing with legislature and case law regarding the education of students with disabilities (in Australia) (Duncan et al., 2020). The authors cite the limitation of not including recent articles etc. pro.
Presentation version of the paper "Data is our Future, Welcome to the Age of Infomagination" by Matt Sadler (full paper available here: http://infomagination.typepad.com/files/matt-sadler---infomagination-1.pdf)
Era of Sociology News Rumors News Detection using Machine Learningijtsrd
In this paper we perform political fact checking and fake news detection using various technologies such as Python libraries, Anaconda, and algorithms such as Naïve Bayes, and we present an analytical study on the language of news media. To find linguistic features of untrustworthy text, we compare the language of real news with that of satire, hoaxes, and propaganda. We also present a case study based on PolitiFact.com, using their factuality judgments on a 6-point scale to prove the feasibility of automatic political fact checking. Experiments show that while media fact checking remains an open research issue, stylistic indications can help determine the veracity of the text. Chandni Jain | S. Vignesh "Era of Sociology News Rumors News Detection using Machine Learning" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23534.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23534/era-of-sociology-news-rumors-news-detection-using-machine-learning/chandni-jain
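As a rough illustration of the Naïve Bayes classification named in the abstract above, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written with the Python standard library only. It is not the paper's implementation: the function names, the labels "trusted" and "untrusted", and the four training sentences are all hypothetical and invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it on whitespace."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_docs, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        # Log prior: share of training documents with this label.
        score = math.log(class_docs[label] / total_docs)
        class_total = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (class_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only.
TRAIN = [
    ("senate passes budget bill after long debate", "trusted"),
    ("city council approves new transit budget", "trusted"),
    ("shocking secret cure doctors hate revealed", "untrusted"),
    ("you will not believe this shocking secret plot", "untrusted"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "shocking secret revealed"))
print(predict(model, "council passes budget"))
```

A real study of this kind would need a labeled corpus, such as the PolitiFact judgments mentioned above, and a held-out test set. The toy data here only shows how word frequencies per class drive the decision.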
What's in the News? Towards Identification of Bias by Commission, Omission, a...Anastasia Zhukova
The document proposes a pipeline to identify media bias through the commission, omission, and source selection (COSS) of information. The pipeline would analyze a seed article against other event-related articles to identify original versus reused content. It would also analyze patterns of information flow across articles to explore how information is reused over time and how the framing may change. The goal is to leverage techniques in plagiarism detection and paraphrase identification to better understand source origins and how reused information altered in polarity.
This document provides an overview of a workshop on using data for science journalism. It discusses several approaches for incorporating data into stories, including: mapping controversies on issues like climate change; using data to tell stories in science and technology; and analyzing networks to reveal connections. Specific techniques are illustrated, such as mapping the influence of climate change skeptics online and connections between counter-jihadist groups on Facebook. The document also reviews several tools and resources for data journalism.
Similar to Talk: Automated Identification of Media Bias by Word Choice and Labeling in News Articles (20)
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Anastasia Zhukova
Submitted: 2017/03
Abstract: While following the news, one can notice the same story can have different impact depending on which news agent tells it. One reason for this is how the facts are framed. Framing is described by communication sciences as an instrument influencing how people perceive, interpret and convey information. It can be obtained by use of specific word choice and labeling that describe an event or problem from a particular perspective, e.g. positive or negative. In order to derive a frame, social sciences usually perform a manual qualitative analysis, but recently computer-assisted quantitative approaches have become an essential way of conducting framing analysis. This work provides a literature review on the existing frame derivation methods based on the problem of word choice and labeling.
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...Anastasia Zhukova
Submitted to University of Konstanz
Date: February 2019
Abstract: The term media bias denotes the differences of the news coverage about the same event. Slanted news coverage occurs when journalists frame the information favorably, i.e., they report with different word choice about the same concept, thus leading to the readers’ distorted information perception. A word choice and labeling (WCL) analysis system was implemented to reveal biased language in news articles. In the area of Artificial Intelligence (AI), the WCL analysis system imitates well-established methodologies of content and framing analyses employed by the social sciences. The central thesis contribution is the development and implementation of the multistep merging approach (MSMA) that, unlike state-of-the-art natural language processing (NLP) techniques, e.g., coreference resolution, identifies coreferential phrases of a broader sense, e.g., “undocumented immigrants” and “illegal aliens.” An evaluation of the approach on the extended NewsWCL50 dataset achieved a performance of F1 = 0.84, which is twice as high as that of the best-performing baseline. Finally, to enable visual exploration of the identified entities, a four-visualization usability prototype was proposed and implemented, which enables exploring the entity composition of the analyzed news articles and the phrasing diversity of the identified entities.
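To make the F1 figure in the abstract above more concrete, here is a minimal sketch of how precision, recall, and F1 can be computed for a clustering of mentions. It assumes a pairwise scheme, in which every pair of mentions placed in the same cluster counts as one coreference decision. This is a common way to score clusterings, but it is an assumption and not necessarily the evaluation protocol used in the thesis. The function names and the example clusters are hypothetical.

```python
from itertools import combinations


def coreferent_pairs(clusters):
    """Return the set of unordered mention pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives every pair a canonical order, so sets can be compared.
        for first, second in combinations(sorted(cluster), 2):
            pairs.add((first, second))
    return pairs


def pairwise_f1(gold_clusters, predicted_clusters):
    """Compute precision, recall, and F1 over coreferent mention pairs."""
    gold = coreferent_pairs(gold_clusters)
    predicted = coreferent_pairs(predicted_clusters)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical clusters, invented for illustration only.
GOLD = [
    {"undocumented immigrants", "illegal aliens", "migrants"},
    {"the president", "Donald Trump"},
]
PREDICTED = [
    {"undocumented immigrants", "illegal aliens"},
    {"migrants", "the president", "Donald Trump"},
]

precision, recall, f1 = pairwise_f1(GOLD, PREDICTED)
print(precision, recall, f1)
```

In this toy case two of the four predicted pairs are correct and two of the four gold pairs are found, so precision, recall, and F1 all equal 0.5.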
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Anastasia Zhukova
The document describes research on developing an automated system to identify framing and media bias in news articles through analysis of word choice and labeling (WCL). It presents an approach that uses natural language processing to identify semantic concepts that may be targets of bias and then compares how those concepts are framed across multiple news articles reporting on the same event. The approach involves preprocessing text, identifying semantic concepts, analyzing framing of concepts, and measuring framing similarity. It also details a multi-step merging methodology to align candidate concepts across articles and evaluates the approach on an annotated corpus, finding it outperforms baselines at identifying concepts with various levels of complexity in word choice.
Putting News in a Perspective: Framing by Word Choice and LabelingAnastasia Zhukova
While following the news, one can notice the same story can have different impact depending on which news agent tells it. One reason for this is how the facts are framed. Framing is described by communication sciences as an instrument influencing how people perceive, interpret and convey information. It can be obtained by use of specific word choice and labeling that describe an event or problem from a particular perspective, e.g. positive or negative. In order to derive a frame, social sciences usually perform a manual qualitative analysis, but recently computer-assisted quantitative approaches have become an essential way of conducting framing analysis. This work provides a literature review on the existing frame derivation methods based on the problem of word choice and labeling.
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Anastasia Zhukova
Authors: Anastasia Zhukova, Felix Hamborg, Bela Gipp
Publication date: 2020/08/05
Journal: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Abstract: Topic modeling is a technique used in a broad spectrum of use cases, such as data exploration, summarization, and classification. Despite being a crucial constituent of many use cases, established topic models, such as LDA, often produce statistically valid yet non-meaningful topics, i.e., that cannot easily be interpreted by humans. In turn, the usability of topic modeling approaches, e.g., in document summarization, is non-optimal. We propose a topic modeling approach that uses TCA, a method for also near-identity cross-document coreference resolution. TCA showed promising results when resolving mentions of not only persons and other named entities, but also broad, vague, or abstract concepts. In a preliminary evaluation on news articles, we compare the approach with state-of-the-art topic modeling. We find that (1) the four baselines produce statistically valid yet hollow topics or topics that only refer to events in the dataset but not the events’ topical composition. (2) TCA is the only approach that extracts topics that distinctively describe meaningful parts of the dataset.
PDF: https://dl.acm.org/doi/pdf/10.1145/3383583.3398564
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Anastasia Zhukova
Authors: Anastasia Zhukova, Felix Hamborg, Bela Gipp
Publication date: 2020/08/05
Journal: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Abstract: Dataset exploration is a set of techniques crucial in many research and data science projects. For textual datasets, commonly used techniques include topic modeling, document summarization, and methods related to dimension reduction. Despite their robustness, these techniques suffer from at least one of the following drawbacks: document summarization does not explicitly set documents in relation, the others yield summaries or topics that often are difficult to interpret and yield poor results for topics that consist of context-dependent terms. We propose a method for dataset exploration that employs cross-document near-identity resolution of mentions of semantic concepts, such as persons, other named entity types, events, actions. The method not only sets documents in relation and thus allows for comparative dataset exploration, but also yields well interpretable document representations. Additionally, due to the underlying approach for cross-document resolution of concept mentions, the method is able to set documents in relation as to their near-identity terms, e.g., synonyms that are not universally valid but only in the given dataset.
PDF: https://dl.acm.org/doi/pdf/10.1145/3383583.3398562
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Anastasia Zhukova
The document discusses comparing two datasets for cross-document coreference resolution (CDCR): ECB+ and NewsWCL50. ECB+ annotates event-centric coreference chains with strict identity relations, while NewsWCL50 annotates concept-centric chains with loose identity and bridging relations. The authors qualitatively and quantitatively compare the datasets' structure, annotation schemes, relations, and lexical diversity. They propose a new metric called Phrasing Diversity (PD) to better measure lexical variation in coreference chains. PD shows ECB+ has lower diversity while NewsWCL50 has higher diversity, indicating CDCR models need evaluation on both lexical disambiguation and diversity challenges.
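The idea of measuring lexical variation within a coreference chain can be illustrated with a much simpler stand-in. The sketch below computes the share of distinct phrasings among a chain's mentions. It is not the Phrasing Diversity (PD) metric proposed by the authors, whose formula is not given in this summary. The function names and the two example chains are hypothetical.

```python
def unique_phrase_ratio(chain):
    """Share of distinct (case-insensitive) phrasings among a chain's mentions."""
    if not chain:
        return 0.0
    distinct = {mention.lower().strip() for mention in chain}
    return len(distinct) / len(chain)


def mean_unique_phrase_ratio(chains):
    """Average the per-chain ratio over all non-empty chains."""
    ratios = [unique_phrase_ratio(chain) for chain in chains if chain]
    return sum(ratios) / len(ratios) if ratios else 0.0


# Hypothetical chains, invented for illustration only.
STRICT_CHAIN = ["Donald Trump", "Trump", "Donald Trump", "Trump"]
LOOSE_CHAIN = [
    "caravan of immigrants",
    "asylum-seekers",
    "migrant families",
    "the caravan",
]

print(unique_phrase_ratio(STRICT_CHAIN))
print(unique_phrase_ratio(LOOSE_CHAIN))
print(mean_unique_phrase_ratio([STRICT_CHAIN, LOOSE_CHAIN]))
```

A chain that repeats the same few names scores low, while a chain of loosely related phrasings scores high. This mirrors, in a simplified way, the contrast the summary draws between strict identity chains and chains with loose or bridging relations.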
Concept Identification of Directly and Indirectly Related Mentions Referring ...Anastasia Zhukova
Authors: Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp
Publication date: 2021/02/26
Conference: Diversity, Divergence, Dialogue: 16th International Conference, iConference 2021, Beijing, China, March 17–31, 2021, Proceedings, Part I 16, Pages 514-526, Publisher Springer International Publishing
Abstract: Unsupervised concept identification through clustering, i.e., identification of semantically related words and phrases, is a common approach to identify contextual primitives employed in various use cases, e.g., text dimension reduction, i.e., replace words with the concepts to reduce the vocabulary size, summarization, and named entity resolution. We demonstrate the first results of an unsupervised approach for the identification of groups of persons as actors extracted from a set of related articles. Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., “migrant families” “asylum-seekers.” Compared to our baseline, the approach keeps the mentions of the geopolitical entities separated, e.g., “Iran leaders” “European leaders,” and clusters (in)directly related mentions with diverse wording, e.g., “American officials” “Trump Administration.”
https://www.gipp.com/wp-content/papercite-data/pdf/zhukova2021.pdf
XCoref: Cross-document Coreference Resolution in the WildAnastasia Zhukova
Authors: Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp
Publication date: 2022/2/28
Conference: International Conference on Information, Pages 272-291, Publisher Springer, Cham
Abstract: Datasets and methods for cross-document coreference resolution (CDCR) focus on events or entities with strict coreference relations. They lack, however, annotating and resolving coreference mentions with more abstract or loose relations that may occur when news articles report about controversial and polarized events. Bridging and loose coreference relations trigger associations that may expose news readers to bias by word choice and labeling. For example, coreferential mentions of “direct talks between U.S. President Donald Trump and Kim” such as “an extraordinary meeting following months of heated rhetoric” or “great chance to solve a world problem” form a more positive perception of this event. A step towards bringing awareness of bias by word choice and labeling is the reliable resolution of coreferences with high lexical diversity. We propose an unsupervised method named XCoref, which is a CDCR method that capably resolves not only previously prevalent entities, such as persons, e.g., “Donald Trump,” but also abstractly defined concepts, such as groups of persons, “caravan of immigrants,” events and actions, e.g., “marching to the U.S. border.” In an extensive evaluation, we compare the proposed XCoref to a state-of-the-art CDCR method and a previous method TCA that resolves such complex coreference relations and find that XCoref outperforms these methods. Outperforming an established CDCR model shows that the new CDCR models need to be evaluated on semantically complex mentions with more loose coreference relations to indicate their applicability of models to resolve mentions in the “wild” of political news articles.
PDF: https://www.gipp.com/wp-content/papercite-data/pdf/zhukova2022.pdf
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsAnastasia Zhukova
Authors: Anastasia Zhukova, Felix Hamborg, Bela Gipp
Publication date: 2021/9/30
In Proceedings of the 2nd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2021) co-located with JCDL 2021
Abstract: Named entity recognition (NER) is an important task that aims to resolve universal categories of named entities, e.g., persons, locations, organizations, and times. Despite its common and viable use in many use cases, NER is barely applicable in domains where general categories are suboptimal, such as engineering or medicine. To facilitate NER of domain-specific types, we propose ANEA, an automated (named) entity annotator to assist human annotators in creating domain-specific NER corpora for German text collections when given a set of domain-specific texts. In our evaluation, we find that ANEA automatically identifies terms that best represent the texts' content, identifies groups of coherent terms, and extracts and assigns descriptive labels to these groups, i.e., annotates text datasets into the domain (named) entities.
PDF: https://ceur-ws.org/Vol-3004/paper1.pdf
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills MN
By harnessing the power of High Flux Vacuum Membrane Distillation, Travis Hills from MN envisions a future where clean and safe drinking water is accessible to all, regardless of geographical location or economic status.
Mending Clothing to Support Sustainable Fashion_CIMaR 2024.pdfSelcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Authoring a personal GPT for your research and practice: How we created the Q...Leonel Morgado
Thematic analysis in qualitative research is a time-consuming and systematic task, typically done using teams. Team members must ground their activities on common understandings of the major concepts underlying the thematic analysis, and define criteria for its development. However, conceptual misunderstandings, equivocations, and lack of adherence to criteria are challenges to the quality and speed of this process. Given the distributed and uncertain nature of this process, we wondered if the tasks in thematic analysis could be supported by readily available artificial intelligence chatbots. Our early efforts point to potential benefits: not just saving time in the coding process but better adherence to criteria and grounding, by increasing triangulation between humans and artificial intelligence. This tutorial will provide a description and demonstration of the process we followed, as two academic researchers, to develop a custom ChatGPT to assist with qualitative coding in the thematic data analysis process of immersive learning accounts in a survey of the academic literature: QUAL-E Immersive Learning Thematic Analysis Helper. In the hands-on time, participants will try out QUAL-E and develop their ideas for their own qualitative coding ChatGPT. Participants that have the paid ChatGPT Plus subscription can create a draft of their assistants. The organizers will provide course materials and slide deck that participants will be able to utilize to continue development of their custom GPT. The paid subscription to ChatGPT Plus is not required to participate in this workshop, just for trying out personal GPTs during it.
The current MS Word-generated PowerPoint presentation covers major details about the micronuclei test, its significance, and the assays used to conduct it. The test is used to detect micronuclei formation inside the cells of nearly every multicellular organism. Micronuclei formation takes place during chromosomal separation at metaphase.
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is mitigated, at least in part.
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With an increasing population, people need to rely on packaged food stuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and the most harmless method of food preservation, as it does not alter the necessary micronutrients of food materials. Although irradiated food doesn’t cause any harm to human health, quality assessment of food is still required to provide consumers with necessary information about the food. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during the processing of the food. The ESR spin trapping technique is useful for the detection of highly unstable radicals in the food. The assessment of the antioxidant capability of liquid food and beverages is mainly performed by the spin trapping technique.
2. Short CV
2008 – 2014
Information technology, M.Eng.
Moscow Aviation Institute
2015 – 2019
Computer and Information Science, M. Sc.
University of Konstanz
2018
Graduate Student Researcher
Natural Language Processing group
National Institute of Informatics
2019 – present
Doctoral Researcher, Ph.D. Candidate
Data & Knowledge Engineering group
University of Wuppertal
5. Media Bias Model
News Production and Consumption Process
Reasons
• Ideological View
• Political View
• Target Audience
• Owners, Advertisers
• Government
• Business Interest, Funding, ...
• Political Interest, Reputation, ...
Process
• News Event (Reality) → Gathering → Writing → Editing → News → Consumers (Perception)
Forms
Fact Selection
• Event Selection
• Source Selection
• Commission
• Omission
Writing Style
• Labeling
• Word choice
Presentation Style
• Placement
• Size Allocation
• Picture Selection
• Picture Explanation
Spin
Consumer Context
• Background Knowledge
• Attitude
• Social Status
• Country
Word Choice (WC) and Labeling (L) examples: a genius, a smart person, an arrogant person
F. Hamborg, K. Donnay, and B. Gipp, “Automated Identification of Media Bias in News Articles: An Interdisciplinary Literature Review,” International Journal on Digital Libraries (IJDL), 2018
6. WCL problem
Word choice & labeling…
• strongly impacts the public perception of news topics
• disturbs the decision-making process
• leads to the propagation of false information
Hurricane Katrina, 2005
F. Hamborg, A. Zhukova, and B. Gipp, “Illegal Aliens or Undocumented Immigrants? Towards the Automated Identification of Bias by Word Choice and Labeling,” in Proceedings of the iConference 2019, 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
7. Social science background
Social sciences
• Content analysis: “What to think about”
• Frame analysis: “How to think about it”
Computer science
• Event-related articles
• Candidate extraction
• Cross-document coreference resolution
• Sentiment analysis
Example candidates: Putin, president, savior, tyrant, humble man, thief, war, sanctions, Crimea, army
Identified actors, actions, events, concepts, etc.
Concept polarization
Content analysis
Cross-document coreference resolution
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, and B. Gipp, “Interpretable and Comparative Textual Dataset Exploration Using Near-Identity Mention Relations,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2020
8. Research question
Given
- No training set
- A set of event-related articles
- Extracted candidate phrases of groups of persons
Goal
- Find phrases referring to the same concepts
- Use only phrases themselves, i.e., no context information
- Exploratory unsupervised task
illegal aliens
undocumented immigrants
Directly referring mentions
White House officials
American authorities
Indirectly referring mentions
How can an automated approach identify instances of bias by word choice and labeling in the concepts (in)directly referring to groups of people in a set of English news articles reporting on the same event?
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” Manuscript submitted for publication, 2020
9. Multi-step merging approach (MSMA) 1.0
Corefs. & NPs
Extraction of a specific attribute
Pairwise comparison & merging
recursion
↓ number of mentions
“Winner takes it all” strategy
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
10. Merge using similar heads
Entity 1: young illegals; the illegals; illegals who arrived as children; DACA illegals
Entity 2: roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
Entity 3: illegal aliens who were brought as children; nearly 800,000 illegal aliens; illegal aliens; young illegal aliens
Head sets: {illegals}, {immigrants}, {aliens}
{illegals} and {immigrants} are similar in the vector space → merge Entity 1 and Entity 2
The word “aliens” alone is related to UFOs; these mentions will be merged later, as “illegal aliens”, at the third step
Merged entity: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
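The head-based merge on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the 2-d vectors are made-up stand-ins for real word embeddings, `head_of` is a hypothetical shortcut that looks up a known head token instead of parsing the phrase, and the 0.9 cosine threshold is chosen for the toy data, not taken from the slides.

```python
import math
from itertools import combinations

# Made-up 2-d vectors standing in for word embeddings (assumption).
TOY_VECTORS = {
    "illegals": (0.9, 0.1),
    "immigrants": (0.8, 0.3),
    "aliens": (0.1, 0.9),
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def head_of(phrase, heads):
    """Hypothetical shortcut: return the first token that is a known head."""
    for token in phrase.lower().split():
        if token in heads:
            return token
    raise ValueError(f"no known head in {phrase!r}")

def merge_by_heads(phrases, vectors, threshold=0.9):
    """Group phrases by head, then merge groups whose heads are similar."""
    groups = {}
    for phrase in phrases:
        groups.setdefault(head_of(phrase, vectors), []).append(phrase)

    # Tiny union-find over the head words.
    parent = {head: head for head in groups}

    def find(head):
        while parent[head] != head:
            head = parent[head]
        return head

    for a, b in combinations(sorted(groups), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(b)] = find(a)

    merged = {}
    for head, members in groups.items():
        merged.setdefault(find(head), []).extend(members)
    return merged

PHRASES = [
    "young illegals", "DACA illegals",
    "young immigrants", "undocumented immigrants",
    "illegal aliens", "young illegal aliens",
]
merged = merge_by_heads(PHRASES, TOY_VECTORS)
```

With these toy vectors the "illegals" and "immigrants" groups end up in one entity, while the "aliens" group stays separate, mirroring the slide.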
11. Merge using representative phrases
Entity 1
Mentions: young illegals; the illegals; illegals who arrived as children; DACA illegals; roughly 800,000 young undocumented immigrants; young immigrants; illegal immigrants; undocumented immigrants; endangered immigrants; additional illegals
Labeling phrases: young immigrants, undocumented immigrants, illegal immigrants, young illegals, endangered immigrants, additional illegals
Representative labeling phrases: A1: young immigrants, A2: illegal immigrants, A3: young illegals
Entity 2
Mentions: this group of young people; nearly 800,000 people; a people; people who are American in every way except through birth; foreign people; bad people; people affected by the move; the estimated 800,000 people; these people; young people
Labeling phrases: young people, foreign people, bad people, estimated people
Representative labeling phrases: B1: young people, B2: foreign people
Sim. matrix (A1–A3 × B1–B2): three cells are 1, three cells are 0
3 / (2 × 3) = 0.5 ≥ 0.3 → similar in the vector space
Merged entities: Entity 1 and Entity 2 are merged into one entity containing all mentions listed above
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
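The merge decision on this slide (3 similar pairs out of 2 × 3, compared against 0.3) can be written as a short calculation. A minimal sketch: the function name `merge_score` is hypothetical, and the placement of the three 1s inside the matrix is an assumption, since only the counts are recoverable from the slide.

```python
def merge_score(sim_matrix):
    """Fraction of similar pairs in a binary similarity matrix."""
    cells = [cell for row in sim_matrix for cell in row]
    return sum(cells) / len(cells)

# Rows: A1..A3 (representative phrases of Entity 1).
# Columns: B1, B2 (representative phrases of Entity 2).
# Which cells hold the 1s is an assumption; the slide only shows three 1s and three 0s.
SIM = [
    [1, 0],
    [1, 0],
    [1, 0],
]

THRESHOLD = 0.3  # threshold shown on the slide
score = merge_score(SIM)           # 3 / (2 * 3)
should_merge = score >= THRESHOLD  # entities count as similar
print(score, should_merge)
```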
12. MSMA 1.0 evaluation
[Figure: F1 of concept types (Actor, Country, Misc, Group) at Init. and Steps 1–4; y-axis from 0% to 100%; annotated regions: core meaning, core modifiers]
Evaluation of the simplified version of NewsWCL50 annotation.
F. Hamborg, A. Zhukova, and B. Gipp, “Automated Identification of Media Bias by Word Choice and Labeling in News Articles,” in Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A. Zhukova “Automated Identification of Framing by Word Choice and Labeling to Reveal Media Bias in News” University of Konstanz, Germany, 2019
13. MSMA 1.0 Drawbacks
Problems of MSMA 1.0
• Overparametrization
• Lack of stability
– small variation in wording affected results
• Few head modifiers used
– only adjectives
• Frequently falsely merged concepts
– American people – young immigrant people
– Chinese officials – American officials
• Low recall & low precision
– smaller related entities remain unmerged
– unrelated entities are merged
• “Winner takes it all” strategy is not optimal
Goals of MSMA 2.0
• Self-controlled merging
• Default set of parameters for all datasets
• Stable performance in case of added phrases
• Use all head’s modifiers
• Keep concepts fine-grained
• Improve merging related smaller entities
Same challenge: unsupervised learning, no training set
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” Manuscript submitted for publication, 2020
14. MSMA 2.0: Preprocessing
1. Concept’s sub-type prioritization
- Sub-types: non-NE persons, NE (ORG) persons; non-NE group, non-NE person, ORG person
- Mention roles: core mentions, generalizing mentions, specializing mentions
- Example mentions: Republican establishment; GOP leaders, Republicans; a red attorney general, a Republican
2. More head modifiers
- adjectival, noun, compound modifiers
3. NE-grid: operation restriction or similarity amplification
- Grid rows and columns: GOP, Republicans, Republican, United_States, U.S., American, Americans, Spanish, Mexico
4. Weighting of the NE components
- Americans: U.S., citizens; U.S. + citizens; 2*U.S. + citizens
- young; young + 2*U.S. + citizens
- immigrants; young + immigrants
5. Multiple similarity levels
- Head-similarity matrix SH
- Phrase-similarity matrix SP
- Core-phrase-similarity matrix SCP
- Ratio-matrix RM
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” Manuscript submitted for publication, 2020
young + citizens
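The expression "young + 2*U.S. + citizens" suggests a phrase vector built as a weighted sum of component vectors, with the named-entity component counted twice. A minimal sketch under that reading: `phrase_vector` and the one-hot toy vectors are hypothetical, and only the weight of 2 for the NE component comes from the slide.

```python
def phrase_vector(components, vectors, ne_weight=2.0):
    """Weighted sum of component vectors; NE components get `ne_weight`.

    `components` is a list of (word, is_named_entity) pairs.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word, is_ne in components:
        weight = ne_weight if is_ne else 1.0
        for i, value in enumerate(vectors[word]):
            total[i] += weight * value
    return total

# One-hot toy vectors so the weighting is easy to read off (assumption).
TOY = {
    "young": (1.0, 0.0, 0.0),
    "U.S.": (0.0, 1.0, 0.0),
    "citizens": (0.0, 0.0, 1.0),
}

# young + 2*U.S. + citizens
vec = phrase_vector(
    [("young", False), ("U.S.", True), ("citizens", False)],
    TOY,
)
print(vec)
```

Doubling the NE component pulls phrases that share a named entity closer together in the vector space than phrases that only share a generic head such as "citizens".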
15. MSMA 2.0: Pipeline
1. Identification of cluster cores
• Core mentions $cm_i, cm_j \in CM$
• $SPC_{cm_i,cm_j} \geq 0.4$ and $SH_{cm_i,cm_j} \geq 0.4$
• $RM_{cm_i,cm_j} \geq \log_{5000}|M|$
• [Figure: graph over core mentions $cm_0$, $cm_1$, $cm_3$, $cm_5$, $cm_6$ with edge similarities .7 and .8]
2. Forming cluster bodies
• Minimum similarity: $SP_{m,cc} = 0.5$, $cc \in CC_i$
• $\exists cc \in CC_i: SP_{m,cc} \geq 0.5$ and normalized similarity to $CC_i$ is larger than to $CC_j$
• $CB_i \cap CB_j$ → conflicts
3. Adding border points
• border points? noise?
• Minimum similarity: 0.4
• $\forall cb \in CB_i$: $SP_{m,cb} = 3$; $\forall cb \in CB_j$: $SP_{m,cb} = 2$
• $\min SP_{m,\forall cb \in CB} \leq 0.4 \geq 2$ and normalized similarity to $CB_i$ is larger than to $CB_j$
4. Forming non-core clusters
5. Merging final clusters
• Clusters $C_i$, $C_j$
• Use all modifiers
• On concept level
• TF-IDF-weighted concept-similarity matrix
• [Figure: graph over clusters $c_0$, $c_1$, $c_3$, $c_5$, $c_6$ with edge similarities .7 and .8]
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp, “Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons,” Manuscript submitted for publication, 2020
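The rule for forming cluster bodies (some core mention with SP ≥ 0.5, and a larger normalized similarity to that core than to the competing one) might look like the sketch below. Several things here are assumptions rather than the authors' implementation: "normalized similarity" is taken to be the mean similarity to a core's members, the function name `assign_to_body` is hypothetical, and all similarity values are made up.

```python
def assign_to_body(mention, cores, sp, min_sim=0.5):
    """Return the id of the cluster core a mention should join, or None.

    cores: dict mapping core id -> non-empty list of core mentions
    sp:    dict mapping frozenset({a, b}) -> phrase similarity
    """
    best_id, best_score = None, 0.0
    for core_id, members in cores.items():
        sims = [sp.get(frozenset((mention, cc)), 0.0) for cc in members]
        if max(sims) < min_sim:
            continue  # no core mention is similar enough
        # Assumption: "normalized similarity" = mean similarity to the core.
        score = sum(sims) / len(sims)
        if score > best_score:
            best_id, best_score = core_id, score
    return best_id

CORES = {
    "CC_i": ["Republicans", "GOP leaders"],
    "CC_j": ["Democrats"],
}
# Made-up similarity values for illustration only.
SP = {
    frozenset(("Republican leaders", "Republicans")): 0.8,
    frozenset(("Republican leaders", "GOP leaders")): 0.6,
    frozenset(("Republican leaders", "Democrats")): 0.55,
    frozenset(("the Russians", "Republicans")): 0.2,
}

assigned = assign_to_body("Republican leaders", CORES, SP)
unassigned = assign_to_body("the Russians", CORES, SP)
```

In this toy run "Republican leaders" passes the 0.5 threshold for both cores but is closer on average to `CC_i`; "the Russians" passes for neither and is left for the later steps.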
16. Evaluation and results
Democrats,
Democratic leaders
Illinois Democrat
American public,
American families,
U.S. citizens,
Poor unskilled American workers
Voice of Americans
Demonstrators,
DACA protesters,
Opposition
Administration officials,
USCIS employees,
Executive authority,
DHS officials,
Chief of White House,
Acting secretary
Mexican,
Spanish,
Mexican officials
GOP senators,
Republicans,
Republican leaders,
A group of red state attorneys
European ally,
The Europeans,
European leaders,
Western European Diplomats
Israeli officials,
Israeli Ambassador,
The Israelis
Russian agents,
Russian nationals,
The Russians
caravan participants,
asylum-seeking immigrant caravan,
members of the caravan,
more than a few hundred asylum seekers,
150 migrants, many of whom were children,
asylum-seekers,
the people that are waiting outside,
these large “caravans” of people,
unauthorized immigrants,
refugees,
people traveling without documents,
a caravan of hundreds of Central Americans,
a group of about 100 people,
Central American migrants and supporters
one of the chief critics of DACA,
opponents of the policy,
some immigration critics,
immigration hard-liners,
groups who support stricter immigration controls
Indirect mentions: ORG Indirect mentions: GPEs Direct mentions
F1                    Direct   Indirect
CoreNLP               27.9     31.4
Hier.Clust.           37.2     29.1
EECDCR                41.6     42.6
MSMA 1.0              44.7     40.9
MSMA 2.0 ELMo         42.1     40.1
MSMA 2.0 fastText     48.3     43.6
MSMA 2.0 word2vec     48.5     44.3
A. Zhukova, F. Hamborg, K. Donnay, B. Gipp “Towards a cross-document coreference resolution dataset with linguistically diverse and semantically complex concepts”, Manuscript submitted for publication, 2020
A. Zhukova, F. Hamborg, B. Gipp “XCoref: Cross-document Coreference Resolution in the Wild” Manuscript submitted for publication, 2020
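To read the results table at a glance, the F1 scores can be put into a small lookup and compared. A minimal sketch: the numbers are copied from the table on this slide, while the dictionary layout and variable names are only illustrative.

```python
# F1 scores from the slide: method -> (direct mentions, indirect mentions)
F1 = {
    "CoreNLP": (27.9, 31.4),
    "Hier.Clust.": (37.2, 29.1),
    "EECDCR": (41.6, 42.6),
    "MSMA 1.0": (44.7, 40.9),
    "MSMA 2.0 ELMo": (42.1, 40.1),
    "MSMA 2.0 fastText": (48.3, 43.6),
    "MSMA 2.0 word2vec": (48.5, 44.3),
}

best_direct = max(F1, key=lambda method: F1[method][0])
best_indirect = max(F1, key=lambda method: F1[method][1])

# Gain of the best MSMA 2.0 variant over MSMA 1.0, in F1 points.
gain_direct = round(F1["MSMA 2.0 word2vec"][0] - F1["MSMA 1.0"][0], 1)
gain_indirect = round(F1["MSMA 2.0 word2vec"][1] - F1["MSMA 1.0"][1], 1)

print(best_direct, best_indirect, gain_direct, gain_indirect)
```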
17. Conclusion
• Bias by WCL has a strong influence on readers
• Revealing bias is a step towards mitigating it
• MSMA 1.0 & 2.0 successfully resolve biased mentions
Help social sciences with
frame analyses
Help news readers become
aware of bias in media
Newsalyze
news readers,
researchers
Help make the world a better place
Objectivity
Frame 2
Frame 1
https://github.com/fhamborg/newsalyze-backend
Soon to be publicly available