Solving? Or just measuring?
Ian Morse
Data collection
• Copy and paste
• Excel – .txt – full – by day – by article
• 1065 articles
PROCESS
• Stats
• Today’s Zaman
• Gezi
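The storage layout on the Data collection slide (one full file, one file per day, one file per article) can be sketched as follows. This is a minimal sketch, assuming the articles have already been copied out of Excel into Python dicts with hypothetical keys `date`, `title` and `text`; the folder names `by_article` and `by_day` and the file name `full.txt` are also made up for illustration. Per-day files suit day-by-day frequency comparisons, while per-article files keep individual stories easy to inspect.

```python
from collections import defaultdict
from pathlib import Path


def write_corpus(articles, out_dir):
    """Write one .txt file per article, one per day, and one for the full corpus.

    Assumes `articles` is a list of dicts with hypothetical keys
    "date" (an ISO string such as "2013-05-31"), "title" and "text".
    Returns (number of articles, number of days).
    """
    out = Path(out_dir)
    for sub in ("by_article", "by_day"):
        (out / sub).mkdir(parents=True, exist_ok=True)

    by_day = defaultdict(list)
    for i, art in enumerate(articles, start=1):
        body = f"{art['title']}\n\n{art['text']}\n"
        # Date prefix plus a zero-padded counter keeps files sortable by day.
        name = f"{art['date']}_{i:04d}.txt"
        (out / "by_article" / name).write_text(body, encoding="utf-8")
        by_day[art["date"]].append(body)

    for day, bodies in by_day.items():
        (out / "by_day" / f"{day}.txt").write_text("\n".join(bodies), encoding="utf-8")

    # ISO dates sort chronologically as plain strings.
    full = "\n".join(body for day in sorted(by_day) for body in by_day[day])
    (out / "full.txt").write_text(full, encoding="utf-8")
    return len(articles), len(by_day)
```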
Tools
• TermsRadio
• Scatterplot
• RezoViz
• AntConc (keyness, correlates, concordance, clusters/N-grams)
• …all free
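AntConc's concordance view lines up every hit of a search term with its surrounding context (keyword in context, or KWIC). A minimal Python sketch of the same idea, assuming plain-text input and whole-word, case-insensitive matching; the function name `kwic` and the `width` parameter are hypothetical and are not part of AntConc:

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for each whole-word match of `keyword`."""
    flat = " ".join(text.split())  # collapse newlines so each hit fits on one line
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so every keyword sits in the same column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


for line in kwic("Riot police used tear gas. Protesters said police fired first.", "police", 15):
    print(line)
```

Whole-word matching means "policeman" is not counted as a hit for "police".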
Keyness
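Log-likelihood keyness compares a word's frequency in a target text with its frequency in a reference corpus. In the common Dunning-style formulation, for a word seen a times in a target of c tokens and b times in a reference of d tokens, the expected counts are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and G2 = 2(a ln(a/E1) + b ln(b/E2)). The sketch below implements that formula; AntConc's own implementation, thresholds and defaults may differ. It assumes the target and reference are separate, non-empty token lists (to compare one period against the rest of the corpus, pass the remaining days as the reference), and the names `log_likelihood` and `keywords` are illustrative.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen `a` times in a target corpus of `c`
    tokens and `b` times in a reference corpus of `d` tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    g2 = 0.0
    if a > 0:  # 0 * ln(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top=20):
    """Rank words that are relatively more frequent in the target than the reference."""
    tf, rf = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(tf.values()), sum(rf.values())
    scores = {}
    for word, a in tf.items():
        b = rf[word]  # Counter returns 0 for unseen words
        if a / c > b / d:  # keep positive keywords only
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```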
Where to go from here
• More corpora
– Newspapers, other sections of TZ
• Other terms
– media, freedom, “Today’s Zaman”, indicators of the range of sources used
• Tools, tools, tools
– Concordance plot, collocates, Gephi
• Timeline
– Correlating with other variables in the journalistic environment definition
• So far I’ve been doing topics; tone is harder and requires deeper analysis
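AntConc's Concordance Plot shows where a term falls within each text. Because news articles usually follow an inverted pyramid, a term that keeps landing near the start of articles is probably being treated as the lead. A minimal sketch of that measurement, assuming single-word terms and a hypothetical cutoff of the first 25% of each article as the "lead"; the function names are illustrative, not AntConc features:

```python
import re


def term_positions(text, term):
    """Relative positions (0.0 = first token, 1.0 = last token) of `term` in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return []
    last = max(len(tokens) - 1, 1)  # avoid dividing by zero for one-token texts
    target = term.lower()
    return [i / last for i, tok in enumerate(tokens) if tok == target]


def share_in_lead(texts, term, cutoff=0.25):
    """Fraction of all hits of `term` that fall in the first `cutoff` of their article."""
    positions = [p for text in texts for p in term_positions(text, term)]
    if not positions:
        return 0.0
    return sum(p <= cutoff for p in positions) / len(positions)
```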
Solution Oriented Press Freedoms

Editor's Notes

  • #2 It started last semester, when I was studying abroad in Turkey, that red country on the map: not quite as bad as black, but the problems were visible and unmistakable. I met with some journalists to speak about their experiences, including a press freedom advocate. I was surprised that his job was solely to report to an international organization about what went on inside his country; beyond publicizing the information, it was difficult to work towards solutions. So when thinking about solutions, start from the problem. RWB (Reporters Without Borders) produces a measure that’s pretty popular: they give questionnaires to experts in every country, with maybe 100 questions, and from the answers they produce a single value so that countries can be compared, and it looks pretty. But that is only one value from a massive questionnaire with several different parts. Freedom House does only slightly better (a shorter, more general and subjective questionnaire), but at least they publish three components of their score: legal, political, and economic. So where does Mexico fit in, where whole towns are controlled by something close to gangs that regularly threaten journalists’ lives, and often carry out the threat? They talk about Finland and why it is good, so why don’t they talk about the bad and why it is bad, and recognize that countries with very similar scores suffer from very different diagnoses? So here are the reasons people give for Finland’s standing: it has been first in 11 of the 13 years RWB has had its measure.
  • #3 If you stay abstract in a project for too long, you get nowhere, so this is how I made it concrete and testable. What really matters to those who care about press freedom is the resulting effect on the press itself and its effect on society: the quality of the journalism. But quality is really difficult to measure. The environment in which the press is produced, however, has a big effect on the press, and that is much more easily measured. This is what goes into the press freedom measures, but these component values are not released for each country, only the final number. You’ll notice that the first item in the components of a journalistic environment is text analysis. No organization, or even academic, that I have seen has researched the direct effects of changes in the environment on the publications themselves, and that is the crux of what I am doing. I have easily found other statistics to fill the other gaps, but the text analysis is crucial for measuring the effects. So what if we could actually see, in the press itself, how damaging press freedom violations were?
  • #4 I am pretty familiar with English-language news publications in Turkey, and we have direct access through LexisNexis to three important ones, one of which is Today’s Zaman (TZ). I chose an event around which I knew there were press freedom violations and that can be viewed as a watershed moment, both politically and in how politics is represented in newspapers: the Gezi Park protests. The database didn’t go back far enough, so I created my own database of the necessary articles from the national section, dated May 18 to July 15, a 59-day window before and after the protests. My vision was to use topic, sentiment, and content analysis to look at multiple newspapers and multiple events and see how they change (in terms of the topic and tone from the previous slide). I want to get a feel for tangible effects and changes in the papers. How do they go about their journalistic exercise? How do they change it? How do they depict stories differently? How do they choose which stories to cover? First lesson: data collection, especially with text, will occupy the vast majority of your time. Do not underestimate how long it takes to clean your data so that it is machine-readable. But once it is machine-readable, it is of course extremely interesting and fun. I saw weird patterns, such as repetitions and then a big jump. This is an example of why you should use more than one tool and measure: when I looked further into where it jumped and which keywords jumped, I saw some strange results, and found that in individual text files the data was repeating random parts of other files. With the help of Paul Miller, I fixed it.
  • #6 There are two different types of keyness, chi-squared and log-likelihood. I can only find a mathematical explanation for log-likelihood, so I will only use that. I expect to do a lot of work with keyness, so I will try to keep this short. Basically, keyness measures how unusual a word is compared with a “normal” text, or reference corpus. So I can compare my text against any period within my corpus, the entire corpus, or any other corpus (there are some popular “standardized” corpora for specific domains like journalism, but they barely relate to what I want). For now, I will only compare within my own corpus, which is incredibly revealing: you see here exactly how topics change. Not only are the big topics interesting; further down you see things they include that aren’t big stories, so why did they choose them? One example is Gülen, a controversial figure and owner of the newspaper.
  • #7 Another example. The most interesting thing here is that, compared with the full corpus, the use of the word “Turkey” in the first 10 days is strange.
  • #8 Remember the point in TermsRadio where “police” and “protesters” started to become a lot more frequent? That doesn’t tell us much about how each was portrayed. So these are the AntConc results from before the protests started. On the right and left sides, you can see how “police” appears before and after that point. Click a word and it brings you to the concordance, where you can look even further into each case. Sometimes this is necessary, because as you go in you might see that the hits all come from the same article.
  • #9 The first thing you can see is a higher frequency of “police” and also a wider distribution of the words that surround it. So not only is “police” mentioned more, but stories that simply say the police launched or are conducting an investigation against a potentially dangerous person do not appear. What does this mean about the way the newspaper is shifting its representation of the police?
  • #10 After the height of the protests, the appearance of “police” in the corpus spikes. Discussion of the police departments returns, but there is little difference in the frequency of “intervention” and “violence”. Individual “police officers” are much more prominent than the departments, and when you zoom in you can clearly see that aggressive language surrounds “police officers”. So you see the data in a new way, then you form a hypothesis, and then you can check your hypothesis.
  • #11 Or: why do mentions of the CHP mention the head of the party and the head of their rival the same number of times? This is why I love DH (digital humanities): in Barclay’s words, you can ask yourself new questions that couldn’t previously have been asked.
  • #13 TermsRadio measures the frequency of words, so the natural thing to look for is why they spike or dip in certain places. But there is also the option to search for words that spike or fall together, or where one spikes when the other falls. The police were not necessarily associated with protests. Now, whenever there is a public demonstration, the police are always nearby in full riot gear, regardless of the potential for violence (at a sit-down protest for mentally disabled people locked up in an asylum, there were police vehicles and police in quasi-army uniforms). Were the police always associated with protests like this? Did it actually start in this period? Is it significant that mentions of police and protesters continued even after the Gezi Park protests ended? Be careful at the end: the final dip is there because that day was a Saturday, when only two articles were published. The dip just before it, however, is not on a Saturday: “Taksim” dips but “Gezi Park” does not. Does this mark the period when the protests began to be called the Gezi Park protests instead of the protests at Taksim about Gezi Park? What I wish for is a tool that measures the correlation between frequencies. Clearly “protests”, “Gezi”, and “Taksim” are spoken about together and thus fluctuate in frequency together, but which other terms do?
  • #14 More corpora: other newspapers, and other sections of TZ; the world section can be very revealing, and the time range could extend further into August. Other terms: “media”, “freedom”, “Today’s Zaman”, and indicators of sources (because people who try to judge the quality of journalism also look at the range of sources used). Tools, tools, tools: there are many other tools. Even inside AntConc there’s one called Concordance Plot that shows where terms appear in each text. Because journalistic pieces typically follow an inverted pyramid structure, which by its nature puts what is most important first in an article, this can also be useful. Timeline: correlating with other variables in the journalistic environment definition is still a big part. I’ve been doing topics, but tone is harder and requires deeper analysis, such as looking at phrases like “according to” or negative words like “despite”. A paper may also be in the future, because I have not yet seen anything published with this kind of analysis.
  • #15 This is where I hope to end up. The timeline is especially important, but so far I have not found a tool that lets me combine graphs with timeline data points. The point of the text analysis is to measure the real effects of press freedom on the thing advocacy groups are trying to protect, so the degree to which the effects on the press are observable should also be considered.
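Note #4 describes text files that silently repeated random parts of other files. One way to catch that before analysis, sketched here as an assumption rather than the fix that was actually applied, is to break every file into overlapping 8-word sequences (shingles) and report pairs of files that share any. The window size `n=8` and the name `shared_passages` are guesses for illustration; legitimate overlaps such as repeated bylines or quotations will also show up, so flagged pairs still need a manual look.

```python
import re
from collections import defaultdict
from itertools import combinations


def shared_passages(docs, n=8):
    """Count the word n-grams shared by each pair of documents.

    `docs` maps a file name to its text. Pairs with many shared n-grams are
    candidates for copy/paste or export errors and deserve a manual check.
    """
    holders = defaultdict(set)  # n-gram -> names of the documents containing it
    for name, text in docs.items():
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words) - n + 1):
            holders[tuple(words[i:i + n])].add(name)

    overlaps = defaultdict(int)
    for names in holders.values():
        # An n-gram found in only one document yields no pairs.
        for pair in combinations(sorted(names), 2):
            overlaps[pair] += 1
    return dict(overlaps)
```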
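Note #9 observes a wider distribution of the words surrounding “police” after the protests began. A rough way to quantify that is to count collocates in a symmetric window around each hit and compare how many distinct words appear in each period, for example `len(collocates(before_text, "police"))` against the same call on the later text. This sketch assumes a window of 4 tokens on each side (an arbitrary choice) and is a plain frequency count, not AntConc's collocate statistics; `before_text` and the function name are hypothetical.

```python
import re
from collections import Counter


def collocates(text, node, window=4, stopwords=frozenset()):
    """Count tokens within `window` positions either side of each `node` hit."""
    tokens = re.findall(r"\w+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return counts
```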
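Comparing “police officers” with mentions of the departments, as in note #10, comes down to counting multi-word clusters (what AntConc calls clusters/N-grams). Because the number of articles varies by day and by period, this sketch normalizes counts per 10,000 tokens; the function names and the normalization base are assumptions for illustration.

```python
import re
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of `n` consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def phrase_rate(text, phrase, per=10_000):
    """Occurrences of a multi-word `phrase` per `per` tokens of `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    target = tuple(phrase.lower().split())
    return ngram_counts(tokens, len(target))[target] * per / len(tokens)
```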