OSFair2017 Workshop | Data Analytics meets Social Sciences: New Frontiers of collaboration

Haris Papageorgiou talks about Data Analytics in Social Sciences

Workshop title: TDM unlocking a goldmine of information

Training overview:
Text and Data Mining (TDM) is a natural ‘next step’ in open science. It can lead to new and unexpected discoveries and increase the impact of publications and repositories. This workshop showcases examples of successful TDM and infrastructural solutions for researchers. We will also discuss what is needed to make the most of infrastructures and how publishers and repositories can open up their content.

DAY 2 - PARALLEL SESSION 4 & 5

Published in: Science

  1. 1. Data Analytics meets Social Sciences: New Frontiers of collaboration Haris Papageorgiou ATHENA RC
  2. 2. Computational Social Science • Computational social science means that computers are used to model, simulate, and analyze social phenomena. It refers to the academic sub-disciplines concerned with computational approaches to the social sciences. • Computational social science revolutionizes both fundamental legs of the scientific method: empirical research, especially through big data, by analyzing the digital footprint left behind through social online activities; and scientific theory, especially through computer simulation model building through social simulation. (Wikipedia)
  3. 3. Text Analytics • The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources. (Wikipedia) • In other words, enhancing the value of text content by extracting entities, features, context, relationships and emotion.
  4. 4. Goals • Predict the outcome of elections and referendums (e.g. Brexit) • Examine xenophobia during the economic and migration crisis • Correlate political points of view and perspectives with the financial crisis • Predict conflict regions • Describe and analyze the evolution and dynamics of social movements • Identify causal relationships between disasters and job offers • Correlate radicalization with verbal aggressiveness on religion-related events • Explain collective behaviour • Test and validate hypotheses, etc.
  5. 5. The Big three Ingredients
  6. 6. Methodology • Data Collection, Filtering • Data Exploration - Visualization • Data Analytics • Data Modeling, Stats, Prediction • Validation – Evaluation - Viz • Interpretation
  7. 7. PALOMAR Architecture
  8. 8. [Architecture diagram. Recoverable labels: Data Collection Tier (Crawler, OSN, Data Servers 1..n, aggregator); Kafka stream data transport; Processing Layer (Analytics operation workflow, Job scheduler, Text processor, NLP/Text pipelines); Text Analytics Stack services (Tokeniser, Tagger, Lemmatiser, Parser, NERC, Entity Linking, Sentiment, Summariser, Lexica); Storage (HDFS, Mongo DB sync, ElasticSearch Indexers); Data Access Tier (Web Servers 1..n, User Requests); 3rd Party; Distributed arch; Big Data arch]
  9. 9. SoDaMap [1], Xenophobia [2] • Data Collection, Filtering • Data Exploration - Visualization • Data Analytics • Data Modeling, Stats, Prediction • Validation – Evaluation - Viz • Interpretation • [1] The Sodamap Project: a longitudinal study on social movements, in collaboration with sociologist Theoni Stathopoulou at the National Center for Social Research (NCSR). • [2] The Xenophobia Project: examining xenophobia during the economic and migration crisis in Greece, in collaboration with Professor of Political Science Vassiliki Georgiadou, at the Department of Political Science and History of the Panteion University.
  10. 10. Data Collection: Overview • Gathering & Web crawling • Cleansing & Preparation • Preprocessing • Management & Storage • Data Sources: News articles; Comments of the public; Social Media (Twitter); Opinion Columns & Blogs; Parliamentary debates; Videos with politicians' interviews and speeches; Political Party manifestos • Retrieval • Explorative Analysis • Data Analytics
  11. 11. Data Collection • Newspapers, ≈ 2,541,000 articles, time period 1996-2014 • ≈ 564,000,000 tweets, time period 2012-2016
  12. 12. Data Exploration • Queries • Search Engine • Data Collections
  13. 13. Data Analytics • Topics / Themes • Event Extraction Methodology • Verbal Aggressiveness / Opinion Mining
  14. 14. SCOPE • Code specific event types • Build computational tools for extracting event instances • Record results to a database • Visualize results • Analyze outcome • Correlate with impact KPIs
  15. 15. Claim vs Event • A claim is defined as a purposive unit of strategic or communicative action in the public sphere aiming at a collective stake and having the form of physical or verbal action. • In NLP, an event denotes a specific instance of an event type, e.g. a killing incident happening at a specific location, time and involving certain participants.
  16. 16. Claim • A valid claim comprises the following components: • The Form (How), namely the form of action through which the claim is introduced in the public sphere, or the realization of a protest event is located in a text. • The Actor (Who), or the claimant, meaning the entity that is making the claim. • The Addressee (To Whom), namely the entity to whom the action is addressed. • The Issue (What) of a Claim, i.e. what the protest is about, the subject matter of a Claim. • The Location (Where), viz. where the event took place, the location of a claim in space. • The Time (When) at which the event happened, or location of a claim in time.
  17. 17. Event Extraction Workflow • 1. Codebook design • 2. Tools design and development • 3. Event Database population • [Diagram labels: Events, Data Exploration, Codebook, Data Analytics, Results - Evaluation]
  18. 18. Several Event Categories • 1. Physical Violence Events • 2. Protest Events • 3. Quotations - Statements
  19. 19. EVENT TAXONOMY Physical Attack • Assault • Assault against life • Violent Assault • Sexual Assault • Verbal Attack • Attack against Property • Attack against Personal Property • Attack against Religious Property Protest • Protest • Demonstration/March • Motorized March • Sit Down • Strike • Hunger Strike • Blockade • Occupation • Signature Collection • Boycott • Symbolic Violence • Revolt • Violent Demonstration
  20. 20. VARIOUS TYPES OF INFORMATION • Medium • Title • Date (Day, Month, Year) • Location (Category) • Event (Event type) • Confidence (Degree) • Issue
  21. 21. ACTOR & TARGET ATTRIBUTES • Target Group • Nationality • Category • Age • Sex • Status
  22. 22. Protest Events Codebook CLAIM=EVENT <Actor, Form, Addressee, Issue, Loc, Time> The Law Society of Piraeus has voted to occupy the Mortgage Registries of Piraeus and Salamis, on April 26th and 27th 2006, in protest against the serious operational problems it is facing.
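The coding scheme on this slide can be pictured as a small record type. The following is a minimal sketch, assuming one field per slot of the `<Actor, Form, Addressee, Issue, Loc, Time>` tuple; the class name `Claim`, the field names, and the slot values chosen for the example sentence are illustrative assumptions, not the project's actual schema or gold annotation.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Claim:
    """One coded claim/event: <Actor, Form, Addressee, Issue, Loc, Time>."""
    actor: str                 # Who: the claimant
    form: str                  # How: the form of action
    addressee: Optional[str]   # To whom: may be unstated in the text
    issue: str                 # What: the subject matter of the claim
    location: str              # Where
    time: str                  # When, kept as free text in this sketch


# Hand-coded reading of the example sentence from the slide.
# The slot values are an illustrative interpretation only.
example = Claim(
    actor="The Law Society of Piraeus",
    form="Occupation",
    addressee=None,  # assumption: the sentence names no explicit addressee
    issue="serious operational problems",
    location="Mortgage Registries of Piraeus and Salamis",
    time="April 26th and 27th 2006",
)

print(asdict(example))
```

Keeping `addressee` optional reflects that a news sentence does not always state every slot; whether the codebook allows empty slots is not specified in the slides.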
  28. 28. Event Extraction • Form • Actor/Addressee • Issue • Time/Location • Events • Named Entity Recognition • Chunking • Dependency Parsing • Co-reference Resolution • Aggregation • Analytics Stack • ILSP-NLP • IE workflow • Summarize/Export
  29. 29. Form – Event types • Labor related protests: Strike; Abstention from duties; ... • Demonstrative protests: March/Demonstration; Sit-down; ... • Confrontational protests: Hunger strike; Boycott; ... • Violent protests: Arson, bomb attacks; Symbolic violence; ...
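The grouping of event types into protest categories on this slide lends itself to a simple lookup table. The sketch below is one possible encoding, assuming case-insensitive matching on the type name; the names `PROTEST_FORMS`, `CATEGORY_OF` and `categorize` are hypothetical, and the lists are knowingly incomplete because the slide elides the remaining types with "...".

```python
# Protest form categories as listed on the slide. Each group on the slide
# ends with "...", so these lists are incomplete by construction.
PROTEST_FORMS = {
    "Labor related protests": ["Strike", "Abstention from duties"],
    "Demonstrative protests": ["March/Demonstration", "Sit-down"],
    "Confrontational protests": ["Hunger strike", "Boycott"],
    "Violent protests": ["Arson, bomb attacks", "Symbolic violence"],
}

# Reverse index: lower-cased event type -> category name.
CATEGORY_OF = {
    form.lower(): category
    for category, forms in PROTEST_FORMS.items()
    for form in forms
}


def categorize(event_type: str) -> str:
    """Return the protest category for an event type, or 'Unknown'."""
    return CATEGORY_OF.get(event_type.strip().lower(), "Unknown")


print(categorize("Hunger Strike"))  # prints: Confrontational protests
print(categorize("Blockade"))       # prints: Unknown (not listed on this slide)
```

Returning "Unknown" rather than raising keeps the lookup usable on types that appear elsewhere in the taxonomy but are not assigned a category on this slide.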
  30. 30. Coding Quotations QUOTATION=EVENT <Actor, Form, Addressee, Issue, Loc, Time> Recorded coverage of the statement made in the House of Commons by home secretary Theresa May on the 1989 Hillsborough disaster, from Wednesday 27 April.
  35. 35. Coding Quotations QUOTATION=EVENT <Actor, Form, Addressee, Issue, Time, Loc> "Of course a rise in tension and terrorist activity cannot but heighten concern. It is further proof of how fragile the situation is in Syria and demonstrates the necessity to continue active steps towards resuming talks," Kremlin spokesman Dmitry Peskov told reporters yesterday in Moscow.
  41. 41. Outline • Introduction • PALOMAR Architecture • Data Exploration - Visualization • Data Analytics, Prediction, Stats • Validation – Evaluation - Viz • Infrastructure
  42. 42. Visualizations: Event Types Timeline (Pr) [Chart. x-axis: years 2005-2016; y-axis: 0-1000; series: Protest, Demonstration/March, Motorized March, Sit Down, Strike, Hunger Strike, Blockade, Occupation, Signature Collection, Boycott, Symbolic Violence, Revolt, Violent Demonstration]
  43. 43. Event Analysis The post-election lows seem to be related to what political scientists call the electoral cycle (Miller and Mackie, 1973), referring to the fact that a newly elected government usually enjoys a brief honeymoon period characterised by increased popular support due to its fresh mandate.
  44. 44. Verbal Aggressiveness rate per Target Group [Chart. y-axis: 0-0.06; series: 2013, 2014, 2015, 2016; target groups: TG1: Pakistani, TG2: Albanians, TG3: Romanians, TG5: Islam/Muslims, TG4: Syrians, TG6: Jews, TG7: Germans, TG8: Roma, TG9: Immigrants, TG0: Refugees]
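The slides do not define how the verbal aggressiveness rate is computed. As a rough sketch only, the code below assumes the rate is the share of messages mentioning a target group in a given year that were flagged as aggressive. The function name `aggressiveness_rates`, the record layout, and the sample records are all hypothetical and are not project data.

```python
from collections import Counter

# Hypothetical input: one (year, target_group, is_aggressive) tuple per message.
# These records are invented for illustration; they are not project data.
messages = [
    (2015, "Refugees", True),
    (2015, "Refugees", False),
    (2015, "Refugees", False),
    (2015, "Refugees", False),
    (2016, "Refugees", True),
    (2016, "Refugees", False),
    (2016, "Germans", False),
]


def aggressiveness_rates(records):
    """Map (year, target_group) -> aggressive messages / all messages.

    Assumed definition: the slides do not state how the rate is computed.
    """
    total = Counter()
    aggressive = Counter()
    for year, group, is_aggressive in records:
        total[(year, group)] += 1
        if is_aggressive:
            aggressive[(year, group)] += 1
    return {key: aggressive[key] / total[key] for key in total}


rates = aggressiveness_rates(messages)
for (year, group), rate in sorted(rates.items()):
    print(year, group, round(rate, 2))
```

Other normalisations (for example, dividing by all collected tweets in that year) would give different magnitudes, so the values here should not be compared with the chart.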
  45. 45. Analysis: Examining Xenophobia in Greece • Verbal Aggressiveness as an Indicator of Xenophobic Attitudes in Twitter • The economic crisis encourages the development of defensive nationalism and the perception of vulnerability. • No significant increase in overall VAMs. • In terms of formulation, continuity for certain target groups (Albanians) and emergence of new ones (Germans). • Anti-Muslim attitudes are on the rise. (Could Islamophobia evolve into an important component of Greek xenophobia?) • The strength of xenophobic beliefs seems to be rooted in Greek political culture and is not triggered by the economic crisis. • The effect of the refugee crisis on public beliefs is an open question.
  46. 46. Issues of concern
  47. 47. Data is just as biased as the people who collected it
  48. 48. Data Bias
  49. 49. Media Bias
  51. 51. ..our approach
  52. 52. Interpretability
  53. 53. Haris Papageorgiou ILSP/Athena R.C. @xpapag xaris@ilsp.gr
