Digital methods for Social Sciences: origin and definitions
First day lesson of the course "Epistemologias Reticulares" at the University of Sao Paulo, Escola de comunicaçoes et artes (ECA/USP)

  1. DIGITAL METHODS FOR SOCIAL SCIENCES
     Marta Severo – Université de Lille 3, Laboratoire Gériico
     marta.severo@univ-lille3.fr
     9 August 2013, University of Sao Paulo, Escola de comunicaçoes et artes (ECA/USP)
  2. PROGRAMME
     - Day 1: Digital methods: definitions & objects
     - Day 2: Scientometrics and network analysis
     - Day 3: Web mapping
     - Day 4: Collecting and analysing data
  3. KEY QUESTIONS
     1. Why do we analyze the data on the web?
     2. Which data can we find?
     3. Which objects can we study?
     4. How to build a web-based corpus?
  4. WHY DO WE ANALYZE THE DATA ON THE WEB?
  5. DIGITAL METHODS
     A series of methods that share a reliance on digital traces as a source of information for studying social phenomena.
     R. Rogers, "Internet Research: The Question of Method," Journal of Information Technology and Politics, 7, 2/3, 2010, 241-260
  6. THE WEB AS AN OBJECT OF STUDY
     Photo credit: Brandon Doran via Flickr, ©
  7. THE WEB AS A SOURCE OF INFORMATION
     Chris Harrison, 2007, Internet map (World City-to-City Connections)
  8. THE RISE OF DIGITAL METHODS
     - Virtual reality: late 1980s to early 1990s (Barlow, Turkle, Negroponte, Rheingold)
     - Virtual society?: 1997-2002 (Steve Woolgar et al.)
     - Cultural analytics: 2007 (Lev Manovich)
     - Digital methods: 2009 (Richard Rogers)
  9. DIGITAL METHODS
     - The issue no longer is how much of society and culture is online, but rather how to detect cultural change and societal conditions with the Internet.
     - The conceptual point of departure for the research programme is the recognition that the Internet is not only an object of study, but also a source.
     R. Rogers, "Internet Research: The Question of Method," Journal of Information Technology and Politics, 7, 2/3, 2010, 241-260
  10. GOOGLE PORTRAIT OF MARC L***
     http://www.le-tigre.net/Marc-L.html
  11. FACEBOOK FRIENDS NETWORK
     Paul Butler, 2010, Visualizing Friendships
  12. RICH DATA
     AOL user 711391 search history
     www.minimovies.org/documentaires/view/ilovealaska
     By Lernert Engelberts and Sander Plug
  13. GOOGLE FLU TRENDS
  14. LARGE POPULATIONS AND RICH DATA
     Google Flu www.google.org/flutrends
  15. Ginsberg J. et al., "Detecting influenza epidemics using search engine query data", Nature 457, 1012-1014 (19 February 2009)
  16. http://www.google.org/flutrends/intl/pt_br/br/#BR
  17. EPIDEMIOLOGY OF DISEASES
     Dengue Trends www.google.org/denguetrends
  18. EPIDEMIOLOGY OF RECIPES
     - Thanksgiving Trends http://www.nytimes.com/interactive/2009/11/26/us/20091126-search-graphic.html
  19. GOOGLE INSIGHTS FOR SEARCH AND GOOGLE CORRELATE
     google.com/trends
     google.com/trends/correlate/
  20. (BE CAREFUL THOUGH)
     - Askitas, N., & Zimmermann, K. (2011). Health and Well-Being in the Crisis. IZA Discussion Paper
  21. (BE CAREFUL THOUGH)
  22. (BE CAREFUL THOUGH)
     - http://googlesystem.blogspot.fr/2008/08/google-suggest-enabled-by-default.html
  23. WHICH DATA CAN WE FIND?
  24. APOCALYPSE 2012
     HTTP://WWW.YOUTUBE.COM/WATCH?V=QZBDSYWQNMC
     - Maya documentary on the data…
  25. INTERNET IN NUMBERS
     - 677 million active websites in the world in 2012
     - In 2008, Google stated that its robots had crawled 1 trillion URLs
     - In 2002, there were 25 billion documents, 7.5 million new pages per day, 150 terabytes of information and 690 billion pages on intranet sites.
  26. Data on the web: Invisible web, Social networks, Google, Open data
  27. INVISIBLE WEB
  28. SURFACE WEB VS INVISIBLE WEB
     - The surface web is made up of all the pages indexed by the various search engines.
     - The invisible web, or deep web, consists of non-indexed pages. It is the hidden part of the web. Few people know of its existence, and yet it is a huge source of information.
  29. WHY INVISIBLE
     - Documents, web pages, websites or databases too large to be fully indexed (e.g. Internet Movie Database, www.imdb.fr)
     - Pages protected by copyright (a meta tag stops the robot)
     - Dynamically generated pages
     - Pages protected with login and password
     - Formats not read by search engines (Flash)
  30. DATABASE
     - These resources are changing. A few years ago they could be accessed only for a fee.
     - Today, more and more quality information, particularly in databases, is becoming free.
     - High-profile databases such as Lexis Nexis, Dialog Datastar, Factiva, STN International, Questel .... make up only just over 1% of the deep web.
  31. INTERNET ARCHIVE
     - The Internet Archive (http://archive.org/index.php) is a digital library designed to archive all digital documents on the Internet and save them from disappearing completely. The IA has provided documents since 1996 (10 billion web pages, but also Usenet, movies and the Internet's ancestor, ARPANET).
     - The Internet Wayback Machine (developed notably with Alexa Internet) lets the user find an archived website by simply typing its URL and the desired date.
  32. GOOGLE NEWS ARCHIVE
     - https://news.google.com/news/advanced_news_search?as_drrb=a Google News Archive allows you to search news archives spanning 200 years!
     - You can easily search by keyword within news from free and paid sources. Google has agreements with prestigious news sources such as Time, the Wall Street Journal, the New York Times, the BBC, the Guardian and The Washington Post.
  33. DIRECTORIES
     - Selective directories identify professional Internet resources selected on qualitative criteria ...
     - Sites are selected by information professionals to cover university research and education in general
     - Example: http://aip.completeplanet.com/
     - http://www.ipl.org/ (run by librarians)
  34. PORTALS
     - Sites combining many resources (articles, forums, news...) that can be organized around a theme (vertical portals).
     - http://www.enfin.com/
  35. NEWS
     - Search engine news services, push news, custom press releases, press releases, news portals, daily newspaper sites or business newspapers, directories of national and international media, press archives ...
  36. NEWS ARCHIVE
     - For a fee:
       - Factiva
       - Europresse
     - Free:
       - http://voxaleadnews.labs.exalead.com/
       - http://emm.newsexplorer.eu/
       - Pickanews www.pickanews.com
  37. NEWS
     - http://emm.newsexplorer.eu/
  38. NEWS (https://news.google.com/)
  39. OPEN DATA
     - Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
     - Philosophy: data collected for the public good must return to the public.
  40. SOME EXAMPLES
     - International organisations
     - Countries
     - Cities
     - General catalogues
  41. OECD (HTTP://STATS.OECD.ORG/)
  42. FAO (HTTP://FAOSTAT3.FAO.ORG/)
  43. WHO (HTTP://WHO.INT/RESEARCH/FR/INDEX.HTML)
  44. WORLD BANK (HTTP://DATA.WORLDBANK.ORG/)
  45. NATIONAL DATABASE
  46. REGIONAL DATABASE (HTTP://WWW.GOVERNOABERTO.SP.GOV.BR/)
  47. BRAZILIAN FEDERAL SENATE
  48. PUBLIC OR PRIVATE ENTERPRISES
     - RATP (public transport in Paris) http://data.ratp.fr/
     - SNCF (trains in France) http://www.data.sncf.com/
     - Eau France (water in France) http://www.services.eaufrance.fr/
  49. HTTP://INFOAMAZONIA.ORG/
     - Gustavo Faleiros, Project Coordinator, Knight International Fellow – O Eco
  50. CATALOGUES
     - http://www.data-publica.com/
     - http://dashboard.opengovernmentdata.org/
     - http://datacatalogs.org/
     - http://www.quora.com/Data/What-is-the-most-comprehensive-list-of-international-open-government-datasets
     - http://www.data.gov/opendatasites
     - http://www.gapminder.org/
     - http://www.statista.com (registration)
  51. HANS ROSLING SHOWS THE BEST STATS YOU'VE EVER SEEN
     http://www.ted.com/talks/lang/pt-br/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
  52. HTTP://WWW.GAPMINDER.ORG/
  53. FROM WEB 1.0 TO WEB 3.0
  54. WEB 2.0
     - The classics: newsletters, newsgroups and forums
     - Facebook
     - Professional networks
     - Twitter
     - Wikipedia
     - Blogosphere
     - …
  55. TYPE OF DATA
     - Texts
     - Images
     - Video
     - Audio
     - …
  56. WHICH OBJECTS CAN WE STUDY?
  57. DATA: Web pages, Documents, Blogs, Forums, Tweets, Facebook, Google search, Wiki, …
     OBJECTS: Actors, Connections, Events, Products, Sentiments, …
  58. 1. ACTORS
     - Discourse mapping on the web: from a corpus of web documents we can trace the connections between the authors of the documents by analysing occurrences or co-occurrences of keywords
     - 2 examples:
       - Media representations of the Mediterranean solar plan (http://www.martasevero.com/?p=154)
       - Egyptians abroad (http://www.e-diasporas.fr/working-papers/Severo&Zuolo-Egyptian-EN.pdf)
  59. 2. CONNECTIONS - NETWORKS
     - Web mapping is based on the idea that hyperlinks created on the web can be used as a proxy for social ties (analysis of the graph of the network created by hyperlinks on a set of web pages).
  60. EXAMPLES
     M. Severo, T. Venturini, "Intangible Cultural Heritage Webs: Comparing national networks with digital methods", in New Media & Society, forthcoming (pre-print http://goo.gl/FPpTx)
  61. FRENCH NETWORK
  62. MEME STUDY
     - A memetracker is a tool for studying the migration of memes across a group of people
     - A meme is "an idea, behavior, or style that spreads from person to person within a culture"
     - MemeTracker.org (Jure Leskovec), a tool celebrated for allowing a new way of examining media by watching how quotes spread through professional and citizen media
  63. Leskovec, J., L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the Dynamics of the News Cycle. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 497–506.
  64. 3. EVENT STUDY
     - Just-in-time identification of international media events: the case of the Wukan protests
       http://jitso.org/2012/12/02/the-wukans-protests-just-in-time-identification-of-international-media-events-revised/
  65. HOW TO BUILD A WEB-BASED CORPUS?
  66. - First step: identify the goals of your research (which objects?)
     - Second step: identify the data sources (which data?)
     - Third step: define the exploratory method for collecting and analysing data
  67. FROM THE GOAL TO THE CORPUS
     - The analyst is one dimension of the analysis. The analyst's position must be made explicit so that readers have the keys to interpret the analysis (context)
     - The selection criteria should be made explicit:
       - Cover a maximum of items = open issue
       - Build the corpus as the research goes on = issue focused on a specific point
     - Are there standards for preparing the corpus and defined research hypotheses (to be gradually refined)?
  68. PROBLEMS OF WEB DATA
     - Is the corpus exhaustive? (e.g. Factiva)
     - Is the corpus homogeneous?
     - Is the corpus representative?
     Corpus Analysis
  69. 3 DEGREES OF FREEDOM OF TREATMENT
     - Difference between the data and the actual objects
     - Difference between the data and the analysed data
     - Difference between the research output and the interpretation of the researcher or reader
  70. TOOLS FOR ANALYSING THE CORPUS
     - Interpretation begins as early as the cleaning of the corpus
     - Data can be transformed at each phase of the analysis
     - Visualisation can modify the data
  71. THE CHOICE OF CORPUS IS NOT NEUTRAL
     - In the source of the corpus
     - In the format of documents
     - In the search query
     - We have a clear idea of the corpus to extract
  72. DISCUSSION
     - Your questions?
     - Your objects?
     - Your data?
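
The web mapping idea from slide 59 (hyperlinks as a proxy for social ties) can be illustrated with a minimal Python sketch. It is not the tooling used in the course: the three-page corpus, the `example` host names and the helper names (`LinkExtractor`, `site_of`, `hyperlink_edges`) are all hypothetical, and a real study would first crawl and clean the pages. The sketch only shows how a set of HTML pages turns into a directed network of sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_of(url):
    """Reduce a URL to its host name, used here as the node of the network."""
    return urlparse(url).netloc.lower()


def hyperlink_edges(pages):
    """Return the set of directed (source_site, target_site) edges.

    `pages` maps a page URL to its HTML source. Links that stay inside
    the same site are ignored, since they say nothing about ties between sites.
    """
    edges = set()
    for page_url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        source = site_of(page_url)
        for href in parser.hrefs:
            # Resolve relative links against the page that contains them.
            target = site_of(urljoin(page_url, href))
            if target and target != source:
                edges.add((source, target))
    return edges


# Hypothetical three-page corpus; the hosts are placeholders.
corpus = {
    "http://a.example/index.html": '<a href="http://b.example/">B</a> <a href="/about">About</a>',
    "http://b.example/": '<a href="http://a.example/index.html">A</a> <a href="http://c.example/x">C</a>',
    "http://c.example/x": "<p>No links here.</p>",
}

edges = hyperlink_edges(corpus)

# In-degree (number of distinct sites linking to a site) is one simple
# indicator of how much attention a site receives within the corpus.
in_degree = {}
for _, target in edges:
    in_degree[target] = in_degree.get(target, 0) + 1
```

The resulting edge list can be loaded into any graph tool for visualisation. Note that the choice to aggregate pages by host name is itself an analytical decision, in line with the remark on slide 71 that the corpus is not neutral.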
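
Slide 58 describes discourse mapping as tracing connections between document authors through keyword co-occurrences. The following Python sketch shows one possible reading of that idea, under stated assumptions: the three documents, the author labels and the keyword list are invented for illustration, keywords are single lowercase words, and two authors are linked by the number of keywords they both use. The studies cited on the slide may well use a different measure.

```python
import re
from itertools import combinations


def keywords_in(text, keywords):
    """Return the subset of `keywords` that occur as whole words in `text`."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {k for k in keywords if k in tokens}


def author_links(documents, keywords):
    """Link every pair of authors by the number of keywords they share.

    `documents` is a list of (author, text) pairs. Pairs of authors with
    no keyword in common are left out of the result.
    """
    used = {}
    for author, text in documents:
        used.setdefault(author, set()).update(keywords_in(text, keywords))

    links = {}
    for a, b in combinations(sorted(used), 2):
        shared = used[a] & used[b]
        if shared:
            links[(a, b)] = len(shared)
    return links


# Hypothetical mini-corpus and keyword list (assumptions, not course data).
documents = [
    ("ministry", "The solar plan will bring energy and jobs to the region."),
    ("ngo", "The solar plan ignores water and local jobs."),
    ("newspaper", "Investors discuss energy prices."),
]
keywords = {"solar", "energy", "jobs", "water"}

links = author_links(documents, keywords)
```

Here `links` is a weighted edge list between authors. The keyword list drives the whole result, so it should be reported together with the map.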
