Data Extraction Tools
Universidade NOVA de Lisboa | NOVA FCSH
iNOVA Media Lab · @CristianCJRuiz
Cristian Jiménez Ruiz
Digital Media Winter Institute 2019 | SMART Data Sprint: Beyond Visible Engagement | Lisbon, Portugal.
1. SMART Goals.
2. API approach.
a. Technical (and ethical) debate.
3. How to extract and collect data.
a. Entry points.
b. How to use Netlytic.
c. How to use DMI: Netvizz.
d. How to use Socioviz.
4. Tools’ output.
Social Media
Methods
Internet as a source of
data, method, technique
(Rogers, 2013)
Internet-related research
Where to find data?
API Approach
Technical (and Ethical) debate
The data we extract come from human beings!
Strong dependence on platform politics and policies
Strong dependence on platform
environment, politics and policies.
2017/18 platform updates:
Events that affect the platform company may cause
collateral damage to digital methods and platform studies.
Technical (and Ethical) debate
Beyond limitations and debate, some key points that shape your
research design:
1. Extraction and analysis software
2. What are the entry points used to collect social media
grammars?
3. What entry points cannot be captured anymore?
4. What grammars (digital objects) can be part of my
study?
5. How far back in time can data be retrieved?
6. What are the standard output files? (Omena, 2018)
How to extract and collect data
Entry Points
Native digital objects (Rogers, 2013):
Hashtags, keywords, IDs, locations, likes, links, etc.
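To make the idea of entry points concrete, here is a minimal sketch that pulls three kinds of native digital objects (hashtags, mentions, links) out of one post. The regular expressions are simplified assumptions, not the rules any platform actually applies, and the sample post, account name and URL are made up for illustration.

```python
import re

# Simplified, assumed patterns; real platforms apply more detailed rules.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
LINK = re.compile(r"https?://\S+")


def extract_entry_points(text):
    """Return the hashtags, mentions and links found in one post."""
    links = LINK.findall(text)
    # Drop links first so URL fragments (e.g. "#day1") are not counted as hashtags.
    without_links = LINK.sub(" ", text)
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(without_links)],
        "mentions": [m.lower() for m in MENTION.findall(without_links)],
        "links": links,
    }


# Hypothetical post, for illustration only.
post = "Join the #SMARTDataSprint with @iNOVAmedialab https://example.org/sprint#day1"
result = extract_entry_points(post)
print(result)
```

Each key of the returned dictionary corresponds to one possible entry point for a query.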
Netlytic
1. Create/open your account
2. Be aware of your account type
Netlytic: Twitter
Netlytic: Facebook
Netlytic: Instagram
Netlytic: YouTube
Netlytic: Datasets
Netvizz
2017 2018
Netvizz
Socioviz
1. Create/open your account
2. Connect your Twitter account
Socioviz
This tool retrieves data per
keyword, not separately per digital
object (hashtag, account,
mention, etc.)
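Because a keyword query returns posts where the term appears in any role, it can help to separate those roles afterwards. The sketch below is one possible way to do that and is not part of Socioviz: the function name, the sample posts and the three role labels are all hypothetical.

```python
import re
from collections import Counter


def classify_keyword_use(text, keyword):
    """Return the roles the keyword plays in a post: hashtag, mention or plain."""
    # An optional "#" or "@" directly before the whole-word keyword decides the role.
    pattern = re.compile(r"([#@]?)\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    labels = {"#": "hashtag", "@": "mention", "": "plain"}
    return {labels[match.group(1)] for match in pattern.finditer(text)}


# Hypothetical posts returned for the keyword "lisbon".
posts = [
    "Sunset in Lisbon",
    "Best of #lisbon today",
    "Thanks @lisbon for the photos of Lisbon",
]

# Count in how many posts the keyword appears in each role.
tally = Counter()
for post in posts:
    tally.update(classify_keyword_use(post, "lisbon"))
print(tally)
```

Counting per post rather than per occurrence keeps one very repetitive post from dominating the tally.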
Output: .gdf, .tab, .csv, etc.
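As an illustration of what a .gdf network file contains, here is a minimal reader sketch. It assumes the common GDF layout: a "nodedef>" header line followed by node rows, then an "edgedef>" header line followed by edge rows, all comma-separated. The sample content and the function name are hypothetical, and real exports from a given tool may carry more columns.

```python
import csv
import io

# Hypothetical two-node network in the assumed GDF layout.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
a,Alice
b,Bob
edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE
a,b,2.0
"""


def parse_gdf(text):
    """Split GDF text into a list of node dicts and a list of edge dicts."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0]
        if first.startswith(("nodedef>", "edgedef>")):
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Each header cell reads "name TYPE"; keep only the column name.
            columns = [cell.strip().split(" ")[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges


nodes, edges = parse_gdf(SAMPLE_GDF)
print(len(nodes), "nodes;", len(edges), "edges")
```

All values are kept as strings here; converting weights to numbers is left to the analysis step.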
What to do with the output?
Open an analysis software and import your files!
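If the analysis step is done in a script rather than a desktop tool, importing a tab-separated export can look like the sketch below. The column names ("author", "text") and the sample rows are assumptions for illustration; check the header row of the actual file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical .tab export: one post per row, tab-separated, with a header row.
SAMPLE_TAB = (
    "author\ttext\n"
    "ana\t#lisbon at night\n"
    "rui\tdata sprint day one\n"
    "ana\tback at #novafcsh\n"
)


def posts_per_author(handle):
    """Count how many rows (posts) each author has in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["author"] for row in reader)


counts = posts_per_author(io.StringIO(SAMPLE_TAB))
print(counts.most_common(1))
```

For a file on disk, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.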
References
Rieder, B. (2013). Studying Facebook via data extraction: the Netvizz application. In WebSci '13: Proceedings of the 5th Annual ACM Web Science Conference (pp. 346-355). New York: ACM.
Omena, J. J. (2018). The Grammars of Social Media: Thinking platform data under the modes of technicity. Digital Media Winter Institute 2018, Smart Data Sprint: Interpreters of Platform Data, Jan. 29 - Feb. 2, Universidade Nova de Lisboa, Lisbon, Portugal. https://pt.slideshare.net/jannajoceli/the-grammars-of-social-media-thinking-platform-data-under-the-modes-of-technicity
Gruzd, A. (2016). Netlytic: Software for Automated Text and Social Network Analysis. Available at http://Netlytic.org
Rogers, R. (2013). Digital Methods. MIT Press.
