Analyzing social media with Python and other tools (1/4)

  1. 1. Good morning! Enjoy your coffee and install PuTTY and Notepad++ via "Software Maintenance/Application Catalogue", as well as the Pattern package (see my e-mail). Thanks.
  2. 2. Hands-on Workshop: Big (Twitter) Data. Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net. Afdeling Communicatiewetenschap, Universiteit van Amsterdam. 30 January 2014, 9.30. Sections: Big Data? What are we talking about? / The process: collect, store, analyze / Python / Questions? #bigdata
  3. 3. The next one and a half days. You’ll hear about:
      • Collecting social media data via APIs, RSS and scraping (and the tools for it)
      • Technical infrastructure (via SURFsara)
      • Python
      • Sentiment analysis
      • Automated coding
      • Frequencies and other statistics
      • Social network analysis with Gephi
      • ...
  4. 4. In this session (1/4):
      1 Big Data? What are we talking about? (Exploring the field; Some examples)
      2 The process: collect, store, analyze (A scheme; Our implementation)
      3 Python (What it is; When to use it; When not to use it)
      4 Questions?
  5. 5. Exploring the field. What’s big data? What are we talking about?
  6. 6. What are we talking about? Today, it’s a hands-on workshop, so let’s keep this important (!) discussion for later.
  7. 7. What are we talking about? So, no definition, but some brief thoughts:
      • Existing data (≠ experiments or surveys)
      • Too big to code manually
      • Too big to handle with normal tools
      • New research questions
      • Call to revisit the relationship between theory and empirical research
  9. 9. What are we talking about?
      Today, ...
      • we are not going to talk about REALLY BIG data,
      • but we will have some exercises on datasets a normal computer can handle.
      Tomorrow, ...
      • we will also learn about scaling up these techniques
      • SURFsara provides infrastructure for this
  10. 10. What are we talking about? Some sources:
      • Social network sites
      • RSS feeds
      • Databases
      • Scraping text from the web
      • ...
  11. 11. It’s out there! You only have to collect it.
  13. 13. But why should we care?
      We can answer new questions:
      • Find needles in haystacks
      • Identify networks, co-word analysis, linguistic analysis, ...
      • Verify our theories in larger datasets
      It makes sense:
      • There are things that computers are simply better at than humans, e.g. counting things
      • Having human coders look for words in texts is like calculating a regression analysis by hand
  14. 14. Some examples
  17. 17. A recent master’s thesis: the needle in the haystack. Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
  21. 21. So you’d better collect everything first: getting all news coverage from Dutch news sites.
      1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles.
      2 Filter articles containing specific keywords.
      3 Those 292 articles were then manually coded.
      Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
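Step 2 above (filtering a large collection for articles that contain specific keywords) can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the thesis: the function name `filter_articles`, the keywords, and the sample articles are hypothetical.

```python
import re


def filter_articles(articles, keywords):
    """Return the IDs of articles that mention at least one keyword.

    `articles` maps an article ID to its full text. Matching is
    case-insensitive and on whole words only.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [art_id for art_id, text in articles.items() if pattern.search(text)]


# Hypothetical sample data standing in for the article database
articles = {
    "a1": "The minister announced the plan on Twitter this morning.",
    "a2": "Heavy rain is expected in the north tomorrow.",
    "a3": "According to a Facebook post, the match was cancelled.",
}
selected = filter_articles(articles, ["twitter", "facebook"])
print(selected)
```

Only the selected articles would then go to the human coders, which is what makes the rare content manageable.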
  23. 23. It’s just one line of code! The file urls.txt contains one URL per line:
      http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
      http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
      http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
      http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
      http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
      http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
      ...
      The wget command: wget -i urls.txt
  26. 26. A recent bachelor’s thesis: tone in tweets. Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
  30. 30. So you’d better think about automating your coding: finding out how negative or positive politicians are towards their opponents. The student took lists of positive and negative words and made additional lists with each politician’s opponents. She used a Python script to check which type of words was used to refer to opponents. For further analysis, the results were imported into SPSS.
      Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
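The word-list approach described above could look roughly like this. It is a sketch under assumptions, not the student’s script: the word lists, the opponent names, and the function `tone_towards_opponents` are made up for illustration, and a tweet is only scored when it mentions an opponent.

```python
import re

# Hypothetical word lists; a real study would use validated dictionaries
POSITIVE = {"great", "good", "strong"}
NEGATIVE = {"bad", "weak", "wrong", "failed"}
OPPONENTS = {"smith", "jones"}  # placeholder opponent names


def tone_towards_opponents(tweet):
    """Count positive and negative words in a tweet that mentions an opponent.

    Returns (positive_count, negative_count), or None if no opponent is named.
    """
    words = re.findall(r"[a-z']+", tweet.lower())
    if not OPPONENTS.intersection(words):
        return None
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative


tweets = [
    "Smith has failed again, a weak and wrong policy",
    "Great turnout at our rally today",
    "Good debate with Jones last night",
]
for tweet in tweets:
    print(tone_towards_opponents(tweet), tweet)
```

The resulting counts per tweet can be written to a CSV file and imported into a statistics package for further analysis.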
  33. 33. Frame adoption on Twitter. Which phrases used by Merkel and Steinbrück on TV make it to the #tvduell discussion on Twitter? Identify frequently used words in the transcript of the debate and in tweets. Find co-occurrences.
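A rough sketch of the idea: count frequent words in the debate transcript and check how many tweets use each of them. The texts, the tiny stop-word list, and the function names are invented for illustration; the actual study may have worked differently.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis would use a fuller one
STOPWORDS = {"the", "a", "and", "we", "is", "to", "of", "in", "for", "this"}


def word_counts(text):
    """Count lower-cased words, ignoring stop words."""
    words = re.findall(r"\w+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def adopted_words(transcript, tweets, top_n=5):
    """Map each of the top_n transcript words to the number of tweets using it."""
    frequent = [w for w, _ in word_counts(transcript).most_common(top_n)]
    tweet_words = [set(word_counts(t)) for t in tweets]
    return {w: sum(w in words for words in tweet_words) for w in frequent}


# Made-up transcript and tweets
transcript = "We need growth. Growth is the key to jobs, and jobs follow growth."
tweets = [
    "Growth, growth, growth... #tvduell",
    "Where are the jobs? #tvduell",
    "Nice tie #tvduell",
]
result = adopted_words(transcript, tweets, top_n=2)
print(result)
```

Counting tweets rather than word occurrences keeps one very repetitive tweet from dominating the result; whether that is the right choice depends on the research question.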
  35. 35. The process: collect, store, analyze. A scheme.
  39. 39. Our implementation: datacollection.followthenews-uva.cloudlet.sara.nl
      • yourTwapperkeeper: Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database.
      • rsshond: Calls the RSS feeds of news sites 1x/hour, saves title, time, header, and teaser of all new articles to a CSV table, then follows the link to each full text and downloads it.
      • snapshot: Visits some URLs 4x/day and downloads them.
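To give an idea of what a collector like rsshond does, the sketch below parses an RSS 2.0 feed and writes title, time, teaser, and link of each new item to a CSV table. It is not the actual rsshond code: the function names, the sample feed, and the choice of columns are assumptions, and both fetching the feed (for example with `urllib.request.urlopen`) and the hourly scheduling are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; in practice this would be fetched from a news site
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example News</title>
  <item>
    <title>First article</title>
    <link>http://example.com/1</link>
    <pubDate>Thu, 30 Jan 2014 09:00:00 +0100</pubDate>
    <description>Teaser of the first article</description>
  </item>
  <item>
    <title>Second article</title>
    <link>http://example.com/2</link>
    <pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate>
    <description>Teaser of the second article</description>
  </item>
</channel></rss>"""


def parse_items(feed_xml):
    """Return one dict per <item> with title, time, teaser, and link."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "time": item.findtext("pubDate", default=""),
            "teaser": item.findtext("description", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


def write_new_items(items, seen_links, outfile):
    """Write items whose link has not been seen before; return how many."""
    writer = csv.writer(outfile)
    written = 0
    for item in items:
        if item["link"] in seen_links:
            continue
        writer.writerow([item["title"], item["time"], item["teaser"], item["link"]])
        seen_links.add(item["link"])
        written += 1
    return written


seen = set()
table = io.StringIO()  # stands in for a CSV file opened with newline=""
first_run = write_new_items(parse_items(SAMPLE_FEED), seen, table)
second_run = write_new_items(parse_items(SAMPLE_FEED), seen, table)
print(first_run, second_run)
```

Keeping a set of already seen links is one simple way to store only new articles when the same feed is polled repeatedly.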
  43. 43. How to access the collected data?
      • Apache web server: Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl.
      • SSH (scp): Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl).
      • Beehub: Connect the server to Beehub, which can be mounted like the P: drive ("p-schijf") or accessed online.
  44. 44. Python
  47. 47. One tool to rule them all? Of course there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need. And it’s fun!
  50. 50. What is Python?
      It is a programming language:
      • It is flexible. You can use it for (in principle) any kind of data
      • There are virtually no limits regarding the amount of data to process
      • You can run it on every platform
      • And yet it is easy to learn!
      It is widely used for content analysis:
      • Many online resources and toolkits
      • Books about NLP and web scraping with Python
  54. 54. You do not have to become a programmer. If you know how to write SPSS or STATA syntax, you will understand Python. (But if you have ever had contact with any programming language, it helps.) It’s enough if you can read and modify the code.
  58. 58. Think of the following task. RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage?
      1 The data structure: You have a folder with articles.
      2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned.
      3 A typical task for a short Python script!
  59. 59. You need something like this:
          for every file in folder:
              read the file
              count actors
              add new row to table with filename and actor counts
          save table
      (such a notation is called pseudo-code)
  60. 60. A Python script of this kind:
          import csv
          import re
          from os import listdir
          from os.path import isfile, join

          # Folder with one article per text file
          mypath = r"C:\Users\Ricarda\Documents\Artikelen"
          # "Israel", followed later on the same line by an actor word
          regex54 = re.compile(r"Israel.*(?:minister|politician.*|[Aa]uthorit)")

          filename_list = []
          matchcount54_list = []
          onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
          for f in onlyfiles:
              matchcount54 = 0
              # Go through the article line by line and count the matches
              with open(join(mypath, f), "r") as artikel:
                  for line in artikel:
                      matchcount54 += len(regex54.findall(line))
              filename_list.append(f)
              matchcount54_list.append(matchcount54)

          # One row per file: file name and number of matches
          output = zip(filename_list, matchcount54_list)
          with open("overzichtstabel.csv", "w", newline="") as outfile:
              csv.writer(outfile).writerows(output)
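The task description asks for one column per actor, while the script above counts a single pattern. A possible generalization is sketched below under assumptions: the actor labels and patterns in `ACTORS`, the function name `count_actors`, and the demo folder with two made-up articles are hypothetical and not taken from the original script.

```python
import csv
import re
import tempfile
from pathlib import Path

# Hypothetical actors: column name -> regular expression
ACTORS = {
    "israel": re.compile(r"\bIsrael\w*"),
    "palestine": re.compile(r"\bPalestin\w*"),
    "un": re.compile(r"\bUN\b|United Nations"),
}


def count_actors(folder, outfile):
    """Write one row per file in `folder`: file name plus one count per actor."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        rows.append([path.name] + [len(rx.findall(text)) for rx in ACTORS.values()])
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename"] + list(ACTORS))  # header row
        writer.writerows(rows)
    return rows


# Small demo with two made-up articles in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "articles"
    folder.mkdir()
    (folder / "a.txt").write_text(
        "Israeli minister meets Palestinian officials. Israel responds.",
        encoding="utf-8",
    )
    (folder / "b.txt").write_text("The UN and the United Nations.", encoding="utf-8")
    rows = count_actors(folder, Path(tmp) / "overview.csv")
    print(rows)
```

Adding an actor then only means adding one entry to the dictionary; the loop and the output table stay the same.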
  61. 61. Big Data? What are we talking about? The process: collect, store, analyze Python Questions? What it is This is not too different from a script Jelle uses for his dissertation. The main difference: He doesn’t code regular expressions, but calculates document similarity. slides-jelle.pdf #bigdata Damian Trilling
  62. When to use Python
  63. When to use it: 1st group of tasks
      Highly repetitive tasks: simple tasks (counting things, comparing texts, ...) that can be described in a formalized way. This saves time even with few cases, but there is virtually no size limit.
      Example: Retweets start with RT, optionally followed by a space, and some letters. So it is very easy to identify them automatically.
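The retweet rule can be written as a regular expression. The sketch below is one possible reading: it assumes that the "letters" after RT are an @-handle, and the sample tweets are made up.

```python
import re

# "RT", optionally a space, then an @-handle. Treating the "letters" as an
# @-handle is an assumption about what the rule on the slide means.
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """Return True if the tweet text starts with the retweet marker."""
    return RETWEET.match(tweet) is not None


# Made-up tweets for illustration.
tweets = [
    "RT @damian0604: Workshop starts at nine",
    "RT@someone no space after RT",
    "Really enjoying this workshop",
    "I saw an RT @someone in the middle",
]

retweets = [t for t in tweets if is_retweet(t)]
print(len(retweets), "of", len(tweets), "tweets are retweets")
```

Because the pattern is anchored at the start of the text, a tweet that merely mentions "RT" somewhere in the middle is not counted.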
  64. When to use it: 2nd group of tasks
      Tasks for which specific Python modules exist. There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output.
      Example: Sentiment analysis
  65. When to use it: 3rd group of tasks
      APIs, RSS, web scraping, ... You can use Python if you want to collect and store information.
      Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
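As an illustration of the collect-and-store idea, the sketch below reads the items of an RSS feed with Python's standard library only. The feed is a made-up string embedded in the code so that the example runs without a network connection; in practice the XML would first be downloaded from a feed URL.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed. In practice the XML would be downloaded first,
# for example with urllib.request.urlopen(feed_url).read().
RSS_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of dicts with the title and link of every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


items = parse_items(RSS_SAMPLE)
for entry in items:
    print(entry["title"], "->", entry["link"])
```

The resulting list of dictionaries can then be stored, for instance by writing it to a CSV file with the `csv` module.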
  66. When not to use Python
  67. When not to use it: Maybe you do not need to write a Python script ...
      ... when there are already suitable tools available. Sometimes, the perfect ready-made tool already exists.
      Example: Axel Bruns' awk scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I'd do it in Python, but hey, he already did it with awk and it works.
  68. When not to use it: Maybe you do not need to write a Python script ...
      ... when there are already suitable tools available. Sometimes, the perfect ready-made tool already exists. But still, sometimes it is more efficient to write something that does exactly what you want.
      Example: Axel Bruns' awk scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I'd do it in Python, but hey, he already did it with awk and it works.
  69. When not to use it: And, let's face it ... we are not programmers. So maybe some tasks are too complex for us to program ourselves.
  70. When not to use it: And, let's face it ... we are not programmers. So maybe some tasks are too complex for us to program ourselves. But there is a huge online community that helps you.
  71. Recap
      1. Big Data? What are we talking about?
         Exploring the field
         Some examples
      2. The process: collect, store, analyze
         A scheme
         Our implementation
      3. Python
         What it is
         When to use it
         When not to use it
      4. Questions?
  72. After the break: Hands-on! Exploring a basic Python script
  73. Questions or comments?
      Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net