Slides for the course Big Data and Automated Content Analysis, in which students of the social sciences (communication science) learn how to conduct analyses using Python. First meeting.
BD-ACA week1a
1. Introducing. . . The Linux command line Python Next meetings
Big Data and Automated Content Analysis
Week 1 – Monday
»Introduction«
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam
30 March 2015
Big Data and Automated Content Analysis Damian Trilling
2.
Today
1 Introducing. . .
. . . the people
. . . the topic
. . . the methods
. . . the tools
2 The Linux command line
3 Python
4 Next meetings
3.
Introducing. . .
. . . the people
4.
Introducing. . .
Damian
Dr. Damian Trilling
Lecturer Political Communication & Journalism
• studied Communication Science in Münster
and at the VU 2003–2009
• PhD candidate @ ASCoR 2009–2012
• now: Universitair Docent (UD) / Assistant
Professor
• interested in political communication and
journalism in a changing media environment
and in innovative (digital, large-scale,
computational) research methods
@damian0604 d.c.trilling@uva.nl
REC-C, 8th floor
www.damiantrilling.net
5.
Introducing. . .
Björn
Björn Burscher, MSc.
PhD Candidate
• studied Political Communication &
Information Science
• currently PhD candidate @ ASCoR
• interested in automatic content analysis and
elections
b.burscher@uva.nl
6.
Introducing. . .
You
Your name?
Your background?
Your reason for taking this course?
8.
Introducing. . .
. . . the topic
⇒ on Wednesday
10.
Introducing. . .
. . . the methods
⇒ on Wednesday
12.
Introducing. . .
. . . the tools
⇒ now!
13.
When point-and-click gets you no further:
The Linux command line
15.
Let’s switch to Linux!
16.
Tools: The Linux command line
a.k.a. the terminal or the shell (more specifically: bash)
23.
Tools: The Linux command line
Why?
• Direct access to your computer’s functions
• In contrast to point-and-click programs, command-line programs can easily be linked to each other, scripted, . . .
• Suitable even for handling huge files
• Many GUI programs simply cannot open them
• . . . or take ages to do so
• The command line lets you process them without problems
• It is reproducible (ever tried to explain to your parents on the
phone where they have to click?)
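Command-line tools such as wc cope with huge files because they read their input piece by piece instead of loading the whole file into memory. Below is a minimal Python sketch of the same streaming idea; the function name count_lines_and_words and the small demo file are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def count_lines_and_words(path):
    """Count lines and whitespace-separated words without loading the whole file.

    Hypothetical helper for illustration; similar in spirit to `wc -l` / `wc -w`.
    """
    n_lines = 0
    n_words = 0
    # Iterating over an open file yields one line at a time,
    # so memory use stays small even for very large files.
    with open(path, encoding="utf-8") as f:
        for line in f:
            n_lines += 1
            n_words += len(line.split())
    return n_lines, n_words


if __name__ == "__main__":
    # Demo on a small temporary file; pass the path of your own (large) file instead.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("first line\nsecond line here\n")
        tmp_path = tmp.name
    print(count_lines_and_words(tmp_path))  # (2, 5)
    os.remove(tmp_path)
```

Because only one line is held in memory at a time, the same function works unchanged on a file of a few bytes or of many gigabytes.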
24.
There are endless tutorials, cheat sheets, videos . . . online. Google it!
25.
Exercise
Take the book.
Follow the instructions in Chapter 2.
26.
A language, not a program:
Python
28.
Python
What?
• A language, not a specific program
• Huge advantage: flexibility, portability
• One of the two main languages for data analysis (the other one is R)
But Python is more flexible: the original version of Dropbox was written in Python. I'd say: R for numbers, Python for text and messy stuff. But that's just my personal view.
29.
Python
Which version?
We use Python 3.
Searching http://www.google.com or http://www.stackexchange.com still turns up a lot of Python 2 code, but it can usually be adapted easily. The most notable difference: in Python 2, you write print "Hi"; in Python 3, print is a function, so you write print("Hi").
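What the change means in practice: in Python 3, print is an ordinary function, so its arguments go in parentheses, and it also accepts keyword arguments such as sep, end and file. A small sketch; the variable names buffer and captured are illustrative only.

```python
import io

# Python 3: print is a function, so its arguments go in parentheses.
# The Python 2 form, print "Hi", is a SyntaxError in Python 3.
print("Hi")

# Being a function, print also accepts keyword arguments.
print("Hi", "there", sep=", ")   # prints: Hi, there
print("Same line ", end="")      # end="" suppresses the newline...
print("continued")               # ...so this prints: Same line continued

# file= sends the output elsewhere, here into an in-memory text buffer.
buffer = io.StringIO()
print("Hi", "there", sep=", ", file=buffer)
captured = buffer.getvalue()     # "Hi, there\n"
```

When adapting old Python 2 snippets, adding the parentheses is usually all that is needed for simple print statements.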
31.
If it’s not a program, how do you work with it?
Interactive mode
• Just type python3 on the command line and start entering Python commands (leave again by entering quit())
• Great for quick try-outs, but you cannot even save your code
An editor of your choice
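• Write your program in any text editor and save it as myprog.py
• Run it from the command line with python3 myprog.py, or with ./myprog.py if the file starts with the line #!/usr/bin/env python3 and has been made executable with chmod +x myprog.py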
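A minimal sketch of such a file, assuming Linux or macOS with python3 on the PATH. The function greet is a hypothetical example; the first line (the "shebang") is what lets the shell start the file directly as ./myprog.py once it has been made executable.

```python
#!/usr/bin/env python3
# myprog.py: a minimal example script (greet is a hypothetical example function).
# The shebang line above tells the shell which interpreter to use, so after
# `chmod +x myprog.py` the script can be started with ./myprog.py.
# Running `python3 myprog.py` works without the shebang or chmod.


def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name + "!"


if __name__ == "__main__":
    # This block only runs when the file is executed as a program,
    # not when it is imported from another Python file.
    print(greet("world"))
```

Save it as myprog.py, run chmod +x myprog.py once, and start it with ./myprog.py; it should print Hello, world!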
32.
If it’s not a program, how do you start it?
An IDE (Integrated Development Environment)
• Provides a graphical interface
• Supports both quick interactive try-outs and writing larger programs
• We use Spyder, which looks a bit like RStudio (and to some extent like Stata)
34.
Exercises
1. Run a program that greets you.
The code for this is
print("Hello world")
After that, do some calculations. You can do that in a similar way:
a = 2
print(a * 3)
Just play around.
2. Take the book.
Follow the instructions in Chapter 3.
We will discuss the concepts introduced there during the next lectures, but it helps to first try to get started yourself: that's less abstract.
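For the "just play around" part of exercise 1, here are a few more operators worth trying. This is a sketch only; the variable names b, total and average are just examples.

```python
# Variables and arithmetic in Python 3, extending the a = 2 exercise.
a = 2
b = 7

print(a * 3)    # multiplication: 6
print(b + a)    # addition: 9
print(b - a)    # subtraction: 5
print(b / a)    # / always returns a float in Python 3: 3.5 (Python 2 gave 3)
print(b // a)   # floor (integer) division: 3
print(b % a)    # remainder (modulo): 1
print(a ** 3)   # exponentiation: 8

# Results can be stored in new variables and reused.
total = a + b
average = total / 2
print("The average of", a, "and", b, "is", average)  # The average of 2 and 7 is 4.5
```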
35.
Next meetings
36.
Next meetings
Wednesday, 1 April: Lecture
Introduction to the theoretical and methodological underpinnings.
Don’t forget to read the articles in advance.
Wednesday, 8 April: Lab Session
Some serious programming in Python