Quantifying Qualitative Data – Big Time!
Jukka Sihvonen©
Department of Accounting and Finance
Workflow
Data acquisition
Data analysis
Results
Quantitative
- Replicable
- Scalable
- Easily revisable
Qualitative
- Not replicable
- Not scalable
- Revision is hard
Financial research Literary research
month(s)
month(s)
minute(s)
minute(s)
Service
Jukka Sihvonen© | Big Data and Digitalization
Analysis
at the office
by hand
Qualitative data
- Photos
- Movies
- Speech
- Text
Results
Traditional way
Application Programming Interface (API)
Qualitative data
- Photos
- Movies
- Speech
- Text
- Data API
- Data refining API
- Analysis API
- Intelligence API
Results
Replicability – Speed – Possibilities
Data APIs – retrieve machine-readable data
Examples
- Wikipedia
- Facebook
- Twitter
- Instagram
- Suomi24
- Yle
- Google Maps
- Accuweather
- Scopus
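Twitter example: @univaasa
import twitter  # the python-twitter package
# my_credentials is a placeholder; here it is assumed to be a dict holding
# consumer_key, consumer_secret, access_token_key and access_token_secret
api = twitter.Api(**my_credentials)
tweets = api.GetSearch(term="univaasa", count=10)
Request … Twitter API ... Response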
Data refining APIs – metadata and conversion
Examples
- Voice to gender
- Names to ethnicity
- Novel to characters
- Articles to abstracts
- Email to language
- Speech to text
- Photo to text
- HTML to text
- PDF to text
Microsoft Cognitive API
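One of the conversions listed above, HTML to text, can be illustrated locally. The sketch below is a stand-in that uses only Python's standard library; it is not the Microsoft service named on the slide, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML fragment as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = "<h1>Title</h1><p>Hello <b>world</b></p><script>var x = 1;</script>"
print(html_to_text(sample))
```

A hosted refining API would take the place of `html_to_text` in a real workflow; the point is only that the input is unstructured markup and the output is plain text that later steps can analyze.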
Analysis APIs – insights from non-numerical data
Examples
face to emotion, speech to person, diary to personality,
story to concept, correspondence to tone
Emotions… (input: Me!)
anger: 0.00,
contempt: 0.00,
disgust: 0.00,
fear: 0.00,
happiness: 0.73,
neutral: 0.26,
sadness: 0.00,
surprise: 0.00

Personality… (input: Rifleman’s Creed)
This is my rifle. There are many like it, but this one is mine. My rifle is my best friend. It is my life. I must master it as I must master my life. My rifle, without me, is useless...
inner-directed, strict, shrewd, skeptical, restrained

Concepts… (input: Lord’s Prayer)
Our Father, who art in heaven, hallowed be thy name, thy kingdom come, thy will be done, on earth as it is in heaven. Give us this day our daily bread and forgive us our debts…
Christianity, Christian prayer, Linguistics, Gospel of Matthew, Lord's Prayer
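The emotion scores on this slide become useful once they are read programmatically. A minimal sketch, assuming the API response has already been parsed into a Python dict; the variable and function names are illustrative and not part of any API.

```python
# Emotion scores copied from the slide. In practice they would come from
# the parsed response of an emotion-recognition API (assumption).
scores = {
    "anger": 0.00, "contempt": 0.00, "disgust": 0.00, "fear": 0.00,
    "happiness": 0.73, "neutral": 0.26, "sadness": 0.00, "surprise": 0.00,
}


def dominant_emotion(scores):
    """Return the (label, score) pair with the highest score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


label, score = dominant_emotion(scores)
print(f"{label}: {score:.2f}")  # prints "happiness: 0.73"
```

Reducing each photo to one label like this is what makes it possible to tabulate expressions over thousands of images.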
Intelligence APIs – teach a machine to classify
Training data
Three Musketeers          Moby Dick          Crime and Punishment
Count de Rochefort        Ishmael            Dmitri Razumikhin
Monsieur Bonacieux        Captain de Deer    Andrei Lebezyatnikov
Duke of Buckingham        Dough Boy          Nastasya Petrovna
Bazin                     Starbuck           Porfiry Petrovich
Felton                    Ahab               Katerina Marmeladov
Anne of Austria           Flask              Pyotr Luzhin
Athos                     Fedallah           Pulcheria Raskolnikov
Kitty                     Stubb              Alexander Zamyotov
Constance de Bonacieux    Captain Bildad     Ilya Petrovich
Planchet                  Queequeg           Rodion Raskolnikov
Monsieur de Tréville      Daggoo             Lizaveta Ivanovna
Milady de Winter          Elijah             Sofya Marmeladov
Porthos                   Captain Peleg      Alyona Ivanovna

Testing data
Character                 Which book?             Accuracy
D'Artagnan                ?
Grimaud                   ?
Aramis                    ?
Mousqueton                ?
Captain Boomer            ?
Father Mapple             ?
Tashtego                  ?
Pip                       ?
Semyon Marmeladov         ?
Arkady Svidrigailov       ?
Zossimov                  ?
Avdotya Raskolnikov       ?

IBM Watson
Character                 Which book?             Accuracy
D'Artagnan                Three Musketeers        True
Grimaud                   Three Musketeers        True
Aramis                    Three Musketeers        True
Mousqueton                Three Musketeers        True
Captain Boomer            Moby Dick               True
Father Mapple             Moby Dick               True
Tashtego                  Moby Dick               True
Pip                       Moby Dick               True
Semyon Marmeladov         Crime and Punishment    True
Arkady Svidrigailov       Crime and Punishment    True
Zossimov                  Moby Dick               False
Avdotya Raskolnikov       Crime and Punishment    True
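The test results above can be summarized as a single accuracy figure. The sketch below re-enters the table as (character, predicted book, true book) triples; the true book for Zossimov is Crime and Punishment, which is why that prediction is marked False.

```python
TM, MD, CP = "Three Musketeers", "Moby Dick", "Crime and Punishment"

# (character, predicted book, true book), transcribed from the results table
results = [
    ("D'Artagnan", TM, TM),
    ("Grimaud", TM, TM),
    ("Aramis", TM, TM),
    ("Mousqueton", TM, TM),
    ("Captain Boomer", MD, MD),
    ("Father Mapple", MD, MD),
    ("Tashtego", MD, MD),
    ("Pip", MD, MD),
    ("Semyon Marmeladov", CP, CP),
    ("Arkady Svidrigailov", CP, CP),
    ("Zossimov", MD, CP),  # the one misclassification
    ("Avdotya Raskolnikov", CP, CP),
]

correct = sum(predicted == true for _, predicted, true in results)
accuracy = correct / len(results)
print(f"{correct}/{len(results)} correct, accuracy = {accuracy:.3f}")
# prints "11/12 correct, accuracy = 0.917"
```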
Scaling up – literary research
For-loop that sends textual material to API:
For Each Book in Library:
    For Each Page in Book:
        Result(Page) <= Send(Page, API)
    Next Page
Next Book
Library: none
Book: The Diary of a Young Girl
API: Watson Sentiment Analysis
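The for-loop on this slide can be sketched in Python as follows. The `send` argument stands for whatever function calls the remote API; `toy_sentiment` is a purely hypothetical local stand-in so the example runs without credentials, and it is not how Watson Sentiment Analysis scores text.

```python
from typing import Callable, Dict, List, Tuple

Library = Dict[str, List[str]]  # book title -> list of page texts


def analyze_library(
    library: Library, send: Callable[[str], float]
) -> Dict[Tuple[str, int], float]:
    """Send every page of every book to `send`; key results by (title, page number)."""
    results = {}
    for title, pages in library.items():
        for page_no, page in enumerate(pages, start=1):
            results[(title, page_no)] = send(page)
    return results


def toy_sentiment(page: str) -> float:
    """Hypothetical stand-in for an API call: positive minus negative word count."""
    positive = {"sun", "spring", "shining"}
    negative = {"fear", "dark", "afraid"}
    words = page.lower().split()
    return float(sum(w in positive for w in words) - sum(w in negative for w in words))


library = {"Example book": ["The sun is shining", "I fear the dark"]}
results = analyze_library(library, toy_sentiment)
print(results)
```

Passing the API call in as a function keeps the loop unchanged when the service is swapped, which is the replicability argument of the deck in miniature.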
Empirical exercise: Anne Frank’s diary
[Figure: Cumulative Standardized Sentiment 1942 – 1944, annotated with “Fritz Pfeffer joins the annex” and the diary quote “The sun is shining … I think spring is inside me. I feel spring awakening”]
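The slide does not define "cumulative standardized sentiment". One plausible reading, offered here only as an assumption, is to convert each sentiment score to a z-score and then take the running sum. The input values below are made up for illustration and are not results from the diary.

```python
from itertools import accumulate
from statistics import mean, pstdev


def cumulative_standardized(scores):
    """Z-score each value (population standard deviation), then take the running sum."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: every z-score is zero.
        return [0.0] * len(scores)
    z_scores = [(s - mu) / sigma for s in scores]
    return list(accumulate(z_scores))


# Made-up sentiment scores in chronological order (illustrative only)
entry_scores = [0.2, -0.1, 0.4, -0.5, 0.0]
series = cumulative_standardized(entry_scores)
print([round(v, 2) for v in series])
```

Under this definition the series always ends near zero, so the shape of the path in between (long rises and falls) is what carries the information.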
Scaling up – communication studies
For-loop that sends images to API:
For Each Movie in Collection:
    For Each Frame in Movie:
        Result(Frame) <= Send(Frame, API)
    Next Frame
Next Movie
Collection: none
Movie: Final presidential debate
API: Microsoft Emotion API
Empirical exercise: Trump-Clinton Debate
Trump’s primary facial expression is anger, more so when he does not have the floor
Hillary emphasizes her message by raising her eyebrows and expresses integrity by smiling
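A finding such as "primary facial expression, by whether the speaker has the floor" can be derived from per-frame scores roughly as follows. This is a sketch under assumptions: each frame is taken to be a pair of a has-the-floor flag and an emotion-score dict, and the sample frames are invented for illustration, not taken from the debate.

```python
from collections import Counter, defaultdict


def primary_expression(frames):
    """Map each has_floor value to (most frequent dominant emotion, its share of frames)."""
    counts = defaultdict(Counter)
    for has_floor, scores in frames:
        dominant = max(scores, key=scores.get)  # highest-scoring emotion in this frame
        counts[has_floor][dominant] += 1
    summary = {}
    for has_floor, counter in counts.items():
        label, n = counter.most_common(1)[0]
        summary[has_floor] = (label, n / sum(counter.values()))
    return summary


# Invented frames: (speaker has the floor?, emotion scores for that frame)
frames = [
    (True, {"anger": 0.3, "neutral": 0.7}),
    (True, {"anger": 0.6, "neutral": 0.4}),
    (True, {"anger": 0.2, "neutral": 0.8}),
    (False, {"anger": 0.8, "neutral": 0.2}),
    (False, {"anger": 0.9, "neutral": 0.1}),
]
summary = primary_expression(frames)
print(summary)
```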
Key takeaways
- Non-numeric data can be obtained and analyzed at an industrial scale
- The scientific workflow can be automated by chaining APIs cleverly
- Huge implications for what can be researched and at what cost
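Chaining APIs, as the second takeaway puts it, amounts to feeding each step's output into the next. A minimal sketch of that idea: the three step functions are hypothetical local stand-ins for a data API, a data refining API and an analysis API, and none of them corresponds to a real service.

```python
from functools import reduce


def chain(*steps):
    """Compose steps left to right into a single callable workflow."""
    def pipeline(data):
        return reduce(lambda value, step: step(value), steps, data)
    return pipeline


# Hypothetical stand-ins for the API types named in the deck
def fetch_text(source):   # data API: retrieve raw material
    return f"  Text retrieved from {source}.  "


def refine(text):         # data refining API: clean and convert
    return text.strip().lower()


def analyze(text):        # analysis API: turn text into a number
    return {"n_words": len(text.split())}


workflow = chain(fetch_text, refine, analyze)
result = workflow("example source")
print(result)  # prints "{'n_words': 5}"
```

Because the workflow is an ordinary function, rerunning it on a new source or swapping one step for another is a one-line change, which is where the replicability and cost implications come from.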
