The Role of Data in IS Research
Frank Hopfgartner
University of Glasgow
@OkapiBM25
Slides of the lecture given on 13 April 2016 as part of the ESRC Information Science Scotland 2016 training event for PhD students at Napier University, Edinburgh.

Published in: Data & Analytics

  1. The Role of Data in IS Research. Frank Hopfgartner, University of Glasgow, @OkapiBM25
  2. Question: Do you use a dataset for your research?
  3. Intended Learning Outcome • By the end of this session, you will be able to – Explain the need for datasets for scientific research – List components that comprise test collections – Identify appropriate datasets to answer research hypotheses – Create your own test collections
  4. Outline • Importance of Data • Getting Data • Using Datasets for IS Research
  5. Why do we use data? Because it helps us to understand our world.
  6. Example: Ngram Viewer. Source: https://books.google.com/ngrams
  7. Example: Online publishing. D. Corney, D. Albakour, M. Martinez, S. Moussa, “What do a Million News Articles look like?” in Proc. NewsIR’16, pp. 42-47, 2016. Sampling from over 93,000 different news sources recorded in September 2015. Large-scale main news outlets. Single-author blogs.
  8. Summarising: Types of data • Quantitative & Qualitative • Numeric and Textual • Comparison (like with like) • Context • Point-in-time • Longitudinal (series and interval)
  9. Outline • Importance of Data • Getting Data • Using Datasets for IS Research
  10. Example: Opening UK Government. Source: https://data.gov.uk/
  11. Example: UK Data Archive • Over 5,000 data collections • Largely economic and social • Founded in 1967 • Office for National Statistics • Medical Research Council. http://www.data-archive.ac.uk/
  12. Example: UK Data Service. https://www.ukdataservice.ac.uk • Large-scale government surveys • International macrodata • Business microdata • Qualitative studies • Census data from 1971 to 2011
  13. Non-Public Data. Example: Google Trends. https://www.google.com/trends/home/all/GB
  14. Question: But what if I want to analyse non-public data?
  15. Some people just hack… http://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers Disclaimer: This is not an appeal to perform any illegal activities.
  16. Create your own data • Record data, e.g., – Log files of users using information access systems – Sensor records – Digitise documents (accepting copyright) – …
  17. Example: Campus-wide IPTV provider • Live and VoD content • 16 genres • 33 channels • Over 7000 different programme names • Over 500 unique users. J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.
  18. Example: Log user interaction data. [Figure: "Category Distribution", count of categories chosen by day of week and time of day.] J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.
  19. Example: Video retrieval platform. F. Hopfgartner, D. Scott, H. Wang, Y. Yang, Z. Zhang, M. Zhou, C. Gurrin. Helping the Helpers: How Video Retrieval Can Assist Special Interest Groups. In Proc. MMM'13: 19th International Conference on Multimedia Modeling, pp. 493-495, 2013.
  20. F. Hopfgartner and J. M. Jose. Semantic User Profiling Techniques for personalised multimedia recommendation. Multimedia Systems 14(4-5):255-274, 2010. F. Hopfgartner and J. M. Jose. An experimental evaluation of ontology-based user profiles. Multimedia Tools and Applications 73(2):1029-1051, 2014.
  21. Summarising: What do I need to consider? • Documentation • Terms of deposit • Permissions and re-use • Software • Methodology • Time • Place • Sampling • Data collection • Editorial control • Classification • Coding
  22. Outline • Importance of Data • Getting Data • Using Datasets for IS Research
  23. Use Case: Evaluation of Information Access Systems. Input → Information Access System → Output
  24. Examples: Web Search Engines
  25. Example: Social Media Search Engines
  26. Example: Product Search Engines
  27. Examples: Multimedia Search Engines
  28. Example: Libraries
  29. How do we evaluate information access systems? A test collection comprises a document collection, a topic set and relevance assessments. But how can we compare with state-of-the-art? (Diagram labels: Document collection, System A, System B.)
  30. Evaluation Campaigns: TREC, CLEF, FIRE, NTCIR • Common dataset • Pre-defined tasks • Ground truth • Evaluation protocol • Evaluation metrics
  31. Focus on different domains • Microblogging • Ad-hoc and Web Search • Multimedia • Federated Web Search • XML Retrieval • Information Access in the Legal Domain • Document Similarity • …
  32. Example projects
  33. CLEF Initiative. Source: http://www.isical.ac.in/~fire/2013/slides/other_clef_fire13.pdf
  34. CLEF Tracks (CLEF’16). Source: http://www.clef-initiative.eu/track/series • eHealth • ImageCLEF • LifeCLEF • Living Labs for IR (LL4IR) • News Recommendation Evaluation Lab (NewsREEL) • Uncovering Plagiarism, Authorship and Social Software Misuse (PAN) • Social Book Search (SBS)
  35. NewsREEL: In CLEF NewsREEL, participants can develop stream-based news recommendation algorithms and have them benchmarked (a) online by millions of users over the period of a few months in a living lab, and (b) offline by simulating a live stream. F. Hopfgartner, T. Brodt, J. Seiler, B. Kille, A. Lommatzsch, M. Larson, R. Turrin, A. Sereny, “Benchmarking News Recommendations: The CLEF NewsREEL Use Case,” in SIGIR Forum, 49(2):129-136, 2015.
  36. Example: News Articles. Source (Image): T. Brodt of plista.com
  37. Profit = Clicks on recommendations. Benchmarking metric: Click-Through-Rate. (Diagram labels: Request article, Request recommendation.)
  38. Dataset • Traffic and content updates of nine German-language news content provider websites • Traffic: Reading article, clicking on recommendations • Updates: adding and updating news articles. B. Kille, F. Hopfgartner, T. Brodt, T. Heintz, “The plista Dataset,” in Proc. NRS'13: International Workshop and Challenge on News Recommender Systems, Hong Kong, China, pp. 16-23, 2013.
  39. Evaluation using offline dataset. (Diagram labels: Idomaar, request articles, simulate stream.)
  40. Example results. B. Kille, A. Lommatzsch, R. Turrin, A. Sereny, M. Larson, T. Brodt, J. Seiler, F. Hopfgartner, “Overview of CLEF NewsREEL 2015: News Recommendation Evaluation Lab,” in Working Notes of CLEF 2015, Toulouse, France, 2015.
  41. Example projects
  42. NTCIR. Source: Hideo Joho
  43. NTCIR-12 Tasks • Second round: – Search-Intent Mining – Mobile Click – Temporal Information Access – Spoken Query & Spoken Document Retrieval – QA Lab for Entrance Exam • First round: – Medical NLP for Clinical Documents – Personal Lifelog Access & Retrieval – Short Text Conversation
  44. LifeLog @ NTCIR-12: Encourage research advances in organising and retrieving from lifelog data.
  45. What is The Quantified Self? The Quantified Self is about obtaining self-knowledge through self-tracking.
  46. What is The Quantified Self? Self-tracking is also referred to as lifelogging, self-analysis, or self-hacking.
  47. Example: Visual Lifelogging
  48. Visual Lifelog of a day: 2,000 pictures a day. Slide: Cathal Gurrin
  49. Lifelogging Challenges: The challenges are how to sense the person, their actions and their life, and how to make this accessible using appropriate interfaces, search, recommendation engines and visual/aural feedback. A further challenge is exploiting the lifelog to identify context for adaptive information services. Source (Graphic): DAI-Labor, Berlin
  50. Test collection: Multimodal dataset with information needs, created by three individuals over 10+ days • 18.18GB • 88,124 images • Accompanying output of 1,000 concepts (825MB) • Data processed pre-release (removal of personal content, face blurring, translation of concepts) • Detailed user queries and judgments generated by the lifelogging data gatherers. C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, R. Albatal, “NTCIR-Lifelog: The First Test Collection for Lifelog Research,” in Proc. SIGIR'16: ACM International Conference on Information Retrieval, Pisa, Italy, to appear.
  51. Tasks: Evaluate different methods of retrieval and access. T1: Lifelog Semantic Access (LSAT) • Models the retrieval need from lifelogs (Known-Item Search) • Retrieve N segments that match information need • Interactive or Automatic participation – Interactive: Time limit for fair and comparative evaluation in an interactive system with users – Automatic: Fully-automatic retrieval system. Automated query processing. T2: Lifelog Insight • Models the need for reflection over lifelog data • Exploratory task, the aim is to: – encourage broad participation – novel methods to visualise and explore lifelogs • Same data as LSAT task • Presented via demo/poster.
  52. Task 1: Lifelog Semantic Access • Find the moment(s) where I use my coffee machine. • Find the moment(s) where I am in the kitchen. • Find the moment(s) where I am playing with my phone. • Find the moment(s) where I am preparing breakfast.
  53. Task 2: Lifelog Insight Task • Provide insights on the time I spend taking breakfast. • Provide insights on the time I spend driving to work. • Provide insights on the time I spend reading a paper. • Provide insights on the time I spend working on the computer.
  54. Final thoughts • Data plays an essential role in scientific research since it is used to prove or disprove a hypothesis • You are now familiar with various sources where you can get datasets that might be useful for your own research • When selecting data, question its credibility, e.g., is it biased? Can it be used to support your hypotheses? • Consider accessibility of the data you want to analyse. Are you allowed to use it? Can others (e.g., other researchers) access the data?
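
The test-collection idea from slide 29 (a document collection, a topic set and relevance assessments, used to compare System A with System B) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the document and topic IDs, the relevance judgments and the two ranked result lists are invented toy data, and mean average precision is only one common metric that such a comparison might use, not necessarily the one used in the campaigns named in the slides.

```python
from typing import Dict, List, Set

# Hypothetical toy test collection; all IDs and judgments are invented.
topics: Dict[str, str] = {
    "t1": "using the coffee machine",
    "t2": "preparing breakfast",
}
# Relevance assessments: for each topic, the set of relevant document IDs.
qrels: Dict[str, Set[str]] = {"t1": {"d1", "d3"}, "t2": {"d2"}}

# Ranked result lists ("runs") of two hypothetical systems, best match first.
run_a: Dict[str, List[str]] = {
    "t1": ["d1", "d2", "d3", "d4"],
    "t2": ["d2", "d1", "d3", "d4"],
}
run_b: Dict[str, List[str]] = {
    "t1": ["d4", "d1", "d2", "d3"],
    "t2": ["d3", "d4", "d2", "d1"],
}


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run: Dict[str, List[str]],
                           assessments: Dict[str, Set[str]]) -> float:
    """Average precision averaged over all judged topics."""
    scores = [average_precision(run.get(topic, []), relevant)
              for topic, relevant in assessments.items()]
    return sum(scores) / len(scores)


map_a = mean_average_precision(run_a, qrels)
map_b = mean_average_precision(run_b, qrels)
print(f"System A MAP: {map_a:.3f}")
print(f"System B MAP: {map_b:.3f}")
```

Because both systems are scored against the same topics and the same relevance assessments, their scores are directly comparable, which is the point of a shared test collection.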
