Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Data matters-bournemouth-2015

576 views

Published on

Talk at Bournemouth University 16th September.

The main part of this talk is on the post-hoc analysis of REF data for computing, the apparent bias by sub-area, institution and gender, and the implications of this for policy in UK computing.

In addition I briefly review a number of other areas of my research where data is central.

http://alandix.com/ref2014/2015/09/16/ref-talk-at-bournemouth/

Published in: Education
  • Be the first to comment

  • Be the first to like this

Data matters-bournemouth-2015

  1. 1. Data Matters Alan Dix Talis & University of Birmingham http://alandix.com/ref2014/
  2. 2. University of Birmingham Tiree Tiree Tech Wave 22-26 October 2015
  3. 3. today I am not talking about … • intelligent internet interfaces • visualisation and sampling • situated displays, eCampus, small device – large display interactions • fun and games, virtual crackers, artistic performance, slow time • creativity and Bad Ideas • modelling dreams and regret and the emergence of self …
  4. 4. … or even lots of lights http:/www.hcibook.com/alan/projects/firefly/
  5. 5. I am talking about ... REF data analysis long tail of small data
  6. 6. REF
  7. 7. REF 2014 Research Excellence Framework approx 5 yearly research assessment in the UK not just about the UK … lots of countries thinking to do similar ... and looking to REF as example
  8. 8. REF elements three elements: outputs (mainly papers) impact environment focus of this work
  9. 9. REF panels 4 main panels, 36 sub-panels, ~200K outputs sub-panel 11: computer science and informatics I was on this panel but NO confidential data here everything public domain
  10. 10. REF profiles every output graded: 4* / 3* / 2* / 1* individual grades confidential and destroyed each ‘Unit of Assessment’ (dept) given a profile http://results.ref.ac.uk/Results/ByUoa/11/Outputs
  11. 11. sub-area profiles N.B. computing only each output given ACM code originally to enable allocation to panelists … but, also used to create sub-area profiles …
  12. 12. sub-area profiles From Morris Sloman’s slides & panel report theoretical areas 30-40% 4* applied/human areas 10-20% 4*
  13. 13. data not information sub-panel report warning: "These data should be treated with circumspection … however already affecting institutional policy hiring, internal investment … and may influence research council policy
  14. 14. possible reasons for variation … 1. best applied work is weak – including HCI :-/ 2. long tail – weak researchers choose applied areas 3. latent bias – despite panel’s efforts to be fair can bibliometrics disentangle these?
  15. 15. metrics and assessment citation metrics known to be good post-hoc correlates of sophisticated measures … but not for individuals and small cohorts and danger of gaming and policy distortion suitable for verifying large-scale patterns (and HEFCE using them for this)
  16. 16. data used for analysis all in public domain (virtually) complete list of outputs: – excluding a few confidential ones – for each: name, doi, ACM topic area, Scopus citations Google scholar citations for each – gathered after REF (not used in assessment) UoA and sub-area profiles
  17. 17. metrics used Scopus (late 2013 census ) – with/without 2012/13 as few citations ‘Normalised Scopus’ – using ‘contextual data’, corrects for different citation patterns between areas – places output in top 1%, 5%, 10% of its area worldwide Google Scholar (late 2014 census) – with/without 2012/13; zero treated as zero/missing seven variants – all give similar results
  18. 18. results … massive differences % citations in top quartile % REF 4* ratio winners losers
  19. 19. ‘scatter’ graph % outputs in top quartile for citations % outputs awarded REF 4*
  20. 20. rank scores winners losers diagram thanks to Andrew Howes
  21. 21. Another way of looking at it … world ranking within own field
  22. 22. recall REF …
  23. 23. for example, HCI research (web similar) … on average … • HCI/CSCW paper needs to be in top 0.5% worldwide to get 4* • logic/algorithms paper just needs to be in top 5% 10 fold difference
  24. 24. and just as you thought it was all over … … institutional effects look at +/- 25% REF compared with citations N.B. use high-end weighted measure as money is focused (4:1:0:0) of 35 losers, 25 are post-1992 universities of 17 winners, 16 are pre-1992 universities
  25. 25. an example … XXXXXXX – a new university YYYYYYYY – an old university World Rankings REF
  26. 26. and Gender? Female authors in main panel B were significantly less likely to achieve a 4* output than male authors with the same metrics ratings. When considered in the UOA models, women were significantly less likely to have 4* outputs than men whilst controlling for metric scores in the following UOAs: Psychology, Psychiatry and Neuroscience; Computer Science and Informatics; Architecture, Built Environment and Planning; Economics and Econometrics. The Metric Tide (HEFCE, 2015)
  27. 27. implicit bias? HEFCE analysis: male staff in computing is 1/3 more likely to get a 4* than female areas and types institutions disadvantaged by REF often those with more women … implications for future recruitment?
  28. 28. future for research assessment? • pure metrics? • metrics as part (e.g. older outputs) • metrics as under-girding (burden of proof) • human process – metrics for in-process feedback
  29. 29. .. long tail of small data
  30. 30. Big Data everyone is talking about it Twitter, Google, Facebook, NSA, universities, … and funding Big Data does it with MapReduce Semantic Data does it with RDF
  31. 31. the long tail size of data set a few very large data sets e.g. Twitter, streams, Open Govt., OS, geonames, dbpedia the small data of ordinary life: from local bus timetables to squash club league tables
  32. 32. stories of small data … Walking Wales Learning analytics Open Data Islands and Communities Musicology
  33. 33. Alan Walks Wales 1058 miles (1700km) 3 million footfalls 3 ½ months April-July 2013 focus on IT at the margins one thousand miles of poetry, technology and community
  34. 34. vision personal encircling, encompassing, pilgrimage, homecoming, practical IT for the walker & IT for local communities philosophical reflections on walking and space, locality and identity research personal agenda and living lab lots of data
  35. 35. data location GPX ... batteries ... sporadic signals .... bio-sensing ECG (heart), EDA (skin) and accelerometers audio and images in the moment text after the event implicit explicit The largest ECG trace in the public domain
  36. 36. challenges (1) location GPX – merging and mending bio-sensing ECG & EDA – special formats & volume audio and images volume, transcription and annotation text semantic markup, synchronising sources
  37. 37. challenges (2) documentation methodology of creation, data formats for other people to use! meta-data for machines to use PR telling the world about it! academic culture we do not value data!
  38. 38. an offer multiple synchronisable data streams largest public domain ECG trace post-hoc analysis simulate real use please use it!
  39. 39. Learning analytics macro-analytics university strategy MOOCs micro-analytics individual course, student, resource
  40. 40. time frames for learning analytics days and hours email, during lectures and labs, stduent meetings, gaps week preparing for teaching, exercises months/mid-semester reporting points, staff meetings, cohort/student progress end of semester/term/year exams, exam boards, course revew, start of semester/term/year preparing for new courses or re-runs, rollover! years new courses, professional development, appraisal, promotion
  41. 41. Open Data everyone is doing it Governments, Cities, local gov. In C21 Data is Power
  42. 42. why not an island?
  43. 43. island data flows Community groups and individuals rest of the world other communities 1 2 3 4
  44. 44. island data flows from community to world Community groups and individuals rest of the world 1 • visibility and control • identity and empowerment • level of detail • local knowledge
  45. 45. island data flows from world to community Community groups and individuals rest of the world 2 • making the most of open data • local decision making • lobbying and negotiation
  46. 46. island data flows within the community Community groups and individuals 3 • gossip is not enough! • sparse, dispersed population • social cohesion and economic benefits
  47. 47. island data flows between communities Community groups and individuals other communities 4 • sharing best practice • brand presence • interlinked data
  48. 48. benefits to … the community empowerment and control availability of information communication within and between communities the world improved quality of data level of detail of data local knowledge and understanding
  49. 49. In Concert Concert ephemera 1750–1800 Calendar of London Concerts 1815–1895 Concert Life in London 1894–1944 Concert Programme Exchange (BL) External sources MusicBrainz MBz id as connect into Linked Data, BBC, etc. Authoritative sources (future) e.g. British Library BNB, Concert Programmes metadata
  50. 50. concert database classic digital humanities? original sources selected sources systematic sample transcription & extraction (medium expertise) interpretation (high expertise) digitised sources authoritative data analysis & use (high expertise) academic publication large digital archive (e.g. BBC) possibly create linkage
  51. 51. Barriers to progress effort and expertise authority and quality digital acontextuality openness
  52. 52. Openness and Reward Career development Leverhulme & REF Building the discipline?
  53. 53. Re-envisioning the Digital Archive: Curation and Use
  54. 54. big bang to incremental digitised sources authoritative data academic publication ...
  55. 55. big bang to incremental problem focused augmentation transform cost-benefit digitial archive academic publications ... partial enhancement & interpretation
  56. 56. scenario-focused investigations
  57. 57. => reflection and requirements digital symbiosis suggestion and confirmation provenance and authority spreadsheet as user interface semantics through interaction
  58. 58. themes and take-aways ... data in context heterogeneity and linking value and values ethics and empowerment …. and please use my data 

×