
Reproducibility Analytics Lab


Jisc Analytics Labs is an approach to developing decision-making tools underpinned by data. This presentation briefly outlines the approach and then focuses on the results of the Reproducibility Lab, which used data from articles on animal-based research to assess the degree to which factors affecting research reproducibility are reported.

Published in: Data & Analytics


  1. What is Analytics Labs?
  2. We have worked with partners to create a business intelligence shared service for UK education. Runner-up in the 2018 National Technology Awards: http://nationaltechnologyawards.co.uk/
  3. What is Analytics Labs? (www.jisc.ac.uk/analytics-labs) • Unique CPD opportunity • Teams of analysts from across UK HE • Expertise in policy, data, and visualisation • One day a week for 13 weeks • Access to a range of data sources including HESA • Aim is to produce proof-of-concept dashboards • Remote working using agile project management methods • Secure data processing environment • 289 participants from 109 UK universities (including 11 APs)
  4. Analytics Labs – working in an agile way: activity, team roles and approach
  5. Makeup of an Analytics Labs team: Product Owner, who brings an understanding of the policy context and the needs of customers; Data Analyst, with expertise in data and analysis, especially from an HE perspective; Scrum Master, who keeps the project on track and removes impediments; Data & Viz Support, with specialist knowledge in tools such as Alteryx and Tableau; Meta Product Owner, who provides expertise and guidance in the specific theme
  6. Lab – the environment and tools
  7. Lab environment and tools used: a secure data processing environment with team and shared data prep areas (GDPR); user stories, sprint goals and data sources tracked through Backlog, In Progress, Blocked and Done; plus Prep, Power BI, PowerPoint, Word, Python, R, Pentaho, Firefox, Chrome and Sublime Text
  8. Curriculum: the Analytics Labs curriculum focuses on two of our five competencies: • Participating in agile development • Visualising data • Transforming data • Digital collaboration • Understanding policy and the data landscape
  9. Teams and themes: Team Conquest, Theme: Evolve (downstream effects of research funding); Team Research Analytics, Theme: Reproducibility (bias in experimental research)
  10. REPRODUCIBILITY
  11. Reproducibility and transparency of published preclinical research involving animal models of human disease is alarmingly low
  12. Aim: evaluate the current state of published preclinical research, explore what initiatives by the scientific community might have an impact on this, and understand where improvements might be made. Focused on reproducibility from the perspective of journal, funder and institution. Provide a tool that allows service users to: 1. benchmark against other users; 2. see where targets for improvement might be set; 3. track this progress
  13. How do we measure reproducibility? Threats to reproducibility are thought to include a lack of scientific rigour, low statistical power and questionable research integrity. We evaluated the reporting of 5 key quality measures: blinded assessment of outcome; compliance with animal welfare regulations; performance of a sample size calculation; potential conflicts of interest; random allocation of animals to group
  14. Data sources for identifying animal model studies via text mining and machine learning: the world’s largest collection of open access full texts, containing aggregated content from all research disciplines; a proxy for the relative importance of a journal within its field; a source providing the ‘full economic cost’ of activities, including how much institutions spend on research; the Concordat on Openness on Animal Research in the UK, which encourages organisations to be clear about their use of animals in research and to enhance their communications; the TOP Guidelines for journals, which promote an open research culture and the alignment of scientific ideals with practices; guidelines intended to improve the reporting of research using animals; and support for researchers and organisations to further good practice and promote integrity and high ethical standards in research
  15. Data Source
  16. More information on the machine learning carried out by the Edinburgh team: the algorithm was developed by James Thomas at UCL. It starts with a dataset of studies, a subset of which is classified manually (e.g. a study is included because it reports an animal model of human disease, or excluded because it reports nothing relevant). This manually classified subset is fed to the machine as a training set, from which it “learns” by identifying patterns between the data and the manual decisions. The more studies that are classified manually and fed to the algorithm, the better it detects patterns, and the closer its performance comes to that of a human screener. The method is not 100% accurate, as there is a lot of noise in the data, but it is a very good tool when there are thousands of papers to screen; manual screening could otherwise take months or even years (made worse by the fact that the gold standard is for two people to screen independently, with a third to resolve disagreements). It therefore saves time and is well suited to situations where resources are limited. More technical information about the algorithm: it uses a bag-of-words model for feature selection and a support vector machine trained with stochastic gradient descent for text classification to filter out animal publications. More on this: https://www.biorxiv.org/content/10.1101/280131v1 Performance of the machine learning algorithm for selecting animal studies: sensitivity 95.5%, specificity 83.5%, accuracy 84.7%
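The classifier described above (bag-of-words features plus a linear SVM trained with stochastic gradient descent) can be sketched in scikit-learn roughly as follows. This is an illustration of the technique only, not the Edinburgh team's actual pipeline; the toy abstracts and labels are invented for the example.

```python
# Sketch of bag-of-words + SVM-with-SGD text classification (scikit-learn).
# Invented toy data; illustrative only, not the published pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# 1 = reports an animal model of human disease (include), 0 = exclude
train_texts = [
    "mouse model of Alzheimer's disease treated with compound X",
    "rat model of stroke with blinded assessment of outcome",
    "survey of undergraduate attitudes to statistics teaching",
    "review of open access publishing policies in the UK",
]
train_labels = [1, 1, 0, 0]

clf = make_pipeline(
    CountVectorizer(),                            # bag-of-words features
    SGDClassifier(loss="hinge", random_state=0),  # hinge loss = linear SVM, trained by SGD
)
clf.fit(train_texts, train_labels)

pred = clf.predict(["transgenic mouse model of Parkinson's disease"])
print(pred)
```

In practice the training set would be the manually screened subset of studies, and performance (sensitivity, specificity, accuracy) would be estimated on held-out manually screened papers.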
  17. More information on the text mining used by the Edinburgh-based team: “Text mining is a method used to explore and analyse large amounts of unstructured text to identify concepts, patterns, keywords and phrases in the data. The team used regular expressions, which are essentially strings of rules telling the computer what combinations of words to match when searching a piece of text. At its simplest, I tell the computer to find the expression ‘animals randomly allocated to group’ and, if it does, to class the publication as having reported random allocation of animals to group. It is slightly more sophisticated than that: when the statement is preceded by ‘not’, for example, the computer should not class it as a match. These expressions are still a work in progress and, like the machine learning, are not 100% accurate; but reading and classing these publications manually is an incredibly time-consuming process, so automating it is very useful, and the imperfect accuracy does not affect the overall conclusions much. In fact, we have found cases where the computer identifies publications that should be classed as TRUE but a human has falsely classified them as FALSE, so there is error in both directions.”
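The regex-with-negation idea described above can be sketched as follows. The patterns and example sentences here are invented for illustration and are not the team's actual expressions.

```python
# Minimal sketch of regex-based detection of reported randomisation,
# with a crude negation check. Illustrative patterns only.
import re

# Matches phrases like "randomly allocated to groups"
RANDOMISATION = re.compile(r"randomly (allocated|assigned) to (treatment )?groups?", re.I)
# Crude negation: "not" shortly before "randomly" in the same sentence
NEGATION = re.compile(r"\bnot\b[^.]{0,40}randomly", re.I)

def reports_randomisation(text: str) -> bool:
    """True if the text claims random allocation and the claim is not negated."""
    return bool(RANDOMISATION.search(text)) and not NEGATION.search(text)

print(reports_randomisation("Animals were randomly allocated to groups."))
print(reports_randomisation("Animals were not randomly allocated to groups."))
```

Real pipelines would need many such patterns per quality measure, plus more robust negation handling, which is why the team describes the expressions as a work in progress.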
  18. Tableau visualisations and potential dashboards
  19. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
  20. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
  21. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
  22. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
  23. Journal anonymised. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
  24. Note: findings are for illustrative purposes only, owing to the small, example prototype research areas explored
