Data Science Meets Academia - What Comes Next?

Presentation given at the University of Rhode Island, September 17, 2018

  1. Data Science Meets Academia – What Comes Next? Philip E. Bourne, PhD, FACMI; Stephenson Chair of Data Science; Director, Data Science Institute; Professor of Biomedical Engineering; peb6a@virginia.edu; https://www.slideshare.net/pebourne; @pebourne. 09/17/18 University of Rhode Island
  2. Perspective • This is NOT a talk about my own research • It draws on my work as Chief Data Officer for the NIH • It draws on my 1.4 years building a Data Science Institute at the University of Virginia • Any excessive enthusiasm comes from drinking my own Kool-Aid
  3. Questions to address regarding data science • Is it a discipline in its own right? • What novel research has been accomplished? • What of the future? Is it a passing fad? • How should data science activities be organized? • How best to train students in this fast-moving area? • What of the ethical, legal and policy consequences?
  4. Take-home messages (hopefully) • Increased awareness of the value of data science to your institution • Some insights into what the funders are thinking • Some thoughts about how to build out research and educational activities in data science
  5. Let's start with a couple of definitions…
  6. Big data and data science are like the Internet… If I asked you to define them, you would each say something different, yet you use them every day… (cartoon: http://vadlo.com/cartoons.php?id=357)
  7. So what do I mean by big data/data science? • Use of the ever-increasing amount of open, complex, diverse digital data • Finding ways to ask, and then answer, relevant questions by combining such diverse data sets • Arriving at statistically significant conclusions not otherwise obtainable • Sharing such findings in a useful way • Translating such findings into actions that improve the human condition
  8. What our Data Science Institute is about is best told with a simple story… the case of the trauma surgeon… (Addresses: What novel research is being done?)
  9. Cause • There are ~2.7 zettabytes (2.7 × 10^6 PB) of digital data • Volume is doubling every two years • Sheer volume of open or accessible digital data, e.g., wearable sensors, mandatory EHRs, social media • New tools, e.g., deep artificial neural networks (DNNs) • New computing power, e.g., GPUs
  10. Effect • Big data is currently estimated to be a $50bn business that could save $3.1tn • 50% growth in data/yr versus 5% growth in IT expenditure • 140,000–190,000 unfilled deep data analytics jobs in the US • UVA DSI has 600 applicants this year for 50 spots; the MSDS/MBA is highly sought after
  11. Effect ++ • Big data is currently estimated to be a $50bn business that could save $3.1tn (private-sector research) • 50% growth in data/yr versus 5% growth in IT expenditure (undervalued) • 140,000–190,000 unfilled deep data analytics jobs in the US (competition for skilled researchers is high) • DSI has 650 applicants this year for 65 spots; the MSDS/MBA is highly sought after (large human capital)
  12. Is it a discipline in its own right? I don't think so… I think it is an enabler… an enabler that could bring profound change…
  13. What of the future? Is it a passing fad?
  14. What of the future? One view is the 6 Ds
  15. The 6 Ds over time, as volume, velocity and variety grow: Digitization → Deception → Disruption → Demonetization → Dematerialization → Democratization. Example, photography: digital camera invented by Kodak but shelved → megapixels and quality improve slowly; Kodak slow to react → film market collapses; Kodak goes bankrupt → phones replace cameras → Instagram and Flickr become the value proposition → digital media becomes a bona fide form of communication. (From a presentation to the Advisory Board to the NIH Director)
  16. A call for making these data open • Mandates: NIH, NSF, Data Management Plans • Business models can be protected, yet everyone benefits • It saves lives…
  17. Recommendations • Community: content creation and mental health training • Utilize social media data to increase outreach • Have VA resources complement private health care • Limit firearm possession based on mental health status
  18. How to promote departmental/institutional openness? • Encourage persistent identifiers, e.g., ORCID • Encourage preprints • Encourage Open Access (OA) • Recognize openness in hiring and P&T • Teach open scholarship • Promote institutional openness, e.g., repositories, a Wikimedian in residence • Support institutional open data governance
  19. How are the funders responding?
  20. NIH Strategic Plan for Data Science • Support a Highly Efficient and Effective Biomedical Research Data Infrastructure • Promote Modernization of the Data-Resources Ecosystem • Support the Development and Dissemination of Advanced Data Management, Analytics, and Visualization Tools • Enhance Workforce Development for Biomedical Data Science • Enact Appropriate Policies to Promote Stewardship and Sustainability (https://grants.nih.gov/grants/rfi/NIH-Strategic-Plan-for-Data-Science.pdf)
  21. Research data infrastructure… Both funders and some institutions see the need to move from pipes to platforms to accelerate research… (image: https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png)
  22. If platforms are the answer, we could ask the question… Will biomedical research become more like Airbnb? (Vivien Bonazzi, Should biomedical research be like Airbnb? doi:10.1371/journal.pbio.2001818)
  23. I am not crazy; hear me out • Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host) • The platform focuses on maximizing the exchange of services between supplier and consumer, and on maximizing the trust associated with a given stakeholder • It seems to be working: 60 million users searching 2 million listings in 192 countries; an average of 500,000 stays per night; a valuation of US $25bn (Should biomedical research be like Airbnb? doi:10.1371/journal.pbio.2001818)
  24. Platforms will ultimately integrate the scholarly workflow digitally, for both human and machine analysis (Should biomedical research be like Airbnb? doi:10.1371/journal.pbio.2001818)
  25. Supplier/consumer pairs a platform could serve: paper author/paper reader; data provider/data consumer; employer/employee; reagent provider/reagent consumer; software provider/software consumer; grant writer/grant reviewer; educator/student. Example platforms: MS Project, Google Drive, Coursera, ResearchGate, Academia.edu, Open Science Framework, Synapse, F1000 Rio. Pilot Open Data Lab (ODL) underway.
  26. Why a comparison to Airbnb is not fair • Airbnb was born digital • The exchange of services on Airbnb is simple compared with what a platform must support for biomedical research. Nevertheless, there is much to be learnt
  27. Impediments to academic platforms • Current work practices of all stakeholders • Entrenched business models • Size of the undertaking, i.e., the resources needed • Trust • Incentives to use the platform (http://www.forbes.com/sites/johnhall/2013/04/29/10-barriers-to-employee-innovation/#8bdbaa811133)
  28. Such platforms, combined with emerging analytics, will likely have a significant impact on academia
  29. What are some possible outcomes? • Enhanced student experience: combining student performance with student health data • Leveraging research assets: mining grants, both successes and failures • Sports analytics
  30. How should academic institutions think about exploiting data science?
  31. Independent yet integrated…
  32. Organization: core data science verticals • Data Integration & Engineering • Machine Learning & Analytics • Visualization & Dissemination • Data Acquisition • Ethics, Law, Policy, Social Implications
  33. Organization: interdisciplinary horizontals • The same verticals (Data Integration & Engineering; Machine Learning & Analytics; Visualization & Dissemination; Data Acquisition; Ethics, Law, Policy, Social Implications) crossed by the disciplines as horizontals
  34. Elements of a data science education program (as per UVA) • Start with an MSDS: emphasize practical training and ethics; move from a systems orientation to a more applied one • Start a PhD program • Work across campus to establish a minor and a certificate in DS • Go online
  35. (Cartoon: http://cartertoons.com/)
  36. Ethics, Law, Policy & Social Implications • Data sharing • Privacy • Normativity (gDOC; Wendy Novicoff, PhD)
  37. Conclusion: driven by large amounts of open digital data of different types, and by new algorithms and approaches, academia is destined to follow the private sector towards the fourth paradigm
  38. Acknowledgements: the BD2K team at NIH; my colleagues at UVA; the 150 folks who have passed through my laboratory (https://docs.google.com/spreadsheets/d/1QZ48UaKcwDl_iFCvBmJsT03FK-bMchdfuIHe9Oxc-rw/edit#gid=0)
  39. Thank you. peb6a@virginia.edu
  40. How much biomedical data? • Big data: total data from NIH-funded research in 2016 estimated at 650 PB*; 20 PB of that is in NCBI/NLM (3%), and it is expected to grow by 10 PB in 2016 • Dark data: only 12% of data described in published papers is in recognized archives; 88% is dark data^ • Cost: 2007–2014, NIH spent ~$1.2bn extramurally on maintaining data archives. *For comparison, in 2012 the Library of Congress was 3 PB. ^http://www.ncbi.nlm.nih.gov/pubmed/26207759
  41. Machine learning has been around for over 20 years: why now? • Amount of data available for training • Open source: R and Python • Advances in computing (e.g., GPUs) allow for deeper neural nets (deep learning) • Algorithmic efficiency gains (e.g., in backpropagation) • Success promotes further research • Commercialization (Pastur-Romay et al. 2016, doi:10.3390/ijms17081313)
  42. At DeepMind, which is based in London, AlphaGo Zero is working out how proteins fold, a massive scientific challenge that could give drug discovery a sorely needed shot in the arm.
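The data-volume figures on slide 9 can be checked with a little arithmetic. The sketch below is an illustration, not anything from the talk: it assumes decimal SI prefixes (1 ZB = 10^21 bytes, 1 PB = 10^15 bytes, so 1 ZB = 10^6 PB) and treats "doubling every two years" as smooth exponential growth. The function names are hypothetical.

```python
import math

# Assumption: decimal (SI) prefixes; binary prefixes (ZiB, PiB) would give different numbers.
BYTES_PER_ZB = 10**21
BYTES_PER_PB = 10**15


def zb_to_pb(zettabytes: float) -> float:
    """Convert zettabytes to petabytes (1 ZB = 10**6 PB under SI prefixes)."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_PB


def project_volume(start: float, years: float, doubling_years: float = 2.0) -> float:
    """Volume after `years`, assuming it doubles every `doubling_years` years."""
    return start * 2 ** (years / doubling_years)


def annual_growth_rate(doubling_years: float = 2.0) -> float:
    """Constant yearly growth rate equivalent to the given doubling time."""
    return 2 ** (1 / doubling_years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    print(f"2.7 ZB = {zb_to_pb(2.7):,.0f} PB")
    print(f"Doubling every 2 years = {annual_growth_rate():.1%} growth per year")
    print(f"2.7 ZB after 6 years: {project_volume(2.7, 6):.1f} ZB")
    print(f"50% growth per year doubles in {doubling_time(0.5):.2f} years")
```

Under these assumptions, doubling every two years works out to about 41% growth per year. Conversely, the 50% yearly data growth quoted on slide 10 corresponds to a doubling time of about 1.7 years, so the two slides describe similar but not identical rates.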
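Slides 10 and 11 compare 50% yearly growth in data with 5% yearly growth in IT expenditure. Compounding the two rates shows how quickly that gap widens. The following is a minimal sketch that assumes both rates stay constant and normalizes data volume and IT spend to 1.0 in year 0. The function name is hypothetical.

```python
# Sketch for slides 10-11: the 50%/yr and 5%/yr rates come from the slides.
# Assumptions: both rates are constant, and both quantities start at 1.0 in year 0.


def data_per_it_dollar(years: int, data_growth: float = 0.50, it_growth: float = 0.05) -> float:
    """Ratio of data volume to IT spend after `years`, both normalized to 1.0 at year 0."""
    data = (1 + data_growth) ** years   # compounded data volume
    spend = (1 + it_growth) ** years    # compounded IT expenditure
    return data / spend


if __name__ == "__main__":
    for year in range(0, 11, 2):
        print(f"year {year:2d}: {data_per_it_dollar(year):6.1f}x data per IT dollar")
```

Under these assumptions, the amount of data per IT dollar grows roughly 6-fold in five years and roughly 35-fold in ten. Using the ratio of the two growth factors keeps the comparison independent of the absolute dollar and petabyte amounts, which the slides do not give.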
