Louise Corti Data scientists

Published in: Technology

  1. ARE DATA CURATORS EVER DATA SCIENTISTS?
     LOUISE CORTI, ASSOCIATE DIRECTOR, UK DATA ARCHIVE, UNIVERSITY OF ESSEX
     Online Information Conference, London, 20 November 2013
  2. • “A high ranking professional with the training and curiosity to make discoveries in the world of big data”
     • Exploiting the opportunities of big data, open data and linked data
     • Possessing skills to manipulate the data and extract insightful patterns … in multiple petabytes of data (10^15 bytes)
  3. UK DATA ARCHIVE: RELATIVELY SMALL DATA
     • An easy-to-use, innovative and trusted one-stop-shop for users and suppliers of social science data resources: ESRC UK Data Service
     • Data for secondary analysis, research, policy making and teaching and learning
     • Wide range of data:
       • Government and academic surveys
       • International aggregate data banks
       • Qualitative data
  4. SCALE AND VOLUME
     • 6,336 studies
     • 1,632 GB
     • 1,255,814 files
     • 113,832 directories
     • Av. file size 1.29 MBytes
     • Grows by 120 GB per year
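
The ratios behind these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the totals stated on the slide and assumes decimal units (1 GB = 1000 MB), which the slide does not specify. The function name `holdings_summary` and the constant names are hypothetical, introduced here for the example.

```python
# Totals as stated on the slide.
STUDIES = 6_336
TOTAL_GB = 1_632
FILES = 1_255_814
DIRECTORIES = 113_832
GROWTH_GB_PER_YEAR = 120

# Assumption: decimal units. The slide does not say which convention it uses;
# set this to 1024 to see the binary-unit figure instead.
MB_PER_GB = 1000


def holdings_summary():
    """Derive per-file, per-study and growth ratios from the slide's totals."""
    return {
        "avg_file_mb": TOTAL_GB * MB_PER_GB / FILES,
        "files_per_study": FILES / STUDIES,
        "files_per_directory": FILES / DIRECTORIES,
        "annual_growth_pct": 100 * GROWTH_GB_PER_YEAR / TOTAL_GB,
    }


if __name__ == "__main__":
    for name, value in holdings_summary().items():
        print(f"{name}: {value:.2f}")
```

Under the decimal-unit assumption the computed average file size lands close to the 1.29 MBytes quoted on the slide; with binary units it comes out slightly higher.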
  5. OUR WORKFLOW AND DATA SKILLS
     [Workflow diagram. Labels, in the order extracted: DATA APPRAISAL; DATA LICENSING; DATA ANALYSIS; User support; For Secondary Analysis; Pre-Ingest; Controlling Access; DM TRAINING; DATA DESCRIPTION; DATA HANDLING & QA; DATA DISCLOSURE ANALYSIS; DATA TRANSFORMATION]
  6. DATA SKILLS AND DOMAIN EXPERTISE
     • Roles discrete
     • No one person does them
     • Require data skills/domain expertise
     • Most who interact with data have postgrad qualifications in social science
     • In our organization, ‘data scientist’ role appropriate to those who:
       • transform data - use scripting to help, analyze integrity
       • manipulate, link and merge data sources - harmonized and added value products, e.g. European variables on education, or historic census data
  7. WHO ARE OUR ‘DATA SCIENTISTS’?
     • Only ever one of them at any one time
     • They are definitely data ‘geeks’ and are proud of that term
     • They are highly intelligent social science postgrads/ECRs with experience in analyzing socio-demographic data
     • They are database/programming-curious and have picked up these skills along the way. Extract-Transform-Load skills
     • They have picked up curation skills within the organization, e.g. metadata and preservation requirements
     • Work with others outside our organization
  8. AM I A DATA SCIENTIST?
     • Currently, I probably am not
     • Undergrad chemist - experimental data, mass spectrometry readings, but pre-digital days
     • Postgrad and ECR social scientist - designed and analyzed large-scale national survey data
     • Apply my own research and methods skills:
       • data selection and appraisal
       • data documentation and metadata
       • user delivery, support and promotion
     • I have picked up curation skills along the way
     • My preferred pathway for a Data Professional
  9. WE REALLY NEED THESE SKILLS
     • For social science, much larger data are on their way
     • Linking, scaling up, real time feeds, data mining
     • The Big Data Family is born - David Willetts MP announces the ESRC Big Data Network
     • Recent £64 million investment
     • Phase 1: Administrative Data Research Network
     • Phase 2: Business and Local Government Data Research Centres
     • Phase 3: Third Sector and Social Media Data
     • Data scientists are critical to this programme
  10. ‘DATA SCIENCE’ TRAINING – DOMAIN ISSUE?
     • Increased data management/preservation training in academia
     • We train a wide range – HE staff, students, research support staff
     • Why? More/better data, more users
     • Other social science data archives/data libraries starting to provide this
     “To draw great insights from data, you have to know the data, know the business, and know the contextual relationships that are built into the business” (Pierson, Smart Data Collective)
