The document discusses whether data curators can also be considered data scientists. It describes the work of the UK Data Archive, which curates relatively small social science data for secondary analysis. The roles performed require data skills and domain expertise, and most staff who interact with data have postgraduate qualifications in social science. The document identifies staff who transform data through scripting, and who manipulate, link and merge multiple data sources, as taking on more of a data scientist role. Currently, the author does not consider themselves a data scientist but sees value in picking up these skills. It concludes that as larger linked data become available, data scientists with both technical skills and domain knowledge will be critical to drawing insights from new sources of social science data.
Louise Corti Data scientists
1. ARE DATA CURATORS EVER DATA SCIENTISTS?
LOUISE CORTI
ASSOCIATE DIRECTOR
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
Online Information Conference
London, 20 NOVEMBER 2013
2.
• “A high ranking professional with the training and
curiosity to make discoveries in the world of big data”
• Exploiting the opportunities of big data, open data
and linked data
• Possessing skills to manipulate the data and extract
insightful patterns… in multiple petabytes of data
(1 PB = 10^15 bytes)
3. UK DATA ARCHIVE: RELATIVELY SMALL DATA
• An easy-to-use, innovative and trusted one-stop-shop
for users and suppliers of social science data
resources: ESRC UK Data Service
• Data for secondary analysis, research, policy making
and teaching and learning
• Wide range of data:
  • Government and academic surveys
  • International aggregate data banks
  • Qualitative data
5. OUR WORKFLOW AND DATA SKILLS
[Workflow diagram. Labels: Pre-Ingest; Controlling Access; User Support for Secondary Analysis; DM training; data appraisal; data licensing; data description; data handling & QA; data disclosure analysis; data transformation; data analysis]
6. DATA SKILLS AND DOMAIN EXPERTISE
• Roles are discrete
• No one person does them all
• Require data skills / domain expertise
• Most who interact with data have postgraduate
qualifications in social science
• In our organization, the ‘data scientist’ role is appropriate to
those who:
  • transform data, using scripting to help and to analyze integrity
  • manipulate, link and merge data sources to create harmonized
  and value-added products, e.g. European variables on
  education, or historic census data
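As an illustration of the kind of scripting described above, here is a minimal sketch in Python that harmonizes an education variable, checks integrity, and links two sources on a shared identifier. All variable names, codes, mappings and records are hypothetical and invented for this example; it does not represent the Archive's actual tools or data.

```python
# Hypothetical example: every name, code and record below is invented.

# Source A codes education as integers; source B holds region labels.
SURVEY_A = [
    {"person_id": 1, "educ": 1},
    {"person_id": 2, "educ": 3},
    {"person_id": 3, "educ": 9},  # 9 is not a valid code in this sketch
]
SURVEY_B = [
    {"person_id": 1, "region": "East"},
    {"person_id": 2, "region": "London"},
    {"person_id": 4, "region": "Wales"},
]

# Assumed mapping onto a three-level harmonized education variable.
EDUC_MAP_A = {1: "low", 2: "medium", 3: "high"}


def harmonize_education(records, mapping):
    """Return copies of records with an added harmonized 'educ_h' field.

    Codes missing from the mapping become None so they can be flagged.
    """
    return [{**r, "educ_h": mapping.get(r["educ"])} for r in records]


def link_on_id(left, right, key="person_id"):
    """Inner-join two lists of records on a shared identifier."""
    right_by_id = {r[key]: r for r in right}
    return [{**l, **right_by_id[l[key]]} for l in left if l[key] in right_by_id]


def integrity_report(records):
    """List identifiers whose education code could not be harmonized."""
    return [r["person_id"] for r in records if r["educ_h"] is None]


harmonized = harmonize_education(SURVEY_A, EDUC_MAP_A)
linked = link_on_id(harmonized, SURVEY_B)
unmapped = integrity_report(harmonized)
```

Here `linked` keeps only the people present in both sources, and `unmapped` lists the records a curator would need to inspect by hand, which is where domain knowledge of the survey's coding frame comes in.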
7. WHO ARE OUR ‘DATA SCIENTISTS’?
• Only ever one of them at any one time
• They are definitely data ‘geeks’ and are proud of that term
• They are highly intelligent social science postgrads/ECRs
with experience in analyzing socio-demographic data
• They are database- and programming-curious and have picked
up these skills along the way, including Extract-Transform-Load (ETL) skills
• They have picked up curation skills within the organization,
e.g. metadata and preservation requirements
• They work with others outside our organization
8. AM I A DATA SCIENTIST?
• Currently, I probably am not
• Undergrad chemist: experimental data, mass spectrometry
readings, but in pre-digital days
• Postgrad and ECR social scientist: designed and
analyzed large-scale national survey data
• Apply my own research and methods skills:
  • data selection and appraisal
  • data documentation and metadata
  • user delivery, support and promotion
• I have picked up curation skills along the way
• My preferred pathway for a Data Professional
9. WE REALLY NEED THESE SKILLS
• For social science, much larger data are on their way
• Linking, scaling up, real time feeds, data mining
• The Big Data Family is born: David Willetts MP
announces the ESRC Big Data Network
• Recent £64 million investment
  • Phase 1: Administrative Data Research Network
  • Phase 2: Business and Local Government Data Research
  Centres
  • Phase 3: Third Sector and Social Media Data
• Data scientists are critical to this programme
10. ‘DATA SCIENCE’ TRAINING – DOMAIN ISSUE?
• Increased data management /
preservation training in academia
• We train a wide range of people: HE staff,
students, research support staff
• Why? More/better data, more users
• Other social science data archives /
data libraries are starting to provide this
“To draw great insights from data, you have to know the data, know the business, and know the contextual relationships that are built into the business” (Pierson, Smart Data Collective)