“Provenance and Social Science Data”
15 March 2017
Documenting Data Transformations
George Alter, University of Michigan
Why Metadata?
• Data are useless without metadata – “data about data”
• Metadata should:
– Include all information about data creation
– Describe transformations to variables
– Be easy to create
• Our goal: automated capture of metadata
A few words about ICPSR
• World’s largest archive of social science data
• Consortium established 1962
• 760+ member institutions around the world
• Founding member and home office for the DDI Alliance
Powered by DDI Metadata
ICPSR is building search tools based upon Data Documentation Initiative (DDI) XML.
Codebooks (PDF and online) are rendered from the DDI.
[Screenshot: ICPSR’s searchable database of 4.5M variables. A link opens the online codebook, which shows the variable in the context of its dataset: what question was asked and how the question was coded. Further links lead to the online crosstab and graph tools.]
[Screenshot: the searchable database of 4.5M variables, with links to the online codebook and check boxes for variable comparison; a search for datasets with 3 desired variables; the variable comparison display.]
[Screenshot: Crosswalk for the American National Election Study (ANES) and General Social Survey (GSS). Columns link to 70 datasets; 134 tags in 8 lists; variables are linked to online codebooks.]
Metadata for the American National Election Study
[Screenshots: ANES codebook pages annotated with “What question was asked?”, “Who answered this question?”, and “How was the question coded?”]
How do we know who answered the question? It’s in the PDF.
When data arrive at the archive…
• No question text
• No interview flow (question order, skip pattern)
• No variable provenance
• Data transformations are not documented
How are research data created?
• Most surveys are conducted with computer-assisted interviewing (CAI) software
– CATI – Computer-assisted Telephone Interview
– CAPI – Computer-assisted Personal Interview
– CAWI – Computer-assisted Web Interview
• There is no paper questionnaire
• The CAI program is the questionnaire
– i.e., the program is the metadata
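To illustrate what “the program is the metadata” means, here is a minimal Python sketch. The `Question` class, the `INSTRUMENT` list, and the variables `VOTED` and `CHOICE` are hypothetical; real CAI systems store instruments in their own formats. The point is that a single skip condition both routes the interview and, rendered as text, answers the codebook question “Who answered this question?”

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """One item in a hypothetical CAI instrument (illustration only)."""
    name: str                       # variable name written to the data file
    text: str                       # wording read or shown to the respondent
    codes: dict                     # response codes and their labels
    ask_if: Optional[tuple] = None  # (earlier variable, required value); None = ask everyone

    def is_asked(self, recorded: dict) -> bool:
        """Skip logic: decide at interview time whether this question is asked."""
        if self.ask_if is None:
            return True
        var, value = self.ask_if
        return recorded.get(var) == value

    def universe(self) -> str:
        """The same skip logic rendered as metadata: who answered this question?"""
        if self.ask_if is None:
            return "All respondents"
        var, value = self.ask_if
        return f"Respondents with {var} = {value}"


# Hypothetical two-question instrument: CHOICE is asked only of people who voted.
INSTRUMENT = [
    Question("VOTED", "Did you vote in the last election?", {1: "Yes", 2: "No"}),
    Question("CHOICE", "Which candidate did you vote for?",
             {1: "Candidate A", 2: "Candidate B"}, ask_if=("VOTED", 1)),
]


def run_interview(responses: dict) -> dict:
    """Administer the instrument in order; record only the questions actually asked."""
    recorded = {}
    for q in INSTRUMENT:
        if q.is_asked(recorded):
            recorded[q.name] = responses[q.name]
    return recorded


def codebook() -> list:
    """Derive codebook entries (order, text, codes, universe) from the same program."""
    return [
        {"order": i, "name": q.name, "text": q.text,
         "codes": q.codes, "universe": q.universe()}
        for i, q in enumerate(INSTRUMENT, start=1)
    ]
```

Because the routing and the codebook come from one definition, the universe printed in the codebook cannot drift away from the skip pattern the interview actually used.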
[Diagram: Computer Assisted Interviewing (CAI) produces the original data. A CAI-to-DDI conversion (Colectica, MQDS, others) produces the original metadata as DDI XML.]
We already have tools to convert CAI to machine-readable metadata.
[Diagram: CAI produces the original data, and a CAI-to-DDI conversion (Colectica, MQDS, others) produces the original metadata as DDI XML. Command scripts in statistical packages (SPSS, SAS, Stata, R) transform the original data into revised data.]
What happens when a project modifies the data?
The modified data no longer match the metadata.
[Diagram: CAI produces the original data, and a CAI-to-DDI conversion (Colectica, MQDS, others) produces the original metadata as DDI XML. Command scripts in SPSS, SAS, Stata, or R transform the original data into revised data. A separate “Stat Package to DDI” step extracts metadata from the SPSS/SAS/Stata/R data file into a new DDI XML file.]
Metadata are re-created after the data are transformed.
Transformations are documented by hand.
Statistics packages have limited metadata
• Variable names
• Variable labels
• Value labels
• No provenance
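A minimal Python sketch of this limitation, assuming a simplified record of what a statistics package data file can hold about each variable. `PackageVariable`, `extract_metadata`, and the variables `AGE` and `AGEGRP` are hypothetical. After extraction, a derived variable looks exactly like a collected one, because nothing in the file says how it was computed.

```python
from dataclasses import dataclass, field


@dataclass
class PackageVariable:
    """Simplified view of what a statistics package data file records per variable."""
    name: str
    label: str = ""
    value_labels: dict = field(default_factory=dict)
    # No field for question text, who was asked, or how the values were computed.


def extract_metadata(variables: list) -> list:
    """Turn package metadata into codebook-style records, as an extraction step might.

    The provenance entry is always None: the data file has nothing to fill it with.
    """
    return [
        {"name": v.name, "label": v.label,
         "value_labels": dict(v.value_labels), "provenance": None}
        for v in variables
    ]


# Hypothetical revised file: AGE came from the questionnaire,
# AGEGRP was computed from AGE by a command script.
revised_file = [
    PackageVariable("AGE", "Age of respondent"),
    PackageVariable("AGEGRP", "Age group", {1: "18-34", 2: "35-64", 3: "65+"}),
]
records = extract_metadata(revised_file)
```

The two records have identical structure, so the link from `AGEGRP` back to `AGE` has to be written up by hand.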
[Diagram: CAI produces the original data, and a CAI-to-DDI conversion (Colectica, MQDS, others) produces the original metadata as DDI XML. Command scripts in SPSS, SAS, Stata, or R transform the original data into revised data. A script parser translates the command scripts into a Standard Data Transformation Language (SDTL), expressed in XML, and an updater applies the SDTL to the original DDI XML to produce revised metadata.]
Automating the capture of transformation metadata.
Missing links that we will build.
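The parser-plus-updater idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project’s design: `Compute`, `parse_stata`, and `update_metadata` are hypothetical names, a Python dict stands in for DDI XML, the parser handles only one Stata form (`generate VAR=NUMBER if VAR OP NUMBER`), and the fields of `Compute` are not the actual SDTL schema. The `missing_rule` field records something a package-neutral description must carry explicitly, since the same condition can behave differently across packages when an operand is missing.

```python
import re
from dataclasses import dataclass


@dataclass
class Compute:
    """Hypothetical package-neutral command; NOT the actual SDTL schema."""
    target: str        # variable being created
    value: float       # value assigned when the condition holds
    condition: str     # condition in a neutral notation
    source_vars: list  # variables the new one is derived from
    missing_rule: str  # how a missing operand behaves in the condition


# Matches only one Stata form, e.g. "generate Y=9 if X>3".
STATA_GENERATE = re.compile(
    r"generate\s+(\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s+if\s+(\w+)\s*(==|[<>]=?)\s*(-?\d+(?:\.\d+)?)"
)


def parse_stata(line: str) -> Compute:
    """Script parser: translate one Stata command into the neutral form."""
    m = STATA_GENERATE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unsupported command: {line!r}")
    target, value, var, op, const = m.groups()
    return Compute(
        target=target,
        value=float(value),
        condition=f"{var} {op} {const}",
        source_vars=[var],
        missing_rule="Stata: missing is larger than any number",
    )


def update_metadata(ddi: dict, cmd: Compute, script: str) -> dict:
    """Updater: return revised metadata with provenance for the new variable."""
    revised = dict(ddi)  # leave the original metadata untouched
    revised[cmd.target] = {
        "derived_from": cmd.source_vars,
        "rule": f"{cmd.target} = {cmd.value:g} if {cmd.condition}",
        "missing_rule": cmd.missing_rule,
        "script": script,
    }
    return revised


original = {"X": {"label": "Original survey item"}}
line = "generate Y=9 if X>3"
revised = update_metadata(original, parse_stata(line), line)
```

A real parser would have to cover each package’s full command language; the sketch only shows that, once a command is in a neutral form, updating the metadata becomes mechanical.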
What statistics packages should be covered?
ICPSR Downloads by Format
Format | All downloads | Studies with all formats
Delimited text | 43% | 29%
SPSS | 22% | 24%
SAS | 10% | 12%
Stata | 19% | 23%
R | 5% | 12%
Excel | 0% | 1%
Other | 0% | 0%
Total | 100% | 100%
Number | 378,007 | 154,663
Why do we need an SDTL?
Input data (X): 2, 3, 4, -1
SPSS:
MISSING VALUES X(-1).
IF (X > 3) Y=9.
IF (X < 3) Z=8.
Stata:
replace X=. if X==-1
generate Y=9 if X>3
generate Z=8 if X<3
SAS:
if X=-1 then X=.;
if X>3 then Y=9;
if X<3 then Z=8;
Why do we need an SDTL?
Output data after running each package’s script on the same input:
Input | SPSS output | Stata output | SAS output
X | X Y Z | X Y Z | X Y Z
2 | 2 . 8 | 2 . 8 | 2 . 8
3 | 3 . . | 3 . . | 3 . .
4 | 4 9 . | 4 9 . | 4 9 .
-1 | -1 . . | . 9 . | . . 8
(“.” marks a missing value; in SPSS, -1 stays in the data but is declared user-missing.)
What happens when a missing value is in a logical comparison?
• SPSS
– Logical expressions that include a missing value evaluate to “missing.” Usually, “missing” is treated like “false”: for example, IF does not make the assignment.
• Stata
– Missing values are treated as larger than any number, in effect +∞. So any number is less than a missing value.
• SAS
– Missing values are treated as smaller than any number, in effect -∞. So any number is greater than a missing value.
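These three rules can be checked with a small Python simulation of the transformation on the earlier slides (recode -1 to missing, then set Y=9 if X>3 and Z=8 if X<3). It is a simplified sketch: `None` stands in for every kind of missing value (extended missing codes are ignored), and `compare` and `run` are hypothetical helpers. Under these assumptions it reproduces the output table, including the three different results for the row where X is -1.

```python
import math
from typing import Optional

MISSING = None  # stands in for SPSS user-missing -1, Stata ".", and SAS "."


def compare(x: Optional[float], op: str, c: float, package: str) -> Optional[bool]:
    """Evaluate `x op c` under one package's rule for a missing operand (simplified)."""
    if package not in ("SPSS", "Stata", "SAS"):
        raise ValueError(f"unknown package: {package}")
    if op not in (">", "<"):
        raise ValueError(f"unsupported operator: {op}")
    if x is MISSING:
        if package == "SPSS":
            return None  # the comparison itself is missing; IF then skips the assignment
        # Stata: missing sorts above every number; SAS: missing sorts below every number.
        x = math.inf if package == "Stata" else -math.inf
    return x > c if op == ">" else x < c


def run(package: str, xs: list) -> list:
    """Apply the slide's transformation and return (X, Y, Z) rows."""
    rows = []
    for raw in xs:
        # SPSS keeps -1 in the file but flags it user-missing; here it becomes None.
        x = MISSING if raw == -1 else raw
        y = 9 if compare(x, ">", 3, package) else MISSING
        z = 8 if compare(x, "<", 3, package) else MISSING
        rows.append((x, y, z))
    return rows


for pkg in ("SPSS", "Stata", "SAS"):
    print(pkg, run(pkg, [2, 3, 4, -1]))
```

Only the last row differs across packages, which is exactly the information a package-neutral description of the script would need to preserve.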
Missing Values in Comparisons
How each package treats the missing X when it evaluates X > 3 and X < 3:
Input | SPSS output | Stata output | SAS output
X | X Y Z | X Y Z | X Y Z
2 | 2 . 8 | 2 . 8 | 2 . 8
3 | 3 . . | 3 . . | 3 . .
4 | 4 9 . | 4 9 . | 4 9 .
-1 | NULL . . | ∞ 9 . | -∞ . 8
Benefits of automated metadata capture
• Metadata will be better
– All the information in the CAI can be included
– Variable transformations can be described
• Automation will lower costs
– Metadata will not be discarded and re-created
• All metadata will be standardized and machine-readable
– Codebooks with rich information can be rendered at will
• If we make it easy and beneficial, researchers will use it.
Continuous Capture of Metadata for Statistical Data
(NSF ACI-1640575)
Project Partners
• Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan
• Colectica
• Metadata Technology North America
• Norwegian Centre for Research Data
• General Social Survey, NORC, University of Chicago
• American National Election Study, University of Michigan
Questions?
George Alter
altergc@umich.edu

 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) data
 
Applying FAIR principles to linked datasets: Opportunities and Challenges
Applying FAIR principles to linked datasets: Opportunities and ChallengesApplying FAIR principles to linked datasets: Opportunities and Challenges
Applying FAIR principles to linked datasets: Opportunities and Challenges
 
How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018How to make your data count webinar, 26 Nov 2018
How to make your data count webinar, 26 Nov 2018
 
Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
Ready, Set, Go! Join the Top 10 FAIR Data Things Global SprintReady, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
Ready, Set, Go! Join the Top 10 FAIR Data Things Global Sprint
 
How FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of dataHow FAIR is your data? Copyright, licensing and reuse of data
How FAIR is your data? Copyright, licensing and reuse of data
 
Peter neish DMPs BoF eResearch 2018
Peter neish DMPs BoF eResearch 2018Peter neish DMPs BoF eResearch 2018
Peter neish DMPs BoF eResearch 2018
 

Recently uploaded

REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
KiriakiENikolaidou
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
Vineet
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 

Recently uploaded (20)

REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
