CLIO INFRA
Data analysis in Dataverse & visualization
of datasets on historical maps
Dataverse Community meeting 2015,
June 11, Harvard University
Vyacheslav Tykhonov Richard Zijdeman Jerry de Vries
International Institute of Social History
Introduction
The International Institute of Social History (IISH) collects information in the field of social
history and makes it available to the public. The IISH is one of the major scientific heritage
institutions in the Netherlands. In the field of socio-economic historical research IISH plays a
prominent role.
CLIO INFRA is a Digital Research Infrastructure for the Arts and Humanities, developed to
integrate a number of databases (hubs) containing data on global social, economic and
institutional indicators over the past five centuries, with special attention to the past 200
years. Clio Infra is developed by IISH.
Our mission
Clio Infra is the bridge between Data Collections and Research Datasets.
The time of “old school” researchers:
Researchers collected datasets in CSV, Access, Excel and other formats, mostly on their own and without real
collaboration with other researchers in the same field. A very closed model.
Digital Humanities era:
Young researchers use various computer tools to collect, analyze and combine research datasets they have
received from the older generation of researchers. They can contribute, help each other and build communities.
The future:
The goal is to build tools to get datasets from “old school” researchers, then standardize and store the data in the
cloud in order to provide access to Open Data for researchers all over the world and make
collaboration between them possible.
Clio Infra Collaboration platform
Clio Infra functionality is based on the Dataverse solution:
- teams can collaboratively curate, share and analyze research datasets
- team members can share the responsibility for collecting data on specific variables (for
example, countries) and inform each other about changes and additions
- the dataset version control system tracks changes in datasets
- other researchers can download their own copy of the data if the dataset is published as Open
Data
Dataverse is a flexible metadata store (repository) that is connected to the research dataset
storage by our Data Processing Engine (DPE)
Added Value for future Researchers
The benefits of data sharing can be classified in terms of Metadata and Data access and
sharing (Collection tools) and Statistical Analysis and Data Mining (Research tools):
● access to a specific case study, citing and finding data
● access to the universe of data from the Dataverse network, which can organize and display
it for browsing and searching
● data filtering: researchers with proper authorization can obtain a subset of the data
provided by the data collector
● data analysis to run descriptive statistics and graphics, visualization, plotting on
historical maps
● Data APIs to export data for further analysis in popular statistical packages (STATA,
SPSS, R, iPython Notebook) and in advanced data mining tools that will be developed in
the future (an always up-to-date solution)
Collaboration possibilities
Descriptive metadata:
- dataset file
- documentation
- standardized tables with codes can be part of dataset
Sharing and collaboration capabilities:
- every researcher (Dataverse user) requests a unique API token
- researchers exchange API tokens and grant permissions to work on the
same datasets as a team using data analysis tools
Collaboration Data Workflow
- draft datasets are visible only to their owners
- with API tokens researchers can access an interactive dashboard to get
insights into the data stored in Dataverse
- every dataset is converted to a dataframe
- the dashboard provides access to all variables from the dataframe and visualizes them
on charts, graphs, historical maps and treemaps
Once a dataset is ready to go public, it can be published in Collabs:
- guest users can download a copy of the dataset
- team members with permissions and an authorized API token can contribute to the dataset
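As a rough illustration of token-based access, the sketch below prepares (but does not send) an authenticated request for a draft dataset. It assumes a standard Dataverse installation, where the API token is passed in the `X-Dataverse-key` header; the server URL, token and persistent identifier are placeholders, not values from Clio Infra.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders for illustration only; none of these values come from Clio Infra.
SERVER = "https://dataverse.example.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PERSISTENT_ID = "hdl:12345/EXAMPLE"


def dataset_request(server, persistent_id, token):
    """Build a GET request for dataset metadata, authenticated by API token."""
    query = urlencode({"persistentId": persistent_id})
    url = f"{server}/api/datasets/:persistentId/?{query}"
    # Dataverse reads the token from the X-Dataverse-key header.
    return Request(url, headers={"X-Dataverse-key": token})


request = dataset_request(SERVER, PERSISTENT_ID, API_TOKEN)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the call; without a valid token the server would be expected to refuse access to a draft dataset.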
Data Analysis in Clio Infra
● the Data Processing Engine has a Python core; it was developed for nlgis.nl (Netherlands
Geographic Information System) and is distributed as a widget
● interactive data exploration dashboard based on the D3 library
● every dataset from Dataverse is available through the Data API (JSON) and can be connected to
any statistical package
● filtering the data on specific years and variables is based on dataframes (pandas, Python
for data analysis)
● the data quality check is a quick visual tool that applies Benford’s law to all values from a
specific dataset
● dataframes can be opened by researchers in various statistical packages such as iPython
notebook, R Studio, SPSS, STATA, MATLAB, etc.
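The dataframe-based filtering on years and variables could look roughly like the sketch below. The column names follow the fields shown in the Data API output; the `select` helper is hypothetical, and all rows except the USA 1880 value are made up for the example.

```python
import pandas as pd

# Illustrative rows shaped like Data API records. Only the USA 1880 value
# appears in the slides; the other values are made up for the example.
rows = [
    {"countrycode": "USA", "year": 1880, "value": 14264.0},
    {"countrycode": "USA", "year": 1890, "value": 20000.0},
    {"countrycode": "NLD", "year": 1880, "value": 1500.0},
    {"countrycode": "NLD", "year": 1890, "value": None},
]
df = pd.DataFrame(rows)


def select(frame, years=None, countrycodes=None):
    """Return the rows matching the requested years and country codes."""
    mask = pd.Series(True, index=frame.index)
    if years is not None:
        mask &= frame["year"].isin(years)
    if countrycodes is not None:
        mask &= frame["countrycode"].isin(countrycodes)
    return frame[mask]


subset = select(df, years=[1880])
print(subset)
# Missing values stay visible as NaN, so they can be counted or charted.
print(int(df["value"].isna().sum()), "missing value(s)")
```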
Data Processing Engine (DPE) specification
● can split the values from any dataset into a number of categories specified by the researcher (8 by
default)
● the algorithm that assigns data values to categories can be selected manually
(percentile by default)
● can determine the maximum possible number of categories for a specific dataset if the number
requested by the user cannot be reached (for example, if there are only 2-3
categories of data values)
● data ranges should be defined so that data can be visualized on a chart or map
at the right scale
● colors can be specified by the user (ColorBrewer, see http://colorbrewer2.org)
● a legend is generated and attached to all visualizations automatically
● missing values are shown as 'no data' regions on the map
● all data values are delivered by the Data API, which makes the data analysis platform-independent
and able to communicate with other systems or statistical packages
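A minimal sketch of the percentile-based categorization described above, using `pandas.qcut`. It is not the DPE implementation: the `categorize` function and its fallback rule (never request more categories than there are distinct values) are assumptions made for illustration.

```python
import pandas as pd


def categorize(values, categories=8):
    """Assign each value a percentile-based category index (0 = lowest).

    Missing values keep a NaN category, so a map can render them as 'no data'.
    """
    series = pd.Series(values, dtype="float64")
    present = series.dropna()
    # Assumed fallback: cap the number of categories at the number of distinct values.
    k = min(categories, present.nunique())
    if k < 2:
        codes = pd.Series(0, index=present.index)
    else:
        # duplicates="drop" merges bins whose percentile edges coincide.
        codes = pd.qcut(present, q=k, labels=False, duplicates="drop")
    return codes.reindex(series.index)


print(categorize(range(1, 17)).tolist())   # 16 values spread over 8 categories
print(categorize([1, None, 3]).tolist())   # only 2 categories are possible here
```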
API Service (Data API)
The Data API provided by the Data Processing Engine is the most important functionality of a well
equipped digital infrastructure:
● an easy way to analyze data in popular statistical packages (STATA, SPSS, Excel)
● common data science programming languages such as Python and R can be used to perform more
advanced research with external data science libraries
● data can be analyzed with toolboxes like Wolfram|Alpha and other discovery platforms (added
value for the future)
● suitable for other researchers and developers who want to use advanced techniques and data
mining tools that have not been developed yet
Example of output from Data API
● every dataset ingested by the DPE is available through the Data API with a unique handle
● API results can be filtered by variables extracted from the content of the data file
Example:
/api/data?&handle=F16UDU:30:31&countrycode=USA&year=1880&categories=8&datarange=calculate
{
  "United States of America": {
    "code": "F16UDU30_31",
    "color": "#FF7F00",
    "countrycode": "USA",
    "id": "2085",
    "indicator": "Total Urban Population",
    "intcode": "840",
    "r": 923.99,
    "region": "W. Offshoots",
    "units": "x 1000",
    "value": 14264.0,
    "year": 1880
  }
}
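To show how such a response can be consumed from Python, the sketch below rebuilds the example query string and loads the example payload into a pandas dataframe. The `data_api_url` helper is hypothetical, only the path is used because the slides do not name a host, and the payload is abbreviated from the example above.

```python
import json
from urllib.parse import urlencode

import pandas as pd


def data_api_url(handle, **filters):
    """Build a Data API query; parameter names are taken from the example URL."""
    query = {"handle": handle, **filters}
    # Keep ':' readable in the handle instead of percent-encoding it.
    return "/api/data?" + urlencode(query, safe=":")


url = data_api_url("F16UDU:30:31", countrycode="USA", year=1880,
                   categories=8, datarange="calculate")

# Payload abbreviated from the example output in the slides.
response_text = """
{"United States of America": {"code": "F16UDU30_31", "color": "#FF7F00",
 "countrycode": "USA", "indicator": "Total Urban Population",
 "units": "x 1000", "value": 14264.0, "year": 1880}}
"""
payload = json.loads(response_text)

# One row per country, one column per attribute.
df = pd.DataFrame.from_dict(payload, orient="index")
print(url)
print(df[["countrycode", "year", "value", "units", "color"]])
```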
Data visualization and plotting data on historical maps
● the Data Processing Engine (DPE) is the core of the data visualization process and is connected
to the geoservice
● data attributes such as scales and colors are calculated by the DPE on the fly, based on the
researcher's input (for example, the number of categories to split the data into), and are part of the Data API output
● histograms, cross tabulations and enhanced descriptive statistics are based on pandas
dataframes
● visualization of datasets on historical maps will be available for the
last 500 years
Historical maps services: Geoservice and Geocoder
Delivered by Webmapper.nl and integrated with the common Clio Infra infrastructure.
Geoservice basic requirements:
- should provide an up-to-date GeoAPI with historical polygons for all countries and regions,
based on standardized codes
- available as GeoJSON/TopoJSON for online visualization
- the QGIS toolbox should be supported for uploading new maps and updating old maps as
shapefiles
The geocoder should be able to standardize all geographical locations across different datasets:
- USSR and Soviet Union should be recognized as one country with the same PID
- Germany before and after 1990
- Indonesia before and after 1999
GeoAPI example
The geoservice can provide polygons for specific years at the national or world level, rendered
as TopoJSON or GeoJSON.
GeoAPI:
/api/maps?world=on&year=1962
Polygons for all countries are delivered as TopoJSON (excerpt):
arcs":[[1782,2186]]}]}},"arcs":[[[8387,6231],[0,5],[1,1],[1,-1],[2,0],[2,-1],[3,-4],[1,
-3],[0,-1],[-1,-5],[0,-1],[-1,2],[0,1],[-2,2],[-3,1],[-1,0],[-1,0],[-1,3],[0,1]],
[[8390,6247],[1,1],[0,1],[2,1],[1,0],[1,-2],[-1,-5],[-1,0],[-1,1],[-1,1],[-1,1],[0,1]],
[[8391,6204],[0,2],[-1,1],[-1,-1],[0,1],[0,1],[1,3],[1,0],[2,-6],[0,-1],[0,-1],[0,-1],
[-1,1],[-1,1]],[[8364,6093],[0,2],[2,5],[1,0],[1,-2],[-1,-6],[-1,-3],[-1,0],[-1,0],[0,1],
[0,3]],[[5941,6575],[0,-1],[-1,0],[-1,1],[0,1],[-1,0],[-1,0],[-1,0],[-1,0],[0,-1],[-1,-1],
[0,-2],[0,-1],[0,-2],[0,-1],[-1,-3],[-3,-2],[-4,-4],[-1,-1],[-1,0],[-2,-1],[-2,0],[-1,0],[-1,
-1],[-1,-2],[-5,1],[-1,0],[-1,-1],[-1,-1],[-1,0],[-2,1],[-4,3],[-1,1],[-1,2],[-2,8],[-1,4],
[-1,9],[0,1],[0,2],[0,1],[1,0],[0,-1],[0,-1],[1,-1],[0,-1],[1,0]
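The small integer pairs in the excerpt suggest quantized TopoJSON, where the first point of each arc is an absolute position and every later pair is a delta from the previous point. The sketch below decodes an arc on that assumption; the excerpt does not show the topology's `transform`, so the `scale` and `translate` defaults leave coordinates in quantized units.

```python
def decode_arc(arc, scale=(1.0, 1.0), translate=(0.0, 0.0)):
    """Turn a delta-encoded TopoJSON arc into a list of absolute positions."""
    x = y = 0
    points = []
    for dx, dy in arc:
        # Each pair is an offset from the previous point (the first from 0, 0).
        x += dx
        y += dy
        points.append((x * scale[0] + translate[0], y * scale[1] + translate[1]))
    return points


# First four pairs of the first arc in the excerpt above.
arc = [[8387, 6231], [0, 5], [1, 1], [1, -1]]
print(decode_arc(arc))
```

With the real `transform` from the response, passing its `scale` and `translate` values would yield longitude and latitude instead of quantized grid units.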
Integration of Dataverse, DPE, geoservice and geocoder in 6 steps
1. the Data API communicates with the geocoder to find standardized codes for all locations in the
dataset
2. the geoservice returns the polygons associated with the same geocodes
3. data and polygons are matched in the frontend by any geospatial JavaScript library
4. attributes such as colors and scales are used to fill the map polygons
5. a legend with scales, values and colors comes from the DPE to complete the map
6. notes, sources and other metadata are extracted from the Dataverse API to provide a clear
explanation of the generated historical map
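The matching in steps 3 and 4 can be sketched as a join on the standardized code. In Clio Infra this happens in the frontend with a JavaScript library; the Python version below only illustrates the logic. The `code` property name, the grey fallback color and the sample features are assumptions, while the data record reuses values from the Data API example.

```python
NO_DATA_COLOR = "#CCCCCC"  # assumed fallback color for regions without data


def style_features(features, data_by_code, code_key="code"):
    """Attach fill color and label to GeoJSON-like features, matched by code."""
    styled = []
    for feature in features:
        props = dict(feature.get("properties", {}))
        record = data_by_code.get(props.get(code_key))
        if record is None:
            # No value for this polygon: render it as a 'no data' region.
            props.update(fill=NO_DATA_COLOR, label="no data")
        else:
            props.update(fill=record["color"],
                         label=f'{record["value"]} {record["units"]}')
        styled.append({**feature, "properties": props})
    return styled


# Hypothetical features; real polygons would come from the geoservice.
features = [
    {"type": "Feature", "properties": {"code": "840"}, "geometry": None},
    {"type": "Feature", "properties": {"code": "528"}, "geometry": None},
]
data_by_code = {"840": {"color": "#FF7F00", "value": 14264.0, "units": "x 1000"}}

for feature in style_features(features, data_by_code):
    print(feature["properties"])
```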
Live demo for Labor Conflicts in 100 years
Dataset Quality Check
Benford’s law is used to test the quality of the data
Linked Edit Rules: a methodology to publish, link, combine and execute edit rules on the
Web as Linked Data in order to verify the consistency of statistical datasets and detect wrongly filled
data values (for example, characters in values where numbers are expected)
Chart visualization gives an overview of missing data values
Benford’s law in action on a real dataset
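Benford’s law states that in many naturally occurring datasets the leading digit d appears with probability log10(1 + 1/d). The sketch below compares observed and expected first-digit shares; it is a minimal illustration rather than the Clio Infra quality-check tool, and the function names are hypothetical.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the first significant digit of a number, or None if it has none."""
    if not math.isfinite(value) or value == 0:
        return None
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{abs(value):.15e}"[0])


def benford_table(values):
    """Map each digit 1-9 to (observed share, share expected under Benford's law)."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero finite values to test")
    counts = Counter(digits)
    return {d: (counts[d] / len(digits), math.log10(1 + 1 / d))
            for d in range(1, 10)}


# Powers of two are a classic sequence that follows Benford's law closely.
for digit, (observed, expected) in benford_table(2 ** n for n in range(100)).items():
    print(digit, f"observed={observed:.2f}", f"expected={expected:.2f}")
```

Large gaps between the observed and expected shares do not prove that data is wrong, but they flag a dataset for a closer look.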
Combining and aggregating datasets in data exploration tools
● tools that can recognize the same variable under different names in different datasets (Year and
Jaar)
● geocoding services can standardize different regions (Netherlands → NL, Amsterdam
→ AMS)
● all possible relationship paths can be ranked from the “best” (100%) to the “worst”
(0%); for example, the value “05” for the variable “Month” can be recognized as “May”
● standardized datasets can be used as “reference” data for other datasets from other
research groups and depot services (CHIA, Harvard DataVerse Network, MIT,
DANS)
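The slides do not say how candidate matches are scored. As a stand-in, the sketch below ranks variable names by plain string similarity from Python's `difflib` and maps a month number to its English name; both helpers are hypothetical and far simpler than a real disambiguation service would need to be.

```python
from difflib import SequenceMatcher

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def rank_candidates(name, candidates):
    """Rank candidate variable names from best (100) to worst (0) match."""
    scored = [
        (candidate,
         round(100 * SequenceMatcher(None, name.lower(), candidate.lower()).ratio()))
        for candidate in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def month_label(value):
    """Interpret a value such as "05" as a month name, or None if out of range."""
    number = int(value)
    return MONTH_NAMES[number - 1] if 1 <= number <= 12 else None


print(rank_candidates("Year", ["Month", "Jaar", "Country"]))
print(month_label("05"))
```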
Clio Infra Infrastructure is suitable for different kinds of datasets
Quantitative datasets store quantities (numbers):
- data that can be measured (length, speed, height, age)
- example: current clio-infra datasets
Qualitative datasets store qualitative observations (descriptions):
- data that can be observed (colors, textures, professions, groups)
- example: HISCO, world strikes dataset
Data visualization differs by kind of dataset:
- quantities can be plotted on charts, graphs and historical maps
- qualities (hierarchies) can be visualized on treemaps and maps
Example: Interactive data exploration based on API token
Summary: data exploration and analysis will extend Dataverse functionality
● data quality check during upload/update of dataset
● automatic ingestion and recognition of years and locations in datasets
● integrated with geocoder and geoservice to get polygons for maps
● interactive dashboard to do visual exploration of variables from dataset (graph /
histogram / scatterplot)
● data processing engine to plot data on interactive historical maps (if dataset has
geospatial data)
● correlation for continuous variables
● regression analysis for estimating the relationships among variables
● building treemaps for qualitative data analysis (hierarchical data coming soon)
Thank you!
Any suggestions?

More Related Content

What's hot

Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training Curricula
EUCLID project
 
Sören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge GraphsSören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge Graphs
semanticsconference
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
Besnik Fetahu
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
Peter Haase
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federation
Peter Haase
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
Valeria Pesce
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
Ivan Herman
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
Peter Haase
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
Peter Haase
 
Metadata Mapping & Crosswalks
Metadata Mapping & CrosswalksMetadata Mapping & Crosswalks
Metadata Mapping & Crosswalks
Nikos Palavitsinis, PhD
 
Smart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge GraphSmart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge Graph
Peter Haase
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
andrea huang
 
The CIARD RINGValeri
The CIARD RINGValeriThe CIARD RINGValeri
The CIARD RINGValeri
CIARD Movement
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM
4Science
 
The RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple CountThe RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple Count
Leigh Dodds
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
Mark Conway
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
Open Data Support
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
vty
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
Vrije Universiteit Amsterdam
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
andrea huang
 

What's hot (20)

Big Linked Data - Creating Training Curricula
Big Linked Data - Creating Training CurriculaBig Linked Data - Creating Training Curricula
Big Linked Data - Creating Training Curricula
 
Sören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge GraphsSören Auer | Enterprise Knowledge Graphs
Sören Auer | Enterprise Knowledge Graphs
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federation
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
 
Standardizing for Open Data
Standardizing for Open DataStandardizing for Open Data
Standardizing for Open Data
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
 
Metadata Mapping & Crosswalks
Metadata Mapping & CrosswalksMetadata Mapping & Crosswalks
Metadata Mapping & Crosswalks
 
Smart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge GraphSmart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge Graph
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
 
The CIARD RINGValeri
The CIARD RINGValeriThe CIARD RINGValeri
The CIARD RINGValeri
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM
 
The RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple CountThe RDF Report Card: Beyond the Triple Count
The RDF Report Card: Beyond the Triple Count
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
LD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and toolsLD4KD 2015 - Demos and tools
LD4KD 2015 - Demos and tools
 
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
 

Similar to Data Analysis in Dataverse & Visualization of Datasets on Historical Maps by Vyacheslav Tykhonov, Richard Zijdeman, and Jerry de Vries

Clio infra Collabs data analysis tools
Clio infra Collabs data analysis toolsClio infra Collabs data analysis tools
Clio infra Collabs data analysis tools
vty
 
The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)
vty
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8
plan4all
 
Geohosting
GeohostingGeohosting
Geohosting
Karel Charvat
 
ENGAGE Workshop at OpenDataWeek2013
ENGAGE Workshop at OpenDataWeek2013ENGAGE Workshop at OpenDataWeek2013
ENGAGE Workshop at OpenDataWeek2013
Valerie BRASSE
 
Gsoc proposal
Gsoc proposalGsoc proposal
Gsoc proposal
AyushBansal122
 
Inspire hack 2017-linked-data
Inspire hack 2017-linked-dataInspire hack 2017-linked-data
Inspire hack 2017-linked-data
Raul Palma
 
Team 05 linked data generation
Team 05 linked data generationTeam 05 linked data generation
Team 05 linked data generation
plan4all
 
Gsoc proposal 2021 polaris
Gsoc proposal 2021 polarisGsoc proposal 2021 polaris
Gsoc proposal 2021 polaris
AyushBansal122
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
OpenAIRE
 
A Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfA Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdf
GeethaPratyusha
 
Executable papers
Executable papersExecutable papers
Executable papers
Anita de Waard
 
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Research Data Alliance
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014
EDINA, University of Edinburgh
 
General concepts: DDI
General concepts: DDIGeneral concepts: DDI
General concepts: DDI
Arhiv družboslovnih podatkov
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
Sanjay Padhi, Ph.D
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platform
Andrea Bollini
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
aceas13tern
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
vty
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
Dr. Radhey Shyam
 

Similar to Data Analysis in Dataverse & Visualization of Datasets on Historical Maps by Vyacheslav Tykhonov, Richard Zijdeman, and Jerry de Vries (20)

Clio infra Collabs data analysis tools
Clio infra Collabs data analysis toolsClio infra Collabs data analysis tools
Clio infra Collabs data analysis tools
 
The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)The recovery of netherlands geographic information system (nlgis 2)
The recovery of netherlands geographic information system (nlgis 2)
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8
 
Geohosting
GeohostingGeohosting
Geohosting
 
ENGAGE Workshop at OpenDataWeek2013
ENGAGE Workshop at OpenDataWeek2013ENGAGE Workshop at OpenDataWeek2013
ENGAGE Workshop at OpenDataWeek2013
 
Gsoc proposal
Gsoc proposalGsoc proposal
Gsoc proposal
 
Inspire hack 2017-linked-data
Inspire hack 2017-linked-dataInspire hack 2017-linked-data
Inspire hack 2017-linked-data
 
Team 05 linked data generation
Team 05 linked data generationTeam 05 linked data generation
Team 05 linked data generation
 
Gsoc proposal 2021 polaris
Gsoc proposal 2021 polarisGsoc proposal 2021 polaris
Gsoc proposal 2021 polaris
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
A Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdfA Comprehensive Guide to Data Science Technologies.pdf
A Comprehensive Guide to Data Science Technologies.pdf
 
Executable papers
Executable papersExecutable papers
Executable papers
 
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
 
Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014Geospatial metadata and spatial data workshop: 19 June 2014
Geospatial metadata and spatial data workshop: 19 June 2014
 
General concepts: DDI
General concepts: DDIGeneral concepts: DDI
General concepts: DDI
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
DSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platformDSpace-CRIS: a CRIS enhanced repository platform
DSpace-CRIS: a CRIS enhanced repository platform
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 

More from datascienceiqss

Citing Data in Journal Articles using JATS by Deborah A. Lapeyre
Citing Data in Journal Articles using JATS by Deborah A. LapeyreCiting Data in Journal Articles using JATS by Deborah A. Lapeyre
Citing Data in Journal Articles using JATS by Deborah A. Lapeyre
datascienceiqss
 
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...Big Data Repository for Structural Biology: Challenges and Opportunities by P...
Big Data Repository for Structural Biology: Challenges and Opportunities by P...
datascienceiqss
 
iRODS/Dataverse Project by Jonathan Crabtree
iRODS/Dataverse Project by Jonathan CrabtreeiRODS/Dataverse Project by Jonathan Crabtree
iRODS/Dataverse Project by Jonathan Crabtree
datascienceiqss
 
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinaiDataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
DataTags: Sharing Privacy Sensitive Data by Michael Bar-sinai
datascienceiqss
 
DataTags: Sharing Privacy Sensitive Data by Latanya Sweeney
DataTags: Sharing Privacy Sensitive Data by Latanya SweeneyDataTags: Sharing Privacy Sensitive Data by Latanya Sweeney
DataTags: Sharing Privacy Sensitive Data by Latanya Sweeney
datascienceiqss
 
Center for Open Science and the Open Science Framework: Dataverse Add-on by S...
Center for Open Science and the Open Science Framework: Dataverse Add-on by S...Center for Open Science and the Open Science Framework: Dataverse Add-on by S...
Center for Open Science and the Open Science Framework: Dataverse Add-on by S...
datascienceiqss
 
Geospatial Data Visualization: WorldMap Integration by Raman Prasad
Geospatial Data Visualization: WorldMap Integration by Raman PrasadGeospatial Data Visualization: WorldMap Integration by Raman Prasad
Geospatial Data Visualization: WorldMap Integration by Raman Prasad
datascienceiqss
 
Sharing Data Through Plots with Plotly by Alex Johnson
Sharing Data Through Plots with Plotly by Alex JohnsonSharing Data Through Plots with Plotly by Alex Johnson
Sharing Data Through Plots with Plotly by Alex Johnson
datascienceiqss
 
TwoRavens: A Graphical, Browser-Based Statistical Interface for Data Reposito...
TwoRavens: A Graphical, Browser-Based Statistical Interface for Data Reposito...TwoRavens: A Graphical, Browser-Based Statistical Interface for Data Reposito...
TwoRavens: A Graphical, Browser-Based Statistical Interface for Data Reposito...
datascienceiqss
 
MIT Libraries Dataverse by Katherine McNeill
MIT Libraries Dataverse by Katherine McNeillMIT Libraries Dataverse by Katherine McNeill
MIT Libraries Dataverse by Katherine McNeill
datascienceiqss
 
Data Analysis in Dataverse & Visualization of Datasets on Historical Maps by Vyacheslav Tykhonov, Richard Zijdeman, and Jerry de Vries

  • 1. CLIO INFRA Data analysis in Dataverse & visualization of datasets on historical maps Dataverse Community meeting 2015, June 11, Harvard University Vyacheslav Tykhonov Richard Zijdeman Jerry de Vries International Institute of Social History
  • 2. Introduction The International Institute of Social History (IISH) collects information in the field of social history and makes it available to the public. The IISH is one of the major scientific heritage institutions in the Netherlands and plays a prominent role in socio-economic historical research. CLIO INFRA is a Digital Research Infrastructure for the Arts and Humanities, developed to integrate a number of databases (hubs) containing data on global social, economic and institutional indicators over the past five centuries, with special attention to the past 200 years. Clio Infra is developed by IISH.
  • 3. Our mission Clio Infra is the bridge between Data Collections and Research Datasets. The “old school” era: researchers collected datasets in CSV, Access, Excel and other formats, mostly on their own and without real collaboration with other researchers in the same field. A very closed model. Digital Humanities era: young researchers use various computer tools to collect, analyze and combine research datasets obtained from the older generation of researchers. They can contribute, help each other and build communities. The future: the goal is to build tools to get datasets from “old school” researchers, standardize them and store the data in the cloud, in order to provide access to Open Data for researchers all over the world and make collaboration between them possible.
  • 4. Clio Infra Collaboration platform Clio Infra functionality is based on the Dataverse solution: - teams can collaboratively curate, share and analyze research datasets - team members can share the responsibility for collecting data on specific variables (for example, countries) and inform each other about changes and additions - the dataset version control system tracks changes in datasets - other researchers can download their own copy of the data if the dataset is published as Open Data. Dataverse is a flexible metadata store (repository) that is connected to the research dataset storage by our Data Processing Engine (DPE).
  • 5. Added Value for future Researchers The benefits of data sharing fall into two groups, Metadata and Data access and sharing (Collection tools) and Statistical Analysis and Data Mining (Research tools): ● access to a specific case study, citing and finding data ● access to the universe of data from the Dataverse network, which can organize and display datasets for browsing and searching ● data filtering: researchers with proper authorization can obtain the subset of data provided by the data collector ● data analysis to run descriptive statistics and graphics, visualization, and plotting on historical maps ● Data APIs to export data for further analysis in popular statistical packages (Stata, SPSS, R, IPython Notebook) and in advanced data mining tools that will be developed in the future (an always up-to-date solution)
  • 6. Collaboration possibilities Descriptive metadata: - dataset file - documentation - standardized tables with codes can be part of the dataset Sharing and collaboration capabilities: - every researcher who is a Dataverse user requests a unique API token - researchers exchange API tokens and grant permissions to work on the same datasets as a team using data analysis tools
  • 7. Collaboration Data Workflow - draft datasets are visible only to their owners - with API tokens, researchers can access an interactive dashboard to get insights into the data stored in Dataverse - every dataset is converted to a dataframe - the dashboard provides access to all variables in the dataframe and visualizes them on charts, graphs, historical maps and treemaps Once a dataset is ready to go public, it can be published in Collabs: - guest users can download a copy of the dataset - team members with permissions and an authorized API token can contribute to the dataset
  • 8. Data Analysis in Clio Infra ● the Data Processing Engine has a Python core; it was developed for nlgis.nl (Netherlands Geographic Information System) and is distributed as a widget ● the interactive data exploration dashboard is based on the D3 library ● every dataset from Dataverse is available through the Data API (JSON) and can be connected to any statistical package ● data can be filtered on specific years and variables using dataframes (pandas, Python for data analysis) ● the data quality check is a quick visual tool that applies Benford’s law to all values of a specific dataset ● dataframes can be opened by researchers in various statistical packages such as IPython Notebook, RStudio, SPSS, Stata, MATLAB, etc.
  • 9. Data Processing Engine (DPE) specification ● can split the values of any dataset into the number of categories specified by the researcher (8 by default) ● the algorithm that assigns data values to categories can be selected manually (percentile by default) ● can determine the maximum possible number of categories for a specific dataset if the number requested by the user cannot be reached (for example, if there are only 2-3 categories of data values) ● data ranges should be defined so that data can be visualized on a chart or map at the right scale ● colors can be specified by the user (ColorBrewer, see http://colorbrewer2.org) ● a legend is generated and attached to all visualizations automatically ● values with missing data are shown as 'no data' regions on the map ● all data values are delivered by the Data API, which makes the data analysis platform independent and able to communicate with other systems or statistical packages
  • 10. API Service (Data API) The Data API provided by the Data Processing Engine is the most important functionality of a well-equipped digital infrastructure: ● an easy way to analyze data in popular statistical packages (Stata, SPSS, Excel) ● common data science programming languages such as Python and R can be used to perform more advanced research with external data science libraries ● data can be analyzed with toolboxes like Wolfram|Alpha and other discovery platforms (added value for the future) ● other researchers and developers can apply advanced techniques and data mining tools that have not been developed yet
  • 11. Example of output from Data API ● every dataset ingested by the DPE is available through the Data API under a unique handle ● API results can be filtered by variables extracted from the content of the data file Example: /api/data?&handle=F16UDU:30:31&countrycode=USA&year=1880&categories=8&datarange=calculate { "United States of America": { "code": "F16UDU30_31", "color": "#FF7F00", "countrycode": "USA", "id": "2085", "indicator": "Total Urban Population", "intcode": "840", "r": 923.99, "region": "W. Offshoots", "units": "x 1000", "value": 14264.0, "year": 1880 } }
  • 12. Data visualization and plotting data on historical maps ● the Data Processing Engine (DPE) is the core of the data visualization process and is connected to the geoservice ● data attributes such as scales and colors are calculated by the DPE on the fly from the researcher's input (for example, the number of categories to split the data into) and are part of the Data API output ● histograms, cross tabulations and enhanced descriptive statistics are based on pandas dataframes ● it will be possible to plot datasets on historical maps covering the last 500 years
  • 13. Historical maps services: Geoservice and Geocoder Delivered by Webmapper.nl and integrated with the common Clio Infra infrastructure. Geoservice basic requirements: - should provide an up-to-date GeoAPI with historical polygons for all countries and regions based on standardized codes - polygons available as geojson/topojson for online visualization - the QGIS toolbox should be supported for uploading new maps and updating old maps as shapefiles The geocoder should be able to standardize all geographical locations across different datasets: - USSR and Soviet Union should be recognized as one country with the same PID - Germany before and after 1990 - Indonesia before and after 1999
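The percentile categorization described on this slide can be sketched in a few lines. This is a minimal illustration, not the DPE's actual code: the function names `percentile_breaks` and `classify` are hypothetical, and the simple rank-based percentile rule is an assumption. It falls back to fewer classes when a dataset has too few distinct values, and maps missing values to `None` so they can be drawn as 'no data'.

```python
from bisect import bisect_right

def percentile_breaks(values, categories=8):
    """Return inner class boundaries that split values into percentile classes.

    Hypothetical helper: the slides only state that 8 percentile-based
    categories are the default, not how the boundaries are computed.
    """
    data = sorted(v for v in values if v is not None)
    # Never ask for more classes than there are distinct values.
    k = max(1, min(categories, len(set(data))))
    breaks = [data[(i * len(data)) // k] for i in range(1, k)]
    return sorted(set(breaks))

def classify(value, breaks):
    """Return the class index for a value, or None for missing data."""
    if value is None:
        return None  # rendered as a 'no data' region
    return bisect_right(breaks, value)

breaks = percentile_breaks(list(range(1, 17)), categories=8)
classes = [classify(v, breaks) for v in (1, 8, 16, None)]
```

Each class index can then be mapped to one color of a ColorBrewer palette with the same number of steps.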
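A client for the request above might look roughly like the following sketch. The base URL `https://example.org` and the helper names are assumptions for illustration; only the `/api/data` path, the query parameters and the sample record come from the slide. The sketch builds the query string and flattens the response, which is keyed by country name, into a list of rows.

```python
import json
from urllib.parse import urlencode

def build_data_api_url(base_url, handle, **filters):
    """Compose a Data API request URL (base_url is a placeholder)."""
    params = {"handle": handle}
    params.update(filters)
    return f"{base_url}/api/data?{urlencode(params)}"

def to_rows(payload):
    """Flatten a response keyed by country name into a list of dicts."""
    return [dict(record, country=name) for name, record in json.loads(payload).items()]

# Sample response copied from the slide.
SAMPLE = """{"United States of America": {"code": "F16UDU30_31",
 "color": "#FF7F00", "countrycode": "USA", "id": "2085",
 "indicator": "Total Urban Population", "intcode": "840", "r": 923.99,
 "region": "W. Offshoots", "units": "x 1000", "value": 14264.0, "year": 1880}}"""

url = build_data_api_url("https://example.org", "F16UDU:30:31",
                         countrycode="USA", year=1880,
                         categories=8, datarange="calculate")
rows = to_rows(SAMPLE)
```

A list of flat rows like this can be passed directly to a dataframe constructor in most statistical environments.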
  • 14. GeoAPI example Geoservice can provide polygons for specific years on the national or world level rendered as topojson or geojson. GeoAPI: /api/maps?world=on&year=1962 Polygons for all countries will be delivered as topojson: arcs":[[1782,2186]]}]}},"arcs":[[[8387,6231],[0,5],[1,1],[1,-1],[2,0],[2,-1],[3,-4],[1, -3],[0,-1],[-1,-5],[0,-1],[-1,2],[0,1],[-2,2],[-3,1],[-1,0],[-1,0],[-1,3],[0,1]], [[8390,6247],[1,1],[0,1],[2,1],[1,0],[1,-2],[-1,-5],[-1,0],[-1,1],[-1,1],[-1,1],[0,1]], [[8391,6204],[0,2],[-1,1],[-1,-1],[0,1],[0,1],[1,3],[1,0],[2,-6],[0,-1],[0,-1],[0,-1], [-1,1],[-1,1]],[[8364,6093],[0,2],[2,5],[1,0],[1,-2],[-1,-6],[-1,-3],[-1,0],[-1,0],[0,1], [0,3]],[[5941,6575],[0,-1],[-1,0],[-1,1],[0,1],[-1,0],[-1,0],[-1,0],[-1,0],[0,-1],[-1,-1], [0,-2],[0,-1],[0,-2],[0,-1],[-1,-3],[-3,-2],[-4,-4],[-1,-1],[-1,0],[-2,-1],[-2,0],[-1,0],[-1, -1],[-1,-2],[-5,1],[-1,0],[-1,-1],[-1,-1],[-1,0],[-2,1],[-4,3],[-1,1],[-1,2],[-2,8],[-1,4], [-1,9],[0,1],[0,2],[0,1],[1,0],[0,-1],[0,-1],[1,-1],[0,-1],[1,0]
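The small numbers in the excerpt above suggest a quantized TopoJSON topology, in which each arc stores its first point as an absolute position and every following point as a delta from the previous one. Assuming that is the case here, a minimal decoder could look like this; `decode_arc` is a hypothetical helper, and the optional `transform` follows the `scale`/`translate` layout of the TopoJSON format.

```python
def decode_arc(arc, transform=None):
    """Turn a delta-encoded TopoJSON arc into absolute positions.

    If a transform is given, quantized positions are converted to real
    coordinates as position * scale + translate.
    """
    x = y = 0
    points = []
    for dx, dy in arc:
        x += dx
        y += dy
        if transform is None:
            points.append((x, y))
        else:
            sx, sy = transform["scale"]
            tx, ty = transform["translate"]
            points.append((x * sx + tx, y * sy + ty))
    return points

# First points of the third arc in the excerpt.
quantized = decode_arc([[8391, 6204], [0, 2], [-1, 1], [-1, -1]])
```

In a browser this step is normally handled by the mapping library, so the sketch is mainly useful for inspecting the API output in a notebook.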
  • 15. Integration of Dataverse, DPE, geoservice and geocoder in 6 steps 1. the Data API communicates with the geocoder to find standardized codes for all locations in the dataset 2. the geoservice returns the polygons associated with the same geocodes 3. data and polygons are matched in the frontend by any geospatial JavaScript library 4. attributes such as colors and scales fill the map polygons 5. the legend with scales, values and colors comes from the DPE to complete the map 6. notes, sources and other metadata are extracted from the Dataverse API to provide a clear explanation of the generated historical map Live demo for Labor Conflicts in 100 years
  • 16. Dataset Quality Check Benford’s law is used to test the quality of the data. Linked Edit Rules: a methodology to publish, link, combine and execute edit rules on the Web as Linked Data, in order to verify the consistency of statistical datasets and detect incorrectly filled data values (for example, characters in values where numbers are expected). Chart visualization gives an overview of missing data values.
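Steps 3 and 4 amount to a join between Data API records and map features on a shared location code. The slides place this join in the frontend, in JavaScript; the Python sketch below only illustrates the logic. The function name, the `countrycode` join key on the feature properties and the grey 'no data' color are all assumptions.

```python
NO_DATA_COLOR = "#cccccc"  # assumption: the slides do not name this color

def style_features(features, records, key="countrycode"):
    """Attach a fill color and a value to each feature by matching codes."""
    by_code = {str(r[key]): r for r in records}
    styled = []
    for feature in features:
        code = str(feature["properties"].get(key, ""))
        record = by_code.get(code)
        styled.append({
            "code": code,
            "fill": record["color"] if record else NO_DATA_COLOR,
            "value": record["value"] if record else None,
        })
    return styled

features = [
    {"type": "Feature", "properties": {"countrycode": "USA"}},
    {"type": "Feature", "properties": {"countrycode": "NLD"}},
]
records = [{"countrycode": "USA", "color": "#FF7F00", "value": 14264.0}]
styled = style_features(features, records)
```

Features without a matching record keep the fallback color, which corresponds to the 'no data' regions mentioned in the DPE specification.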
  • 17. Benford’s law in action on real dataset
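Benford's law predicts that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). A quality check can compare the observed leading-digit shares of a dataset with these expected shares. The sketch below is one way to do that and is not taken from the Clio Infra code; the function names and the mean absolute deviation summary are assumptions.

```python
import math
from collections import Counter

def leading_digit(value):
    """Return the first non-zero digit of a number, or None for zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_deviation(values):
    """Return observed shares and their mean absolute deviation from Benford."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)
    expected = benford_expected()
    observed = {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}
    mad = sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9
    return observed, mad

# Powers of two are a classic Benford-like sequence; equal digit shares are not.
_, mad_powers = benford_deviation([2 ** k for k in range(1, 201)])
_, mad_uniform = benford_deviation(list(range(1, 10)) * 20)
```

A large deviation does not prove that a dataset is wrong, since many legitimate variables (years, percentages, bounded scores) do not follow Benford's law; it only flags values worth a closer look.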
  • 18. Combining and aggregating datasets in data exploration tools ● tools can recognize the same variable under different names in different datasets (Year and Jaar) ● geocoding services can standardize different regions (Netherlands → NL, Amsterdam → AMS) ● all possible relationship paths can be ranked from the “best” (100%) to the “worst” (0%); for example, the value “05” for the variable “Month” can be recognized as “May” ● standardized datasets can be used as “reference” data for datasets from other research groups and depot services (CHIA, Harvard Dataverse Network, MIT, DANS)
  • 19. Clio Infra Infrastructure is suitable for different kinds of datasets Quantitative datasets store quantities (numbers): - data that can be measured (length, speed, height, age) - example: current clio-infra datasets Qualitative datasets store qualitative observations (descriptions): - data that can be observed (colors, textures, professions, groups) - example: HISCO, world strikes dataset Data visualization differs by kind of dataset: - quantities can be plotted on charts, graphs and historical maps - qualities (hierarchies) can be visualized on treemaps and maps
  • 20. Example: Interactive data exploration based on API token
  • 21. Summary: data exploration and analysis will extend Dataverse functionality ● data quality check during upload or update of a dataset ● automatic ingestion and recognition of years and locations in datasets ● integration with the geocoder and geoservice to get polygons for maps ● interactive dashboard for visual exploration of dataset variables (graph / histogram / scatterplot) ● data processing engine to plot data on interactive historical maps (if the dataset has geospatial data) ● correlation for continuous variables ● regression analysis for estimating the relationships among variables ● treemaps for qualitative data analysis (hierarchical data coming soon)