Linked Data has enabled integrated and uniform access to various disparate socio-historical data sources in the Netherlands. However, preparing data for analysis is still a cumbersome task, taking up to 60% of an analyst's time. In this presentation I describe some novel tools based on Semantic Web technology that help automate the still closed, unshared, and non-repeatable process of data preparation.
4. Data Preparation
• Many interesting datasets are messy, incomplete, and incorrect
• Data analysis requires clean data
• Cleaning data involves careful interpretation and study
• Values and variables in the data are replaced with (more) standard terms (coding)
• Cross-dataset analyses require a further data harmonization step
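The coding step described above can be sketched in a few lines of Python. The mapping and the example values below are made up for illustration; real codings map raw values to standard classification systems.

```python
# Minimal sketch of the "coding" step: raw values from a messy dataset
# are replaced with (more) standard terms. The mapping below is invented
# for illustration only.
CODING = {
    "farm hand": "farm worker",
    "farmhand": "farm worker",
    "laborer": "labourer",
}

def code_value(raw):
    """Return the standard term for a raw value, or the raw value unchanged."""
    return CODING.get(raw.strip().lower(), raw)

records = ["Farmhand", "laborer", "teacher"]
coded = [code_value(r) for r in records]
print(coded)  # ['farm worker', 'labourer', 'teacher']
```

Harmonization for cross-dataset analysis works similarly, except that the mappings of several datasets must be brought to one shared target vocabulary.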
7. Linking Social History Data
• Linked Open Data – a machine-readable Web graph with over 100 billion statements [1]
• Sharing (socio-historical) knowledge for reusability
• Solves the integration problem
[1] http://lodlaundromat.org/
8. • TabLinker: conversion of Excel spreadsheets to RDF
• Integrator: attaches harmonization rules to the raw RDF
• QBer: crowd-based, interactive coding and harmonization
• LSD Dimensions: an index of statistical variables on the Web
13. SCRY
Web-standards-compatible statistical functions in SPARQL:

PREFIX :       <http://scry.rocks/example/>
PREFIX scry:   <http://scry.rocks/>
PREFIX impute: <http://scry.rocks/math/impute?>
PREFIX mean:   <http://scry.rocks/math/mean?>
PREFIX sd:     <http://scry.rocks/math/stdev?>
PREFIX qb:     <http://purl.org/linked-data/cube#>

SELECT ?obs ?dim ?imputed_val WHERE {
  ?obs a qb:Observation .
  { ?dim a qb:DimensionProperty } UNION { ?dim a qb:MeasureProperty }
  FILTER NOT EXISTS { ?obs ?dim ?val . }
  ?other_obs ?dim ?other_val .
  SERVICE <http://sparql.scry.rocks/> {
    SELECT ?other_val ?imputed_val {
      GRAPH ?g1 {
        impute:v scry:input  ?other_val ;
                 scry:output ?imputed_val .
      }
    }
  }
}

Delegation of the non-standard function to a remote SCRY orb.
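What the delegated impute function computes can be sketched in plain Python. Simple mean imputation is assumed here (suggested by the mean: prefix above); the actual SCRY service may implement a more sophisticated strategy.

```python
# Sketch of mean imputation, the kind of statistical function the query
# above delegates to a remote SCRY service: missing observation values
# are filled in with the mean of the observed values for the same variable.
def impute_mean(values):
    """Replace None entries with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

print(impute_mean([4.0, None, 8.0]))  # [4.0, 6.0, 8.0]
```

The point of SCRY is precisely that functions like this are not expressible in standard SPARQL, so they are computed service-side and joined back into the query results.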
14. Don’t like SPARQL? Neither do we!
https://github.com/CEDAR-project/Queries
http://grlc.clariah-sdh.eculture.labs.vu.nl/CEDAR-project/Queries/api-docs
15. Conclusion
• Data preparation: an expensive task (up to 60% of the time)
• Linked Data is good for (socio-historical) data integration on the Web
• But data quality issues remain:
  – Linked Edit Rules: rule hub and data quality assessment
  – SCRY: Linked Data-compatible statistical functionality
  – grlc: you don’t need to know Linked Data to use Linked Data
Tools to facilitate data integration in social history – make the life of the social historian working with semi-structured data easier
1. Explain the process stages.
2. Preparation = artisan work.
3. A critical step in knowledge discovery is data integration: interrogating various datasets in a combined way.
Currently, datasets are brought together by hand.
Computer science is keen on researching the later stages, but the problem is somewhere else!
So our goal is to provide automatic methods to avoid this, and to make preparation *reusable*.
(Avoid jargon; describe the tools.)
See https://github.com/CEDAR-project
See https://github.com/CLARIAH
The good things about data integration come next…
Bringing databases together.
However: I actually don’t want to talk about how good Linked Data is for integrating historical data, but about the problems that such integrated datasets might still have.
This aligns with the general problem of data on the Web: Web data is of varied quality (meaning you’ll stumble upon very crappy data).
Keep the example on missing data.
Linked Edit Rules: a hub of interconnected constraints on statistical datasets.
(Screenshot)
QBsistent: a tool that uses Linked Edit Rules to validate your data against this hub.
(Link to repo)
(Problem: we need to implement statistical functions in SPARQL…)
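The kind of validation QBsistent performs can be sketched as a rule check over individual records. The rules and the record below are invented for illustration; Linked Edit Rules express such constraints in RDF so they can be shared and interlinked on the Web.

```python
# Sketch of edit-rule validation: each rule is a named predicate over a
# record, and a record is consistent only if every rule holds. The rules
# below are made up for illustration.
RULES = [
    ("age is non-negative", lambda r: r["age"] >= 0),
    ("married implies age >= 15", lambda r: not r["married"] or r["age"] >= 15),
]

def violated_rules(record):
    """Return the names of all edit rules the record violates."""
    return [name for name, check in RULES if not check(record)]

print(violated_rules({"age": 12, "married": True}))
# ['married implies age >= 15']
```

The parenthetical problem above is where SCRY comes in: some edit rules need statistical quantities (means, standard deviations) that standard SPARQL cannot compute.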
Query examples.
As a novice, edit a query and just get a spreadsheet by pushing a button.
You don’t need Linked Data knowledge.
These and many more are fully described in this recently published book… which is actually my doctoral dissertation.