These slides are from Auke Rijpma, who presented the Catasto meets SPARQL workshop. Everything is still in beta, so let us know if something breaks (Twitter: @rlzijdeman).
2. Clariah datahub example
• Try to construct some queries to get a feel for interacting with the Clariah Structured Data Hub.
• Use the Catasto, a famous dataset made by David Herlihy and Christiane Klapisch-Zuber.
• A fiscal census of 1427 Tuscany, covering 60k+ households and 270k+ individuals.
• Covers such fiscal matters as asset ownership, occupations, etc., but also some basic demographic
information.
3. [Image-only slide: a scanned SAMPLE CODING FORM from the original Catasto study, with fixed-width fields for serial and household number, location, name, father's name, and family; source fields (volume, pages); fiscal fields (K, H, A, I, Oc., Inv., Public, Total, Deduct., Tax) with their column positions; and household member records. The OCR text of the scan is not recoverable.]
4.–6. [Image-only slides: no recoverable text.]
7. Catasto datasets
• Early versions are error-prone fixed-width (fwf) files.
• More recent versions offer tabular data.
• Mix of household and individual data in rows: you need to know whether e.g. A11 will exist for a given household.
• Early versions are strictly numeric, except household-head names.
• Hard to browse and to interpret results.
8. Catasto as linked data
• New data model:
• individuals (rdf:type) inHousehold household
• observations (age, occupation, sex, marital status, relation to head) for individuals
• households householdMember individual
• observations (fiscal, occupation, house)
• Codebook included using prefLabel
9. Browse
• Find links and other long, hard-to-type things at goo.gl/pwnTZo.
• Browse the new data at <http://data.socialhistory.org/resource/catasto/household/2222>
• Try to find some individuals there.
• Try to find the meaning of the codes of a variable like METIER (occupation) or maritalStatus.
10. SPARQL and triples
• The basic unit in linked data, and in linked data (SPARQL) queries, is the triple:
• subject - predicate - object
• So here, for example:
• individual - age - 75
• household - privateInvestments - 5000
• household (head) - occupation - Barbiere
• individual:4_11 inHousehold household:4
11. SPARQL and triples
• SPARQL queries are made with similar triple statements.
• Each part of a statement is either a URI: <http://…/…>
• Or a literal: “something”
• Place a question mark (?) to allow that part of the statement to be anything (a variable).
• Specify a part of the statement as a URI or literal to fix it.
• FROM specifies the named graph the statements are in.
12. Query basics
• The basic starting query asks for all triples by entering all three parts of the statement as variables.
• SELECT * to select everything.
• ?sub ?pred ?obj
• LIMIT 10 to go easy on the server.
• http://yasgui.org/short/rkQeY_vEZ
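Assembled from the bullets above, the minimal starting query looks like this (run it against the Clariah hub endpoint, e.g. via yasgui.org):

```sparql
# Ask for any triples: all three parts of the statement are variables.
SELECT *
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 10   # go easy on the server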
13. Query basics: DISTINCT
• Putting DISTINCT after SELECT gives the unique results, i.e. gets rid of duplicates.
• Write a query to see all the predicates in the Catasto:
• http://yasgui.org/short/ry8iLdPNb
• Write a query to see all the possible codes for the METIER predicate:
• http://yasgui.org/short/SytvcOD4W
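The first exercise can be sketched like this (the FROM clause naming the Catasto graph is omitted here; the linked yasgui example has the exact graph name):

```sparql
# Unique predicates in the data: DISTINCT removes the duplicates.
SELECT DISTINCT ?pred
WHERE {
  ?sub ?pred ?obj .
}
LIMIT 100
```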
14. Query basics: PREFIXes
• Writing out full URIs all the time isn’t fun and is prone to errors.
• Make your life easier by adding prefixes:
• PREFIX name: <uri goes here>
• Usage in the query is name:FINAL_BIT_OF_STATEMENT.
• Replace everything before “METIER” in the previous query with a sensible prefix:
• http://yasgui.org/short/S1SYjOwNb
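A sketch of the prefixed version. The exact namespace URI is an assumption based on the URLs elsewhere in these slides; check the brwsr pages or the linked yasgui example for the real one:

```sparql
# Hypothetical namespace: adjust to the actual Catasto dimension URI.
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT DISTINCT ?code
WHERE {
  ?household catasto:METIER ?code .
}
LIMIT 100
```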
16. Query basics: summarise
• Add COUNT after SELECT to count how often a statement in a triple occurs in the data.
• Results are automatically grouped by the other variables in the query.
• You can also add GROUP BY at the end to group explicitly.
• Count the number of household (heads) in each occupational category:
• http://yasgui.org/short/HyCsnuvVb
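The occupation count might look like this (the namespace URI is, again, a hypothetical; the linked yasgui example has the real query):

```sparql
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/dimension/>

# Count households (heads) per occupational code.
SELECT ?code (COUNT(?household) AS ?n)
WHERE {
  ?household catasto:METIER ?code .
}
GROUP BY ?code
```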
17. Codebook access
• The codebook is an integrated part of the data.
• Explore it with skos:prefLabel.
• Because the Clariah hub uses the CSVW standard, each file has its own unique graph.
• Either add the graph names (there are a lot!) or remove the FROM statement to search the entire hub.
18. Ordering results
• Use ORDER BY or ORDER BY DESC() at the end of the query to sort the results.
• Place the previous results in a sensible order:
• http://yasgui.org/short/BJzFetvEb
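Sorting the occupation counts from largest to smallest, for example (namespace URI hypothetical as before):

```sparql
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/dimension/>

SELECT ?code (COUNT(?household) AS ?n)
WHERE {
  ?household catasto:METIER ?code .
}
GROUP BY ?code
ORDER BY DESC(?n)   # most common occupations first
```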
19. Codebook access
• Careful! You need some sort of triple statement that limits the query to the right graphs, or you’ll be flooded with results.
• Use LIMIT 100 for safety as well.
• Add meaningful labels to the occupation count query.
• To do this, you’ll need to add a query line.
• Queries with multiple query lines require each line to end with a dot.
• http://yasgui.org/short/rkeLktDNZ
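Adding labels means a second triple line, each line ending with a dot. The skos prefix is standard; the catasto namespace URI is still a hypothetical sketch:

```sparql
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/dimension/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?label (COUNT(?household) AS ?n)
WHERE {
  ?household catasto:METIER ?code .   # occupational code per household
  ?code skos:prefLabel ?label .       # meaningful label from the codebook
}
GROUP BY ?label
ORDER BY DESC(?n)
LIMIT 100   # avoid being flooded with results
```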
20. Your turn
• Now build something from the ground up.
• Get the ages of individuals (use LIMIT 10 at first):
• http://yasgui.org/short/rJZe-KDEb
• Then make a population distribution:
• http://yasgui.org/short/rkErbKwEZ
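A sketch of the population distribution. Both the namespace URI and the age property name are hypothetical; explore some individuals with brwsr (slide 9) to find the real ones:

```sparql
PREFIX catasto: <http://data.socialhistory.org/resource/catasto/dimension/>

# Hypothetical property name `age`: number of individuals per age.
SELECT ?age (COUNT(?individual) AS ?n)
WHERE {
  ?individual catasto:age ?age .
}
GROUP BY ?age
ORDER BY ?age
```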
21. Your turn
• Use catasto/dimension:relationToHead (not actually to head) and catasto/dimension:sex (explore using brwsr) to find couples in the Catasto.
• Calculate the age difference between them:
• http://yasgui.org/short/rJgIcFPNZ
• What do you notice?
• Can you extend the query to see if this varies by socio-economic group?
• http://yasgui.org/short/BkMA9YP4Z
• http://yasgui.org/short/rkW0V5PEZ (heavy on the browser)