From Research Objects to
Reproducible Science Tales
Bertram Ludäscher
ludaesch@illinois.edu
Director, Center for Informatics Research in Science & Scholarship (CIRSS)
School of Information Sciences (iSchool@Illinois)
& National Center for Supercomputing Applications (NCSA)
& Department of Computer Science (CS@Illinois)
Southampton
UK
2019-11-12
Outline
• Crisis Time? Manifesto Time!
• Terminology
• Research Objects - A Long March
• ROs & Reproducibility: Cui Bono?
• A call to action: Transparency Action ..
From ROs to Reproducible Science 2
Manifesto Time!
Never enough …
Terminology Time!
Yeah, but I said that yesterday ..
Plurality: Let’s be community
specific
… and more manifestos!
Back to the big picture
Research Objects
FAIR-dom
Tool Envy Syndrome (TES)
Maybe we should be working on
conceptual foundations and ask:
I can haz some
Reproducibility Platform?
How Perl Saved the Human Genome Project
• The Perl Journal, http://www.tpj.com
• By Lincoln Stein
• DATE: Early February, 1996
• LOCATION: Cambridge, England, in the conference room of the largest DNA
sequencing center in Europe.
• OCCASION: A high level meeting between the computer scientists of this center
and the largest DNA sequencing center in the United States.
• THE PROBLEM: Although the two centers use almost identical laboratory
techniques, almost identical databases, and almost identical data analysis tools,
they still can't interchange data or meaningfully compare results.
• THE SOLUTION: Perl
• … Most groups, however, learned to build modular, loosely-coupled systems whose parts could be
swapped in and out without retooling the whole system:
• First there's a basic quality check on the sequence: is it long enough? Are the number of ambiguous
letters below the maximum limit? Then the vector check ensures that only human DNA gets into the
database. Next there's a check for repetitive sequences … penultimate step is to attempt to match
the new sequence against other sequences in a large community database of DNA sequences ….
After performing all these checks, the sequence along with the information that's been gathered
about it [provenance!] along the way is loaded into the local laboratory database.
Tool Envy Syndrome (TES)
Maybe we should be working on
conceptual foundations and ask:
I can haz some
reduce-sillity-platforms?
A Reproducibility Platform …
errors galore
• Reproducibility platform
• … Reproducility platform
• … Reproduce-sillity platform
• → Debuggers!
• … Reduce-sillity platform
• → The Vision
• Helmut Schmidt: “Wer Visionen hat soll zum Arzt
gehen!” (If you have visions, go see a doctor!)
• … NSF SKOPE: system and tools to discover,
access, analyze, visualize paleoenvironmental
data
– unprecedented ability to explore provenance
(detailed, comprehensible record of computational
derivation of results)
– for researchers, tinkerers, and modelers
• … NSF Whole Tale:
– leverage & contribute to existing CI to support the
whole tale (“living paper”), from workflow run to
scholarly publication
– integrate tools & CI (DataONE, Globus, iRODS,
NDS, ...) to simplify use and promote best
practices.
– driven by science WGs (Archaeology/SKOPE,
materials science, astro, biodiversity informatics ..)
Enter the tool makers
Whole Tale: The next step in the evolution of the
scholarly article: The “Living [Frozen?] Paper”
• 1st Generation:
– narrative (prose)
• 2nd Generation: plus …
– name .. identify .. include (access to) data
• 3rd Generation: plus …
– name .. reference .. include code (software) ..
– and provenance … and exec environment (containers)
Ludäscher: Why-Not Provenance 21
Whole Tale
Whole Tale Dashboard
Whole Tale Vision
Tale
Data
{ Code
D1PROV
WT Architecture
https://dashboard.wholetale.org
Example Tale:
LIGO gravitational wave detection
(tutorial Jupyter notebook)
https://dashboard.wholetale.org
ROs ... Whole Tale
… are we done here?
The return of the R* brouhaha
In a nutshell
• Computational reproducibility
=/=>
• Scientific reproducibility
• Transparency to the rescue!
• What’s the goal again?
• And what information gain is implied by
– a successful reproducibility study (alright … )
– a failed reproducibility study (K Popper says ‘Hi’!)
– a non-conclusive reproducibility study
Reproducibility Crisis (reprised)
• Successful reproducibility study:
• increases trust in prior study :-)
• … but no surprises :-(
• Failed reproducibility study:
• decreases trust in (or falsifies) prior study :-(
• … but surprising failure yields new info/knowledge :-)
• Learning from failures!
– Not really a new, revolutionary idea..
– What is a positive vs negative result anyways?
– ... fail early, fail often ...
On Provenance 29
PRIMAD (what have you “primed”?)
Dagstuhl Seminar #16041 Report
Outputs = Exec(M,I,P,D) | RO, A
- M (Method) = parsimony/bootstrap/..
- I (Implementation) = package XYZ
- P (Platform) = MacOS ..
- D (Data) = (Params, Files)
- RO = Research Objective, A = Actor
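One way to read the PRIMAD formula is as a record of which components a rerun changed ("primed"). The following is a minimal Python sketch under that reading; the class `Study`, the function `primed`, and all field values marked as illustrative are hypothetical and not taken from the Dagstuhl report.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Study:
    """Hypothetical record of the six PRIMAD components of one study run."""
    platform: str
    research_objective: str
    implementation: str
    method: str
    actor: str
    data: tuple


def primed(original, rerun):
    """Return the names of the components that differ between two runs."""
    return [f.name for f in fields(original)
            if getattr(original, f.name) != getattr(rerun, f.name)]


original = Study(
    platform="MacOS",
    research_objective="infer a phylogeny",   # illustrative value
    implementation="package XYZ",
    method="parsimony",
    actor="original author",                  # illustrative value
    data=("params-v1", "alignment.fasta"),    # illustrative (Params, Files)
)

# A rerun by someone else on another platform: P and A are "primed".
rerun = replace(original, platform="Linux", actor="independent reviewer")
print(primed(original, rerun))
```

Listing the changed components makes explicit what kind of reproducibility study was performed: the same code on a new platform says something different from a new method on the same data.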
Query evaluation game
EDB: e(a,b), e(b,b)
tc(X,Y) :- e(X,Y).        # (1)--e(X,Y)-->(2)
tc(X,Y) :-                # (1)--exists:Z-->(3)
    e(X,Z),               # (3)->(4)-e(X,Z)->(5)
    tc(Z,Y).              # (3)--X:=Z-->(1)
[Figure: game graph schema with positions 1 to 5 and moves e(X,Y), exists:Z, e(X,Z), X := Z, together with its instantiation over the EDB; nodes such as 1:(a,b), 2:(a,b), 3:(a,b,b), 4:(a,b), 5:(a,b) carry numeric labels]
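To make the two tc rules concrete, here is a small Python sketch that evaluates them bottom-up over the EDB from the slide. It only computes the answers; it does not build the game graph or any provenance, and the function name `transitive_closure` is an illustrative choice.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the two tc rules over a set of e-facts."""
    tc = set(edges)                      # rule 1: tc(X,Y) :- e(X,Y)
    while True:
        # rule 2: tc(X,Y) :- e(X,Z), tc(Z,Y)
        derived = {(x, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        if derived <= tc:                # fixpoint reached: nothing new
            return tc
        tc |= derived


edb = {("a", "b"), ("b", "b")}           # e(a,b), e(b,b)
print(sorted(transitive_closure(edb)))
```

For this EDB the closure adds nothing beyond the two e-facts, so tc(a,b) and tc(b,b) hold while tc(a,a) and tc(b,a) do not.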
Provenance’12 @Dagstuhl
with JanVdB TJ Green
Flum, Kubierschky, Ludäscher, Total and partial well-founded
Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece
Eureka!
Provenance @ SBBD'16
A rant …
Didn’t you come here for this?
The Evolution of Language
– Peter Buneman for Phil Wadler
2x (Descartes)
λx. 2x (Church)
(LAMBDA (X) (* 2 X)) (McCarthy)
<?xml version="1.0"?>
<LAMBDA-TERM>
<VAR-LIST>
<VAR>X</VAR>
</VAR-LIST>
<EXPR>
<APPLICATION>
<EXPR><CONST>*</CONST></EXPR>
<ARGUMENT-LIST>
<EXPR><CONST>2</CONST></EXPR>
<EXPR><VAR>X</VAR></EXPR>
</ARGUMENT-LIST>
</APPLICATION>
</EXPR>
</LAMBDA-TERM>
(W3C)
Thesis:
• There’s no problem that can’t be
tackled by another level of
indirection.
Antithesis:
• Adding levels of indirection gets you
further away from solving your
problem.
• ... or worse:
Beware of the Turing tar-pit in which
everything is possible but nothing of
interest is easy.
-- Alan Perlis in Epigrams on Programming
Beware of Techno(re)ligion:
Great ideas are simple; frozen accidents aren’t …
• Geo-/Helio-centric
model
• Evolution by Natural
Selection
• Structure of DNA
• Genetic Code
• Relativity
• …
• Logic
F = A | F/F | -F | (ex x) F
vs
Thinking Tools
You can’t do much carpentry with your bare hands,
and you can’t do much thinking with your bare brain
– Bo Dahlbom (via D. Dennett)
Why we need Thinking Tools
• How do we analyze metadata models, schemas,
integrity constraints, taxonomies, ontologies, …
• … or the big picture: what do we mean by …
provenance? Reproducibility in science?
• From Thinking Tools to …. “Tool Tools”?
Provenance as an Intuition Pump for
Understanding what happened!
(Frozen Accidents Edition)
Zrzavý, Jan, David Storch, and Stanislav
Mihulka. Evolution: Ein Lese-Lehrbuch.
Springer-Verlag, 2009.
Author: Jkwchui (Based on
drawing by Truth-seeker2004)
Thinking about provenance
What provenance?
Those who can’t remember the past …
– … are forced to repeat it!
• Provenance in
– ... scientific workflows
• First (second, third, ...) Provenance Challenge
• ... OPM ... W3C incubator ... W3C PROV ...
– ... databases
• Why, Where, How, ...., Why-NOT, ...
– ... programming languages
– ...
– ... logic-based KR (NMR, ...)
Computational Provenance …
• Origin, processing history of artifacts
– data products, figures, ...
– also: underlying workflow
→ understand methods, dataflow, and dependencies
Climate Change Impacts
in the United States
U.S. National Climate Assessment
U.S. Global Change Research Program
João F. Pimentel, Saumen Dey, Timothy McPhillips,
Khalid Belhajjame, David Koop, Leonardo Murta,
Vanessa Braganholo, Bertram Ludäscher
Yin & Yang: Demonstrating complementary
provenance from noWorkflow &
YesWorkflow
[Figure: noWorkflow provenance graph for one run of simulate_data_collection. Nodes are function calls and variable assignments labeled with script line numbers (e.g., 50 spreadsheet_rows, 61 calculate_strategy, 92 collect_next_image, 106 transform_image, 121 writer.writerow); file nodes include cassette_q55_spreadsheet.csv, calibration.img, run/run_log.txt, run/rejected_samples.txt, run/collected_images.csv, and the raw and corrected images under run/raw/q55/ and run/data/ for samples DRT240 and DRT322.]
noWorkflow:
not only
Workflow!
• Scripts have provenance, too!
• Transparently capture some/all
provenance from Python script
runs.
• Use filter queries to “zoom” into
relevant parts ..
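The idea of capturing provenance from a script run without editing the script can be illustrated with Python's standard `sys.settrace` hook. This is only a toy sketch of the general idea, not how noWorkflow itself is implemented; `trace_calls` and the bodies of the two example functions are hypothetical (only their names echo the slide).

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no per-line tracing inside the called function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)   # restore whatever tracer was active before
    return result, calls


# Made-up stand-ins for two functions named on the slide.
def transform_image(intensity):
    return intensity * 2


def collect_next_image(energy):
    return transform_image(energy // 1000)


result, calls = trace_calls(collect_next_image, 11000)
print(result, calls)
```

A real provenance recorder would also store arguments, return values, line numbers, and file accesses, and would support filter queries over the stored trace.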
[Figure: filtered noWorkflow lineage graph for run/data/DRT240/DRT240_11000eV_002.img, with values such as cassette_id = 'q55', sample_score_cutoff = 12.0, data_redundancy = 0.0, calibration_image_file = 'calibration.img', sample_spreadsheet_file = 'cassette_q55_spreadsheet.csv', sample_name = 'DRT240', sample_quality = 45, calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000]), energy = 11000, frame_number = 2, raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw', transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img')]
$ now dataflow -f "run/data/DRT240/DRT240_11000eV_002.img"
$(NW_FILTERED_LINEAGE_GRAPH).gv: $(NW_FACTS)
	now helper df_style.py
	now dataflow -v 55 -f $(RETROSPECTIVE_LINEAGE_VALUE) -m simulation \
	| python df_style.py -d BT -e > $(NW_FILTERED_LINEAGE_GRAPH).gv
.. auto-“make” this!
noWorkflow lineage
of an image file
Provenance information
about Python function calls,
variable assignments, etc.
[Figure: YesWorkflow graph of simulate_data_collection. Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity. Data: cassette_id, sample_score_cutoff, data_redundancy, sample_spreadsheet, calibration_image, run_log, sample_name, sample_quality, accepted_sample, rejected_sample, num_images, energies, rejection_log, sample_id, energy, frame_number, raw_image, corrected_image, total_intensity, pixel_count, collection_log.]
YesWorkflow: Yes, scripts are Workflows, too!
• Use YW annotations
@begin...@end, @in,
@out to reveal hidden
conceptual workflow
(prospective provenance)
• Script isn't changed:
– annotations via comments
(=> language independent)
• For understanding and
sharing the “big picture”
• Query and visualize!
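As a rough illustration of what such annotations look like, the sketch below marks up a tiny Python script with @begin, @end, @in, and @out comments. The block and data names are borrowed from the slide's graph, but the script body, the acceptance rule (quality >= cutoff), and the second sample row are made-up assumptions, and only a subset of the ports shown in the graph is declared.

```python
# @begin simulate_data_collection
# @in sample_spreadsheet
# @in sample_score_cutoff
# @out accepted_sample
# @out rejected_sample
def simulate_data_collection(sample_spreadsheet, sample_score_cutoff):
    accepted, rejected = [], []

    # @begin load_screening_results
    # @in sample_spreadsheet
    # @out sample_name
    # @out sample_quality
    screening_results = list(sample_spreadsheet)
    # @end load_screening_results

    for sample_name, sample_quality in screening_results:
        # @begin calculate_strategy
        # @in sample_name
        # @in sample_quality
        # @in sample_score_cutoff
        # @out accepted_sample
        # @out rejected_sample
        if sample_quality >= sample_score_cutoff:   # assumed acceptance rule
            accepted.append(sample_name)
        else:
            rejected.append(sample_name)
        # @end calculate_strategy

    return accepted, rejected
# @end simulate_data_collection


# In-memory rows stand in for the spreadsheet; SAMPLE_X is a made-up entry.
rows = [("DRT240", 45), ("SAMPLE_X", 5)]
accepted, rejected = simulate_data_collection(rows, 12.0)
print(accepted, rejected)
```

Because the annotations live in comments, the script runs unchanged whether or not an annotation-aware tool is installed; such a tool reads only the comments to recover the conceptual workflow.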
[Figure: four graphs. The full YesWorkflow graph of simulate_data_collection and, after a lineage query, its subgraph upstream of corrected_image (load_screening_results, calculate_strategy, collect_data_set, transform_images); the full noWorkflow trace graph and, after a lineage query, the filtered lineage of run/data/DRT240/DRT240_11000eV_002.img.]
YesWorkflow:
Conceptual workflow model
noWorkflow:
Python trace model
But how do we
bridge this gap???
Would like to use YW
model to query NW
data!
Habemus Pons!
We’ve got the Bridge!
The bridge is the journey..
(The journey is the destination)
Lineage of image file
in terms of YW
model, with details
from NW provenance
Computational Thinking: Die Grenzen meiner
Sprache bedeuten die Grenzen meiner Welt …
(The limits of my language mean the limits of my world)
• Vanilla Process Network
• Functional Programming
Dataflow Network
• XML Transformation
Network
• Collection-oriented
Modeling & Design
framework (COMAD)
– Look Ma: No Shims!
Tool Envy Syndrome (TES)
Maybe we should be working on
conceptual foundations and ask:
I can haz some
Terminology Tools?
RCC-5 relations between two regions X and Y:
• Congruence: X == Y
• Inclusion: X > Y
• Inverse Inclusion: X < Y
• Overlap: X >< Y
• Disjointness: X ! Y
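For intuition, the five relations can be computed directly when both regions are given extensionally as finite, non-empty sets. The sketch below does exactly that in Python; it is not the Euler/X implementation (which reasons over incomplete knowledge in ASP), and the function name `rcc5` and the example taxa are hypothetical.

```python
def rcc5(x, y):
    """Return the RCC-5 relation symbol between two non-empty finite sets."""
    x, y = set(x), set(y)
    if not x or not y:
        raise ValueError("RCC-5 regions are assumed to be non-empty")
    if x == y:
        return "=="    # congruence
    if x > y:
        return ">"     # inclusion: X properly includes Y
    if x < y:
        return "<"     # inverse inclusion: X is properly included in Y
    if x & y:
        return "><"    # overlap: shared members, but neither includes the other
    return "!"         # disjointness


# Made-up example regions.
print(rcc5({"oak", "elm"}, {"oak"}))
print(rcc5({"oak", "elm"}, {"elm", "ash"}))
```

The taxonomy alignment problem is harder than this sketch suggests because the members of each taxon are usually not known; only some relations between taxa are asserted and the rest must be inferred.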
Origins:
Euler diagrams ...
... limited FO reasoning
... RCC-5++ reasoning
Application: Geo-Taxonomy Alignment
The secret sauce inside: Moved from FO reasoner to … qualitative reasoning
(RCC-5) to … Answer Set Programming (ASP) + some more secret sauce
Taxonomy Alignment Problem
• Euler/X project employs
qualitative reasoning (RCC-5),
implemented in ASP to align,
merge taxonomies, debug
alignments, etc.
Reasoning with Incomplete Knowledge:
Exploring Possible Worlds
The long march to ROs & Reproducible Science
We're off to see the Wizard,
The wonderful Wizard of Prov!
--
We hear he is a wiz of a wiz
If ever a wiz there was.
--
If ever, oh ever, a wiz there was,
The Wizard of Prov is one because,
Because, because, because, because, because,
Because of the wonderful things he does.
Meanwhile in a galaxy far far away…
Semantic Web Stuff
W3C Activities in Developing New Query Languages
[Man15] R. MANTHEY. Back to the Future – Should SQL Surrender to SPARQL? SOFSEM, LNCS, 2015.
Are we caught in a strange loop?
[Man15] R. MANTHEY. Back to the Future – Should SQL Surrender to SPARQL? SOFSEM, LNCS, 2015.
The long march begins …
Self-validating Knowledge-based ROs?
Actionable Transparency
• Transparency vs Re-executability
• In the beginning was the Question!
– … then came the (logic) rule
– ... in the form of a query!
• Semantics anyone?
What (& where) is Semantics?
How (& what) to do (with) Semantics?
• The Meaning Triangle
• Controlled Vocabularies ..
Terminological Logics (DLs) ..
Ontologies
• (Relational) Structures
• A query is a question about a
concept!
– Is this graph bipartite?
– Demonstrate, show, prove
• .. that it is!
• .. that it isn’t!
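The bipartiteness question above can be answered with evidence in either direction. As a plain-Python sketch (the talk itself points to ASP for this kind of query), the hypothetical function `check_bipartite` below returns a 2-coloring when the graph is bipartite, and otherwise an edge whose endpoints a breadth-first coloring forced to the same color. That edge is a simple witness of failure, not a full odd-cycle proof.

```python
from collections import deque


def check_bipartite(edges):
    """Return (True, coloring) or (False, offending_edge) for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)

    color = {}
    for start in graph:                  # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # neighbor gets the other color
                    queue.append(v)
                elif color[v] == color[u]:
                    return False, (u, v)      # same color on both ends
    return True, color


square = [(1, 2), (2, 3), (3, 4), (4, 1)]
triangle = [(1, 2), (2, 3), (3, 1)]
print(check_bipartite(square)[0], check_bipartite(triangle)[0])
```

Returning the coloring or the offending edge, rather than a bare yes or no, is what turns the answer into something a reader can check.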
Answer Set Programming: a superpower for “doing semantics”
• ASP = DB+LP+KR+SAT
• Reasoning spectrum: …queries … constraint solving
• … OWL/DL, FO, SQL, Datalog, ..., ASP, ...
• ASP occupies a “sweet spot”
• ... but needs GTD extensions:
• PWE = ASP
+ Python
+ Jupyter
https://github.com/idaks/PWE-demos
ASP + PWE: Possible Worlds Explorer
https://github.com/idaks/PW-explorer
https://github.com/idaks/PWE-demos
Lowest Common Ancestors (LCAs)
Visualized in PWE via Python under the hood!
… for a few Python LOCs more …
(growing the target audience)
… we get highlighting of the LCAs!
“Boring” (ASCII) answer sets become
informative Timeline Visualization
(Here: IC Checking & Repair rules!)
From ROs to Reproducible Science 70
… visualizing clusters of PWs (answer sets) …
From ROs to Reproducible Science 71
… easily plug in different
ranking/distance/similarity functions!
… to discover additional structure!
• … discover similar (here:
isomorphic) solutions
• … and display them!
From ROs to Reproducible Science 72
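The plug-in idea can be sketched in a few lines of Python: treat each possible world as a set of atoms, pass the distance function in as a parameter, and group worlds whose distance stays within a threshold. This is an illustration under stated assumptions, not PWE's actual API; all function names, the atom strings, and the single-link grouping are hypothetical, and neither distance shown here tests for isomorphism.

```python
from itertools import combinations


def symmetric_difference_distance(w1, w2):
    """Number of atoms that occur in exactly one of the two worlds."""
    return len(w1 ^ w2)


def jaccard_distance(w1, w2):
    """1 minus the share of atoms that the two worlds have in common."""
    union = w1 | w2
    return 1 - len(w1 & w2) / len(union) if union else 0.0


def cluster(worlds, dist, threshold):
    """Single-link grouping: worlds within the threshold share a cluster."""
    parent = list(range(len(worlds)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(worlds)), 2):
        if dist(worlds[i], worlds[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(worlds)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical answer sets, written as sets of atom strings.
worlds = [
    frozenset({"col(a,red)", "col(b,green)"}),
    frozenset({"col(a,red)", "col(b,blue)"}),
    frozenset({"col(a,green)", "col(b,red)"}),
]
print(cluster(worlds, symmetric_difference_distance, threshold=2))  # [[0, 1], [2]]
```

Swapping in `jaccard_distance`, or any other function of two worlds, requires no change to `cluster`, which is the point of keeping the distance pluggable.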
Conclusion I
• Clarifying what we mean by reproducibility
• Identifying tool & thinking gaps
• Bridging gaps
• Empowering the many (long tail)
• Turbocharging the specialists
From ROs to Reproducible Science 73
Conclusion II: Actionable Thinking Tools!
• Possible Worlds Explorer (PWE):
– loosely coupling (= wrapping) Datalog & ASP systems
• DLV, clingo, …, XSB, … , <you-name-it>
– … with Python
– … and Jupyter notebooks
=> where the users are!
=> leveraging Python, Pandas, … analytics and visualization!
• Datalog & ASP for the rest of us!
– … and for LP / DB-Theory gurus :-)
• Work in progress
– join or fork: https://github.com/idaks/PW-explorer
– or talk, to get started: ludaesch@Illinois.edu
From ROs to Reproducible Science 74

From Research Objects to Reproducible Science Tales

  • 1. From Research Objects to Reproducible Science Tales Bertram Ludäscher ludaesch@illinois.edu Director, Center for Informatics Research in Science & Scholarship (CIRSS) School of Information Sciences (iSchool@Illinois) & National Center for Supercomputing Applications (NCSA) & Department of Computer Science (CS@Illinois) Southampton UK 2019-11-12
  • 2. Outline • Crisis Time? Manifesto Time! • Terminology • Research Objects - A Long March • ROs & Reproducibility: Cui Bono? • A call to action: Transparency Action .. From ROs to Reproducible Science 2
  • 3. Manifesto Time! Never enough …
  • 7. Terminology Time! Yeah, but I said that yesterday ..
  • 11. Plurality: Let’s be community specific … and more manifestos!
  • 14. Back to the big picture: Research Objects, FAIR-dom
  • 16. Tool Envy Syndrome (TES) Maybe we should be working on conceptual foundations and ask: I can haz some Reproducibility Platform?
  • 17. How Perl Saved the Human Genome Project (The Perl Journal, http://www.tpj.com, by Lincoln Stein)
      – DATE: Early February, 1996
      – LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.
      – OCCASION: A high level meeting between the computer scientists of this center and the largest DNA sequencing center in the United States.
      – THE PROBLEM: Although the two centers use almost identical laboratory techniques, almost identical databases, and almost identical data analysis tools, they still can't interchange data or meaningfully compare results.
      – THE SOLUTION: Perl
      – … Most groups, however, learned to build modular, loosely-coupled systems whose parts could be swapped in and out without retooling the whole system:
      – First there's a basic quality check on the sequence: is it long enough? Are the number of ambiguous letters below the maximum limit? Then the vector check ensures that only human DNA gets into the database. Next there's a check for repetitive sequences … penultimate step is to attempt to match the new sequence against other sequences in a large community database of DNA sequences …. After performing all these checks, the sequence along with the information that's been gathered about it [provenance!] along the way is loaded into the local laboratory database.
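The modular, loosely-coupled pipeline described in the quote can be sketched in a few lines of Python. This is only an illustration, not Stein's Perl code: the step names, the thresholds (20 bases, 2 ambiguous letters), and the `SequenceRecord` type are assumptions made up for the example. The point is that each check is a swappable part and that every outcome is recorded as provenance alongside the sequence.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """A sequence plus the information gathered about it along the way."""
    sequence: str
    provenance: list = field(default_factory=list)


def long_enough(seq):
    # Hypothetical threshold: at least 20 bases.
    return len(seq) >= 20


def few_ambiguous(seq):
    # Hypothetical threshold: at most 2 ambiguous letters ("N").
    return seq.upper().count("N") <= 2


# Steps are plain (name, function) pairs, so parts can be swapped in and out.
PIPELINE = [("length", long_enough), ("ambiguity", few_ambiguous)]


def run_pipeline(sequence, steps=PIPELINE):
    """Run the checks in order; record each outcome; stop at the first failure."""
    record = SequenceRecord(sequence)
    for name, check in steps:
        passed = check(sequence)
        record.provenance.append((name, passed))
        if not passed:
            return record, False
    return record, True


good_record, good_ok = run_pipeline("ACGT" * 6)
bad_record, bad_ok = run_pipeline("ACGTNNNN" * 3)
```

Adding a vector check or a repeat check would mean appending one more pair to `PIPELINE`; nothing else changes.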
  • 18. Tool Envy Syndrome (TES) Maybe we should be working on conceptual foundations and ask: I can haz some reduce-sillity-platforms?
  • 19. A Reproducibility Platform … errors galore • Reproducibility platform • … Reproducility platform • … Reproduce-sillity platform • => Debuggers! • … Reduce-sillity platform • => The Vision • Helmut Schmidt: “Wer Visionen hat soll zum Arzt gehen!” (If you have visions, go see a doctor!)
  • 20. Enter the tool makers • … NSF SKOPE: system and tools to discover, access, analyze, visualize paleoenvironmental data – unprecedented ability to explore provenance (detailed, comprehensible record of computational derivation of results) – for researchers, tinkerers, and modelers • … NSF Whole Tale: – leverage & contribute to existing CI to support the whole tale (“living paper”), from workflow run to scholarly publication – integrate tools & CI (DataONE, Globus, iRODS, NDS, ...) to simplify use and promote best practices – driven by science WGs (Archaeology/SKOPE, materials science, astro, biodiversity informatics ..)
  • 21. Whole Tale: The next step in the evolution of the scholarly article: The “Living [Frozen?] Paper” • 1st Generation: – narrative (prose) • 2nd Generation: plus … – name .. identify .. include (access to) data • 3rd Generation: plus … – name .. reference .. include code (software) .. – and provenance … and exec environment (containers). Whole Tale Dashboard
  • 23. WT Architecture: https://dashboard.wholetale.org
  • 24. Example Tale: LIGO gravitational wave detection (tutorial Jupyter notebook)
  • 25. https://dashboard.wholetale.org
  • 26. ROs ... Whole Tale … are we done here?
  • 27. The return of the R* brouhaha
  • 28. In a nutshell • Computational reproducibility =/=> Scientific reproducibility • Transparency to the rescue! • What’s the goal again? • And what information gain is implied by – a successful reproducibility study (alright … ) – a failed reproducibility study (K Popper says ‘Hi’!) – a non-conclusive reproducibility study
  • 29. Reproducibility Crisis (reprised) • Successful reproducibility study: • increases trust in prior study :-) • … but no surprises :-( • Failed reproducibility study: • decreases trust in (or falsifies) prior study :-( • … but surprising failure yields new info/knowledge :-) • Learning from failures! – Not really a new, revolutionary idea.. – What is a positive vs negative result anyways? – ... fail early, fail often ...
  • 30. PRIMAD (what have you “primed”?) Dagstuhl Seminar #16041 Report
      Outputs = Exec(M, I, P, D) | RO, A
      – M = parsimony/bootstrap/..
      – I = package XYZ
      – P = MacOS ..
      – D = (Params, Files)
  • 31. PRIMAD (what have you “primed”?) Dagstuhl Seminar #16041 Report
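One way to read the “what have you primed?” question is as a diff between two runs along the PRIMAD dimensions (Platform, Research objective, Implementation, Method, Actor, Data). The sketch below assumes each dimension can be summarized as a string; the `Run` class, the `primed` function, and the example values (beyond the slide's own "parsimony", "package XYZ", and "MacOS") are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Run:
    """One execution, described along the six PRIMAD dimensions."""
    platform: str
    research_objective: str
    implementation: str
    method: str
    actor: str
    data: str


def primed(original, repeat):
    """Return the names of the dimensions that differ between two runs."""
    return [
        f.name
        for f in fields(Run)
        if getattr(original, f.name) != getattr(repeat, f.name)
    ]


ORIGINAL = Run("MacOS", "RO-1", "package XYZ", "parsimony", "author", "params+files v1")
PORTED = Run("Linux", "RO-1", "package XYZ", "parsimony", "author", "params+files v1")
CHANGED = primed(ORIGINAL, PORTED)
```

Here only the platform was changed, so the repeat run tests portability; changing `actor` alone would instead describe an independent re-execution by someone else.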
  • 33. Query evaluation game (Provenance’12 @Dagstuhl with JanVdB, TJ Green). EDB: e(a,b), e(b,b)
      tc(X,Y) :- e(X,Y)     # (1)--e(X,Y)-->(2)
      tc(X,Y) :-            # (1)--exists:Z-->(3)
          e(X,Z),           # (3)->(4)-e(X,Z)->(5)
          tc(Z,Y).          # (3)--X:=Z-->(1)
      [Figure: game graph with positions 1 to 5 and moves e(X,Y), exists:Z, e(X,Z), X := Z, instantiated over the constants a and b.]
      Flum, Kubierschky, Ludäscher, Total and partial well-founded Datalog coincide, ICDT-The-Bag-1997, Delphi, Greece. Eureka! Provenance @ SBBD'16
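For readers who want to see what the two `tc` rules compute before thinking about the game graph, here is a minimal sketch of naive bottom-up evaluation in Python. It is not the query evaluation game itself and produces no provenance; the function name and the set-of-pairs encoding are assumptions for the example.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of:
         tc(X,Y) :- e(X,Y).
         tc(X,Y) :- e(X,Z), tc(Z,Y).
    """
    tc = set(edges)  # first rule: every edge is in tc
    while True:
        # second rule: join e(X,Z) with the tc(Z,Y) facts derived so far
        derived = {(x, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        if derived <= tc:  # nothing new: fixpoint reached
            return tc
        tc |= derived


EDB = {("a", "b"), ("b", "b")}
TC = transitive_closure(EDB)
```

On the slide's EDB the closure adds nothing beyond the edges themselves, since `tc(a,b)` and `tc(b,b)` are already given by the first rule; the interesting part of the slide is which moves in the game justify each fact.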
  • 34. A rant … Didn’t you come here for this?
  • 35. The Evolution of Language (Peter Buneman for Phil Wadler)
      – 2x (Descartes)
      – λx. 2x (Church)
      – (LAMBDA (X) (* 2 X)) (McCarthy)
      – (W3C):
        <?xml version="1.0"?>
        <LAMBDA-TERM>
          <VAR-LIST> <VAR>X</VAR> </VAR-LIST>
          <EXPR>
            <APPLICATION>
              <EXPR><CONST>*</CONST></EXPR>
              <ARGUMENT-LIST>
                <EXPR><CONST>2</CONST></EXPR>
                <EXPR><VAR>X</VAR></EXPR>
              </ARGUMENT-LIST>
            </APPLICATION>
          </EXPR>
        </LAMBDA-TERM>
      Thesis: • There’s no problem that can’t be tackled by another level of indirection.
      Antithesis: • Adding levels of indirection gets you further away from solving your problem. • ... or worse: Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy. -- Alan Perlis in Epigrams on Programming
  • 36. Beware of Techno(re)ligion: Great ideas are simple; frozen accidents aren’t … • Geo-/Helio-centric model • Evolution by Natural Selection • Structure of DNA • Genetic Code • Relativity • … • Logic F = A | F/F | -F | (ex x) F
  • 37. Thinking Tools: “You can’t do much carpentry with your bare hands, and you can’t do much thinking with your bare brain” – Bo Dahlbom (via D. Dennett)
  • 38. Why we need Thinking Tools • How do we analyze metadata models, schemas, integrity constraints, taxonomies, ontologies, … • … or the big picture: what do we mean by … provenance? Reproducibility in science? • From Thinking Tools to …. “Tool Tools”?
  • 39. Provenance as an Intuition Pump for Understanding what happened! (Frozen Accidents Edition) Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009. Author: Jkwchui (Based on drawing by Truth-seeker2004)
  • 40. Thinking about provenance. What provenance?
  • 41. Those who can’t remember the past … – … are forced to repeat it! • Provenance in – ... scientific workflows • First (second, third, ...) Provenance Challenge • ... OPM ... W3C incubator ... W3C PROV ... – ... databases • Why, Where, How, ...., Why-NOT, ... – ... programming languages – ... – ... logic-based KR (NMR, ...)
  • 43. Computational Provenance … • Origin, processing history of artifacts – data products, figures, ... – also: underlying workflow => understand methods, dataflow, and dependencies. Example: Climate Change Impacts in the United States, U.S. National Climate Assessment, U.S. Global Change Research Program
  • 44. João F. Pimentel, Saumen Dey, Timothy McPhillips, Khalid Belhajjame, David Koop, Leonardo Murta, Vanessa Braganholo, Bertram Ludäscher Yin & Yang: Demonstrating complementary provenance from noWorkflow & YesWorkflow
  • 45. noWorkflow: not only Workflow! • Scripts have provenance, too! • Transparently capture some/all provenance from Python script runs. • Use filter queries to “zoom” into relevant parts .. [Figure: noWorkflow trace graph of one simulate_data_collection run, showing function calls and variable assignments by script line number (e.g. parse_args, spreadsheet_rows, calculate_strategy, collect_next_image, transform_image) and the files written under run/, such as run/run_log.txt, run/rejected_samples.txt, run/collected_images.csv, and the raw and corrected images for samples DRT240 and DRT322.]
  • 46. noWorkflow lineage of an image file: provenance information about Python function calls, variable assignments, etc.
      $ now dataflow -f "run/data/DRT240/DRT240_11000eV_002.img"
      .. auto-“make” this!
      $(NW_FILTERED_LINEAGE_GRAPH).gv: $(NW_FACTS)
          now helper df_style.py
          now dataflow -v 55 -f $(RETROSPECTIVE_LINEAGE_VALUE) -m simulation | python df_style.py -d BT -e > $(NW_FILTERED_LINEAGE_GRAPH).gv
      [Figure: filtered lineage graph for run/data/DRT240/DRT240_11000eV_002.img, with values such as cassette_id = 'q55', sample_score_cutoff = 12.0, data_redundancy = 0.0, calibration_image_file = 'calibration.img', sample_name = 'DRT240', sample_quality = 45, calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000]), energy = 11000, frame_number = 2, raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw', transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img').]
  • 47. YesWorkflow: Yes, scripts are Workflows, too! • Use YW annotations @begin...@end, @in, @out to reveal hidden conceptual workflow (prospective provenance) • Script isn't changed: – annotations via comments (=> language independent) • For understanding and sharing the “big picture” • Query and visualize! [Figure: YesWorkflow graph of simulate_data_collection with steps initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity; inputs sample_spreadsheet, calibration_image, sample_score_cutoff, data_redundancy, cassette_id; outputs run_log, rejection_log, collection_log, raw_image, corrected_image.]
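To make the "annotations via comments" idea concrete, the sketch below embeds a few `@begin`, `@end`, `@in`, `@out` comments in a script and pulls them back out with a regular expression. This is not the YesWorkflow tool or its parser: `extract_blocks`, the regular expression, and the body of `calculate_strategy` are hypothetical, and only the four annotation keywords and the step and port names come from the slide.

```python
import re

# A script fragment annotated with YW-style comments. The function body is a
# made-up stand-in; the annotations do not change how it runs.
SCRIPT = '''
# @begin calculate_strategy
# @in sample_quality
# @in sample_score_cutoff
# @out accepted_sample
# @out rejected_sample
def calculate_strategy(sample_name, sample_quality, sample_score_cutoff):
    if sample_quality >= sample_score_cutoff:
        return sample_name, None
    return None, sample_name
# @end calculate_strategy
'''

ANNOTATION = re.compile(r"#\s*@(begin|end|in|out)\s+(\w+)")


def extract_blocks(source):
    """Collect, per @begin...@end block, the declared @in and @out names."""
    blocks, open_blocks = {}, []
    for keyword, name in ANNOTATION.findall(source):
        if keyword == "begin":
            blocks[name] = {"in": [], "out": []}
            open_blocks.append(name)
        elif keyword == "end":
            open_blocks.pop()
        elif open_blocks:
            blocks[open_blocks[-1]][keyword].append(name)
    return blocks


BLOCKS = extract_blocks(SCRIPT)
```

Because the annotations live in comments, the same extraction would work on a script in any language with line comments once the comment prefix in the pattern is adjusted.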
  • 48. YesWorkflow: Conceptual workflow model (with lineage query). noWorkflow: Python trace model (with lineage query). But how do we bridge this gap??? Would like to use YW model to query NW data! [Figure: the YesWorkflow graph and the noWorkflow trace of simulate_data_collection side by side, each next to its lineage subgraph for run/data/DRT240/DRT240_11000eV_002.img.]
  • 49. Habemus Pons! We’ve got the Bridge! The bridge is the journey.. (The journey is the destination) Lineage of image file in terms of YW model, with details from NW provenance
  • 50. Computational Thinking: “Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt” (The limits of my language mean the limits of my world) … • Vanilla Process Network • Functional Programming Dataflow Network • XML Transformation Network • Collection-oriented Modeling & Design framework (COMAD) – Look Ma: No Shims!
  • 51. Tool Envy Syndrome (TES) Maybe we should be working on conceptual foundations and ask: I can haz some Terminology Tools?
  • 52. Taxonomy Alignment Problem. The five RCC-5 relations between regions X and Y:
      – Congruence: X == Y
      – Inclusion: X > Y
      – Inverse Inclusion: X < Y
      – Overlap: X >< Y
      – Disjointness: X ! Y
      Origins: Euler diagrams ... limited FO reasoning ... RCC-5++ reasoning. Application: Geo-Taxonomy Alignment. The secret sauce inside: moved from FO reasoner to … qualitative reasoning (RCC-5) to … Answer Set Programming (ASP) + some more secret sauce
  • 53. Reasoning with Incomplete Knowledge: Exploring Possible Worlds • Euler/X project employs qualitative reasoning (RCC-5), implemented in ASP to align, merge taxonomies, debug alignments, etc.
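When the members of two concepts are fully known, the five RCC-5 relations reduce to simple set comparisons. The sketch below shows that extensional reading only; it is not how Euler/X works (the slides say that reasoning is done in ASP, over incomplete knowledge), and the function name `rcc5` and the example regions are assumptions. Both sets are assumed to be non-empty.

```python
def rcc5(x, y):
    """Return the RCC-5 relation symbol between two non-empty sets."""
    if x == y:
        return "=="   # congruence
    if y < x:
        return ">"    # inclusion: X properly includes Y
    if x < y:
        return "<"    # inverse inclusion: X is properly included in Y
    if x & y:
        return "><"   # overlap: some shared members, neither contains the other
    return "!"        # disjointness


MIDWEST = {"IL", "IN", "OH"}
GREAT_LAKES = {"IL", "IN", "OH", "MI", "WI"}
RELATION = rcc5(MIDWEST, GREAT_LAKES)
```

The alignment problem on the slides is harder than this check because the members are usually not known; only some relations between concepts are asserted, and each way of completing them consistently is one possible world.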
  • 54. The long march to ROs & Reproducible Science. We're off to see the Wizard, The wonderful Wizard of Prov! -- We hear he is a wiz of a wiz If ever a wiz there was. -- If ever, oh ever, a wiz there was, The Wizard of Prov is one because, Because, because, because, because, because, Because of the wonderful things he does.
  • 55. Meanwhile in a galaxy far far away… Semantic Web Stuff: W3C Activities in Developing New Query Languages. [Man15] R. MANTHEY. Back to the Future – Should SQL Surrender to SPARQL? SOFSEM, LNCS, 2015.
  • 56. Are we caught in a strange loop? [Man15] R. MANTHEY. Back to the Future – Should SQL Surrender to SPARQL? SOFSEM, LNCS, 2015.
  • 57. The long march begins …
  • 59. Self-validating Knowledge-based ROs?
  • 60. Actionable Transparency • Transparency vs Re-executability • In the beginning was the Question! – … then came the (logic) rule – ... in the form of a query! • Semantics anyone?
  • 62. What (& where) is Semantics? How (& what) to do (with) Semantics? • The Meaning Triangle • Controlled Vocabularies .. Terminological Logics (DLs) .. Ontologies • (Relational) Structures • A query is a question about a concept! – Is this graph bipartite? – Demonstrate, show, prove • .. that it is! • .. that it isn’t!
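The bipartite question is a good example of an answer that comes with its own proof: a 2-coloring demonstrates that the graph is bipartite, and an odd cycle demonstrates that it isn't. The sketch below does this with breadth-first search in plain Python rather than with a logic rule or an ASP solver; the function names and the adjacency-dict encoding (undirected, every node listed as a key) are assumptions for the example.

```python
from collections import deque


def path_to_root(node, parent):
    """Follow BFS parent pointers from node up to the root of its tree."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def odd_cycle(u, v, parent):
    """Build the odd cycle closed by edge u-v, whose ends have the same color."""
    path_u, path_v = path_to_root(u, parent), path_to_root(v, parent)
    # Drop shared ancestors above the lowest common one.
    while len(path_u) > 1 and len(path_v) > 1 and path_u[-2] == path_v[-2]:
        path_u.pop()
        path_v.pop()
    return path_u + path_v[-2::-1]


def bipartite_witness(graph):
    """Return (True, coloring) or (False, odd_cycle) for an undirected graph."""
    color, parent = {}, {}
    for start in graph:
        if start in color:
            continue
        color[start], parent[start] = 0, None
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    return False, odd_cycle(u, v, parent)
    return True, color


SQUARE = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
TRIANGLE = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
SQUARE_OK, SQUARE_COLORING = bipartite_witness(SQUARE)
TRIANGLE_OK, TRIANGLE_CYCLE = bipartite_witness(TRIANGLE)
```

Either witness can be checked independently of the search that produced it, which is the sense of "demonstrate, show, prove" on the slide.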
  • 63. Answer Set Programming: a superpower for “doing semantics” • ASP = DB+LP+KR+SAT • Reasoning spectrum: …queries … constraint solving • … OWL/DL, FO, SQL, Datalog, ..., ASP, ... • ASP occupies a “sweet spot” • ... but needs GTD extensions: • PWE = ASP + Python + Jupyter. https://github.com/idaks/PWE-demos
  • 64. ASP + PWE: Possible Worlds Explorer. https://github.com/idaks/PW-explorer https://github.com/idaks/PWE-demos
  • 66. Lowest Common Ancestors (LCAs)
  • 67. Visualized in PWE via Python under the hood!
  • 68. … for a few Python LOCs more … (growing the target audience)
  • 69. … we get highlighting of the LCAs!
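The slides do not show the rules used for the LCA example, so the following is only a plain-Python sketch of one common definition: in a DAG, a lowest common ancestor of x and y is a common ancestor that is not a proper ancestor of any other common ancestor. The `parents` encoding, the function names, and the choice to count a node as an ancestor of itself are all assumptions.

```python
def ancestors(parents, node):
    """All proper ancestors of node in a DAG given as child -> list of parents."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def lowest_common_ancestors(parents, x, y):
    """Common ancestors of x and y that lie below all other common ancestors."""
    common = (ancestors(parents, x) | {x}) & (ancestors(parents, y) | {y})
    return {
        c
        for c in common
        if not any(c in ancestors(parents, d) for d in common if d != c)
    }


# Hypothetical DAG: d and e each have parents b and c, which share parent a.
PARENTS = {"d": ["b", "c"], "e": ["b", "c"], "b": ["a"], "c": ["a"]}
LCAS = lowest_common_ancestors(PARENTS, "d", "e")
```

Unlike in a tree, the result is a set: here both `b` and `c` are lowest common ancestors of `d` and `e`, which is the kind of answer that benefits from being highlighted in a graph rather than listed as text.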
  • 70. “Boring” (ASCII) answer sets become informative Timeline Visualization (Here: IC Checking & Repair rules!)
  • 71. … visualizing clusters of PWs (answer sets) … easily plug in different ranking/distance/similarity functions!
  • 72. … to discover additional structure! • … discover similar (here: isomorphic) solutions • … and display them!
  • 73. Conclusion I • Clarifying what we mean by reproducibility • Identifying tool & thinking gaps • Bridging gaps • Empowering the many (long tail) • Turbocharging the specialists
  • 74. Conclusion II: Actionable Thinking Tools! • Possible Worlds Explorer (PWE): – loosely coupling (= wrapping) Datalog & ASP systems • DLV, clingo, …, XSB, … , <you-name-it> – … with Python – … and Jupyter notebooks => where the users are! => leveraging Python, Pandas, … analytics and visualization! • Datalog & ASP for the rest of us! – … and for LP / DB-Theory gurus :-) • Work in progress – join or fork: https://github.com/idaks/PW-explorer – or talk, to get started: ludaesch@Illinois.edu