Querying	Provenance	Information:	Basic	
Notions	and	an	Example	from	Paleoclimate	
Reconstruction	(SKOPE	Project)
Bertram Ludäscher†, Victoria Stodden*, Kyle Bocinsky, Keith Kintigh, Tim Kohler, Timothy McPhillips, Johnathan Rush
† contact (Q&A) for SKOPE & YesWorkflow; * presenting
SKOPE	Project:	Synthesized	Knowledge	Of	Past	
Environments
Example: Bocinsky, Kohler et al. study the rain-fed maize agriculture of the Anasazi
– Four Corners; AD 600–1500. Climate change influenced the Mesa Verde migrations of the late 13th century AD. The study uses a network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1–2000. The algorithm estimates the joint information in tree rings and a climate signal to identify the “best” tree-ring chronologies for climate reconstruction.
… implemented as an R script …
K.	Bocinsky,	T.	Kohler,	A	2000-year	reconstruction	of	the	
rain-fed	maize	agricultural	niche	in	the	US	Southwest.	
Nature Communications.	doi:10.1038/ncomms6618
Data & Code available!
… but is it enough? How to think about, model, and query the underlying workflow and data provenance?
[Figure: YesWorkflow graph of the paleoclimate reconstruction script. Nodes, in reading order: GetModernClimate, PRISM_annual_growing_season_precipitation, SubsetAllData, dendro_series_for_calibration, dendro_series_for_reconstruction, CAR_Analysis_unique, cellwise_unique_selected_linear_models, CAR_Analysis_union, cellwise_union_selected_linear_models, CAR_Reconstruction_union, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors, CAR_Reconstruction_union_output, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif, master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years.]
YesWorkflow:	Yes,	scripts	(often)	are	
workflows,	too!
• R,	MATLAB,	Python,	…	scripts	
“hide”	valuable	dataflow,	unless	
revealed	using	a	workflow	model.
• YesWorkflow tool	can	be	used	to	
model,	query	underlying	
workflow	and	provenance	info.
YesWorkflow:	Prospective &	Retrospective	
Provenance	…	(almost)	for	free!	
• Simple	YW	annotations	(@begin,	
@end,	@in,	@out,	…)	in	the	
script	(R,	Python,	MATLAB)	are	
used	to	recreate	the	workflow	
view	from	the	script	…	
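To make the annotation idea concrete, here is a minimal sketch of a script carrying @begin, @end, @in, and @out comments, together with a toy extractor that recovers the declared blocks. The step and data names are hypothetical, and the extractor is only an illustration written for this example; it is not the YesWorkflow tool and does not claim to reproduce how that tool parses scripts.

```python
import re

# A toy script annotated with YesWorkflow-style comments. The step and data
# names below are hypothetical and chosen only for illustration.
ANNOTATED_SCRIPT = '''
# @begin reconstruct_climate
# @in tree_ring_data
# @in calibration_years
# @out reconstruction

# @begin subset_data
# @in tree_ring_data
# @in calibration_years
# @out calibration_series
calibration_series = "subset of tree_ring_data"
# @end subset_data

# @begin fit_models
# @in calibration_series
# @out reconstruction
reconstruction = "models fitted to calibration_series"
# @end fit_models

# @end reconstruct_climate
'''

# Matches a comment of the form "# @tag name" for the four tags used here.
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\S+)", re.IGNORECASE)


def extract_blocks(source):
    """Collect the inputs and outputs declared for each @begin..@end block."""
    blocks = {}  # block name -> {"in": [...], "out": [...]}
    stack = []   # names of the blocks that are currently open
    for line in source.splitlines():
        match = TAG.search(line)
        if not match:
            continue
        tag, name = match.group(1).lower(), match.group(2)
        if tag == "begin":
            blocks[name] = {"in": [], "out": []}
            stack.append(name)
        elif tag == "end":
            stack.pop()
        else:
            # @in / @out belong to the innermost open block.
            blocks[stack[-1]][tag].append(name)
    return blocks


model = extract_blocks(ANNOTATED_SCRIPT)
for name, ports in model.items():
    print(name, "in:", ports["in"], "out:", ports["out"])
```

The resulting dictionary is enough to draw a graph in which steps are connected through the data names they share.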
Paleoclimate	Reconstruction	(OpenSKOPE.org)
• …	explained	using	YesWorkflow!
Kyle	B.,	(computational)	archaeologist:	
"It	took	me	about	20	minutes	to	comment.	Less	
than	an	hour	to	learn	and	YW-annotate,	all-told."
YW	annotations:	Model	your	Workflow!
YW (prospective) and
YW-Recon (retrospective) Provenance
• 1. YW: Annotate Script => YW Model
– Annotate @BEGIN..@END, @IN, @OUT
– Visualize, share, be happy :-)
• 2. Run script
– Files are read and written
– Folder and file names carry metadata
• 3. YW-Recon
– Use @URI tags that link YW Model <=> Persisted Data
– Run URI-template queries
• cf. “ls -R” & regex matching
• 4. YW-Query
– Answer the user’s provenance queries
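Step 3 can be sketched as follows: a URI template is turned into a regular expression with one named group per variable and matched against a file listing, much like "ls -R" followed by regex matching. The template and file paths are taken from the data collection example in the supplementary slides; the function template_to_regex is a hypothetical helper written for this sketch, not part of YesWorkflow.

```python
import re


def template_to_regex(template):
    """Compile a URI template such as 'a/{x}/b_{y}.raw' into a regex with
    one named group per {variable}. A variable that occurs twice must match
    the same text both times."""
    pattern, seen = "", set()
    # re.split with a capturing group keeps the variable names: literal text
    # lands at even positions, variable names at odd positions.
    for i, part in enumerate(re.split(r"\{(\w+)\}", template)):
        if i % 2 == 0:
            pattern += re.escape(part)
        elif part in seen:
            pattern += "(?P=%s)" % part          # backreference to first use
        else:
            seen.add(part)
            pattern += "(?P<%s>[^/]+)" % part    # one path segment at most
    return re.compile(pattern)


RAW_IMAGE = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"

# Stand-in for the output of a recursive directory listing.
listing = [
    "run/run_log.txt",
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/data/DRT240/DRT240_10000eV_001.img",
    "run/raw/q55/DRT322/e11000/image_002.raw",
    "run/collected_images.csv",
]

regex = template_to_regex(RAW_IMAGE)
bindings = [m.groupdict() for m in map(regex.fullmatch, listing) if m]
for row in bindings:
    print(row)
```

Each matching path yields one row of variable bindings, so the metadata encoded in folder and file names becomes queryable.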
Blurring	the	line	between	generic	provenance
questions and	science questions ...	
• What	version	of	GDAL (Geospatial	Data	Abstraction	Library)	was	used?
• What	were	the	files and	parameters used	as	inputs to	the	scripts used?			
• What	geographic	regions and	years	are	covered by the	PaleoCAR input?
• Are	any	regions in the	processed data not covered by the	input	data?
• For	each	value	displayed	in	the	graphs	or	downloaded	from	the	web	app:
– Is it	the	exact value output	by	PaleoCAR for	the	30"x30"	region	containing	the	
marker?
– Or	is it	a	value interpolated from	multiple	values	in	the	PaleoCAR output?
– If	interpolated,	what	are	the	values and	corresponding	coordinates	for	the	points	
used in the	interpolation?
– What	formula	or	curve-fitting	algorithm was	used for performing	the	interpolation?
• What	are	the	estimated	errors in the	input data to	data	processing	(and	inputs	
to	PaleoCAR)	that	result	in	these	estimated	errors?
• …
• Interesting,	provenance-based question	for	a	reconstruction	
technique	like	PaleoCAR:
– What	tree-ring	chronologies/species	were	selected	for	a	particular	
reconstruction	(say,	summer	temperature)?
• Such	information	can	reveal	local	climate	patterns	or	long-range	
climate	teleconnections.
• =>		if	a	researcher	mistrusts	a	particular	tree-ring	chronology,	they	
might	be	interested	in	what	(geographic	and	temporal)	portions	of	
a	reconstruction	are	influenced	by	the	suspect	chronology	(if	any).	
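A query of this kind could look like the sketch below. It assumes that retrospective provenance records, per grid cell and time window, which chronologies were selected; the cell names, time windows, and chronology identifiers are invented for illustration and do not come from PaleoCAR.

```python
# Hypothetical retrospective provenance: for each (grid cell, time window),
# the set of tree-ring chronologies selected for that cell's model.
SELECTED = {
    ("cell_A", (1, 600)): {"chron_1", "chron_2"},
    ("cell_A", (601, 2000)): {"chron_2", "chron_3"},
    ("cell_B", (1, 2000)): {"chron_3"},
}


def influenced_by(chronology, selected):
    """Return the (cell, time window) pairs whose model used the chronology."""
    return sorted(key for key, chrons in selected.items() if chronology in chrons)


for cell, (start, end) in influenced_by("chron_2", SELECTED):
    print(cell, "AD", start, "to", end)
```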
Executive	Summary
• Research	papers	explain	findings,	methods;	increasingly	link	to	
data	&	code	(&	exec	environment	=>	Whole	Tale)
• Prospective provenance	(= workflow	definition)	and	
retrospective provenance	(e.g.	data	lineage)	for	script-based	
computational	studies	(R,	Python,	MATLAB,	…)	can	be	
combined	to	support	powerful	hybrid	provenance	queries.
– Provenance	isn’t	just	metadata	for	others:	“provenance-for-self”	
queries	can	be	used	by	researchers	during the	studies.
• YesWorkflow tool	can	be	used	to	model	prospective	
provenance,	combine	with	and	query	retrospective	
provenance
• SKOPE project	provides	rich	use	cases	for	“deep”	(science-
oriented)	provenance	queries.	
Provenance	Support	for	Reproducible	Science	
Use	Case:	Paleoclimate	Reconstruction
Science	paper	(OA)	uses:
• open	source	PaleoCAR
model	in	R
• But	what	was	the	
“workflow”?
• Is	there	prospective	
and/or	retrospective	
provenance?
• =>	YesWorkflow tool	
can	help!
SUPPLEMENTARY	MATERIAL
run/  
├──  raw  
│      └──  q55  
│              ├──  DRT240  
│              │      ├──  e10000  
│              │      │      ├──  image_001.raw  
...          ...  ...  ...  
│              │      │      └──  image_037.raw  
│              │      └──  e11000  
│              │              ├──  image_001.raw  
...          ...          ...  
│              │              └──  image_037.raw  
│              └──  DRT322  
│                      ├──  e10000  
│                      │      ├──  image_001.raw  
...                  ...  ...  
│                      │      └──  image_030.raw  
│                      └──  e11000  
│                              ├──  image_001.raw  
...                          ...  
│                              └──  image_030.raw  
├──  data  
│      ├──  DRT240  
│      │      ├──  DRT240_10000eV_001.img  
...  ...  ...  
│      │      └──  DRT240_11000eV_037.img  
│      └──  DRT322  
│              ├──  DRT322_10000eV_001.img  
...          ...  
│              └──  DRT322_11000eV_030.img  
│  
├──  collected_images.csv  
├──  rejected_samples.txt  
└──  run_log.txt  
  
YW-RECON:	Prospective	&	Retrospective
Provenance	…	(almost)	for	free!	
[Figure: YesWorkflow model of the data collection script; data nodes carry URI templates]
Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity
Parameters and intermediate data: cassette_id, sample_score_cutoff, sample_name, sample_quality, rejected_sample, accepted_sample, num_images, energies, sample_id, energy, frame_number, total_intensity, pixel_count, corrected_image_path
Data with URI templates:
sample_spreadsheet  file:cassette_{cassette_id}_spreadsheet.csv
calibration_image  file:calibration.img
run_log  file:run/run_log.txt
rejection_log  file:/run/rejected_samples.txt
raw_image  file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw
corrected_image  file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img
collection_log  file:run/collected_images.csv
• URI-templates	link conceptual	entities	
to	runtime	provenance	“left	behind”	by	
the	script	author	…	
• … facilitating provenance reconstruction
Q: Where is the raw image of the corrected image DRT322_11000eV_030.img?
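One way to answer this question, sketched under assumptions: the values of sample_id, energy, and frame_number are read off the corrected image name, substituted into the raw_image URI template, and any variable that remains unknown (here cassette_id) becomes a wildcard that is matched against the directory listing. The Wildcard helper and the short listing are illustrative only; this is not the YW-Recon implementation.

```python
from fnmatch import fnmatchcase

# raw_image URI template from the workflow model (scheme prefix dropped).
RAW_IMAGE = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"

# Bindings read off the corrected image name DRT322_11000eV_030.img.
known = {"sample_id": "DRT322", "energy": "11000", "frame_number": "030"}


class Wildcard(dict):
    """Mapping that substitutes '*' for any template variable it lacks."""

    def __missing__(self, key):
        return "*"


# str.format_map looks variables up in the mapping, so unknown ones
# fall through to __missing__ and become wildcards.
pattern = RAW_IMAGE.format_map(Wildcard(known))

# A few paths from the run directory shown in the tree above.
listing = [
    "run/raw/q55/DRT240/e11000/image_030.raw",
    "run/raw/q55/DRT322/e10000/image_030.raw",
    "run/raw/q55/DRT322/e11000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_030.raw",
]

matches = [path for path in listing if fnmatchcase(path, pattern)]
print(pattern)
print(matches)
```

The single matching path also reveals the binding that the corrected image name did not carry, namely the cassette q55.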
Get 3 views for the price of 1!
[Figure: three YesWorkflow renderings of the same script (main): process view, data view, combined view. Steps: fetch_mask, load_data, standardize_with_mask, simple_diagnose. Data: input_mask_file, input_data_file, land_water_mask, NEE_data, standardized_NEE_data, result_NEE_pdf.]
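The three views can be derived from one another. The sketch below starts from the combined view (every step with the data it reads and writes) and projects it onto a process view and a data view. The step and data names are those of the figure; the exact edges are inferred from the node names and should be treated as an assumption.

```python
# Combined view: each step with the data it reads ("in") and writes ("out").
# Edges are inferred from the figure's node names, not copied from the script.
WORKFLOW = {
    "fetch_mask": {"in": ["input_mask_file"], "out": ["land_water_mask"]},
    "load_data": {"in": ["input_data_file"], "out": ["NEE_data"]},
    "standardize_with_mask": {
        "in": ["land_water_mask", "NEE_data"],
        "out": ["standardized_NEE_data"],
    },
    "simple_diagnose": {"in": ["standardized_NEE_data"], "out": ["result_NEE_pdf"]},
}


def process_view(workflow):
    """Edges (producer step, data, consumer step): steps are the nodes."""
    edges = set()
    for producer, p_ports in workflow.items():
        for consumer, c_ports in workflow.items():
            for data in p_ports["out"]:
                if data in c_ports["in"]:
                    edges.add((producer, data, consumer))
    return edges


def data_view(workflow):
    """Edges (input data, step, output data): data items are the nodes."""
    return {
        (src, step, dst)
        for step, ports in workflow.items()
        for src in ports["in"]
        for dst in ports["out"]
    }


for edge in sorted(process_view(WORKFLOW)):
    print("process view:", edge)
for edge in sorted(data_view(WORKFLOW)):
    print("data view:   ", edge)
```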
Provenance in Action: DataONE Project
A	DataONE search	(here:	“grass”)	yields	different	packages	with	provenance		
DataONE: Support for Provenance
Yaxing’s script with	
inputs &	output	
products
Christopher’s	
YesWorkflow
model
Christopher	using
Yaxing’s outputs	as	
inputs	for	his	script
Christopher’s	results	
can	be	traced	back	all	
the	way	to	Yaxing’s
input
Multi-Scale	Synthesis	and	Terrestrial	Model	Intercomparison
Project	(MsTMIP)
fetch_drought_variable
drought_variable_1
fetch_effect_variable
effect_variable_1
convert_effect_variable_units
effect_variable_2
create_land_water_mask
land_water_mask
init_data_variables
predrought_effect_variable_1 drought_value_variable_1 recovery_time_variable_1 drought_number_variable_1
define_droughts
sigma_dv_event month_dv_length
detrend_deseasonalize_effect_variable
effect_variable_3
calculate_data_variables
recovery_time_variable_2 drought_value_variable_2 predrought_effect_variable_2 drought_number_variable_2
export_recovery_time_figure
output_recovery_time_figure
export_drought_value_variable_figure
output_drought_value_variable_figure
export_predrought_effect_variable_figure
output_predrought_effect_variable_figure
export_drought_number_variable_figure
output_drought_number_figure
input_drough_variable
input_effect_variable
Christopher	Schwalm,
Yaxing Wei
[Figure: noWorkflow provenance graph of one run of simulate_data_collection. Nodes are Python function calls and variable assignments labeled with their script line numbers (for example 50 spreadsheet_rows, 61 calculate_strategy, 92 collect_next_image, 106 transform_image, 121 writer.writerow), together with the files read and written: cassette_q55_spreadsheet.csv, calibration.img, run/run_log.txt, run/rejected_samples.txt, run/collected_images.csv, and the raw and corrected images of samples DRT240 and DRT322 under run/raw/q55/ and run/data/.]
noWorkflow:
not only
Workflow!
• Scripts	have	provenance,	too!
• Transparently capture	some/all	
provenance	from	Python	script	
runs.
• Use	filter	queries to	“zoom”	into	
relevant	parts	..		
simulate_data_collection
230 parser = <optparse.OptionParser object at 0x7fcb6e16e3c8>
251 parse_args = (<Values at 0x7fcb6cbe15c ... cutoff': 12.0}>, ['q55'])
251 args = ['q55']
251 options = <Values at 0x7fcb6cbe15c0 ... ple_score_cutoff': 12.0}>
24 cassette_id = 'q55'
24 sample_score_cutoff = 12.0
24 data_redundancy = 0.0
24 calibration_image_file = 'calibration.img'
49 str.format
49 sample_spreadsheet_file = 'cassette_q55_spreadsheet.csv'
50 spreadsheet_rows(sample_spreadsheet_file)
50 sample_name = 'DRT240'
50 sample_quality = 45
61 calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000])
61 accepted_sample = 'DRT240'
61 num_images = 2
61 energies = [10000, 11000, 12000]
91 sample_id = 'DRT240'
92 collect_next_image(casset ... _{frame_number:03d}.raw')
92 energy = 11000
92 frame_number = 2
92 raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw'
106 str.format
106 transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img')
calibration.img
run/data/DRT240/DRT240_11000eV_002.img
$	now dataflow	-f	"run/data/DRT240/DRT240_11000eV_002.img"
$(NW_FILTERED_LINEAGE_GRAPH).gv: $(NW_FACTS)
	now helper df_style.py
	now dataflow -v 55 -f $(RETROSPECTIVE_LINEAGE_VALUE) -m simulation | python df_style.py -d BT -e > $(NW_FILTERED_LINEAGE_GRAPH).gv
..	auto-“make” this!
noWorkflow lineage	
of	an	image	file
Provenance	information	
about	Python	function	calls,	
variable assignments,	etc.
[Figure: YesWorkflow model of simulate_data_collection (steps initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity), next to a reduced model that keeps only the steps and data leading to corrected_image, and the corresponding noWorkflow trace graphs.]
YesWorkflow: conceptual workflow model (lineage query)
noWorkflow: Python trace model (lineage query)
Need to bridge this gap with a shared model
Would like to use the YW model to query NW data!
Habemus	Pons!
We’ve	got	the	Bridge!	
Lineage	of	image	file
in	terms	of	YW	
model,	with	details	
from	NW	provenance
C3-C4	Prospective	Provenance	
C3_C4_map_present_NA
fetch_SYNMAP_land_cover_map_variable
lon_variable lat_variable lon_bnds_variable lat_bnds_variable
fetch_monthly_mean_air_temperature_data
Tair_Matrix
fetch_monthly_mean_precipitation_data
Rain_Matrix
initialize_Grass_Matrix
Grass_variable
examine_pixels_for_grass
C3_Data C4_Data
generate_netcdf_file_for_C3_fraction
C3_fraction_data
file:outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc
generate_netcdf_file_for_C4_fraction
C4_fraction_data
file:outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc
generate_netcdf_file_for_Grass_fraction
Grass_fraction_data
file:outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc
SYNMAP_land_cover_map_data
inputs/land_cover/SYNMAP_NA_QD.nc
mean_airtemp
file:inputs/narr_air.2m_monthly/air.2m_monthly_{start_year}_{end_year}_mean.{month}.nc
mean_precip
file:inputs/narr_apcp_rescaled_monthly/apcp_monthly_{start_year}_{end_year}_mean.{month}.nc
Upstream	Lineage	of	C3_fraction_data
C3_C4_map_present_NA
examine_pixels_for_grass
C3_Data
fetch_SYNMAP_land_cover_map_variable
lon_variable lat_variable lon_bnds_variable lat_bnds_variable
fetch_monthly_mean_precipitation_data
Rain_Matrix
fetch_monthly_mean_air_temperature_data
Tair_Matrix
generate_netcdf_file_for_C3_fraction
C3_fraction_data
SYNMAP_land_cover_map_data
mean_airtemp mean_precip
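An upstream lineage query amounts to a backward traversal of the workflow graph. The sketch below encodes the C3-C4 workflow as steps with inputs and outputs and walks back from a target data item. The step and data names are those of the slides; which data feeds which step is inferred from the lineage figures and should be read as an assumption, and the traversal is a plain breadth-first search rather than YesWorkflow's own query engine.

```python
from collections import deque

GRID = ["lon_variable", "lat_variable", "lon_bnds_variable", "lat_bnds_variable"]

# step name -> (inputs, outputs); edges inferred from the lineage figures.
STEPS = {
    "fetch_SYNMAP_land_cover_map_variable": (["SYNMAP_land_cover_map_data"], GRID),
    "fetch_monthly_mean_air_temperature_data": (["mean_airtemp"], ["Tair_Matrix"]),
    "fetch_monthly_mean_precipitation_data": (["mean_precip"], ["Rain_Matrix"]),
    "initialize_Grass_Matrix": ([], ["Grass_variable"]),
    "examine_pixels_for_grass": (["Tair_Matrix", "Rain_Matrix"], ["C3_Data", "C4_Data"]),
    "generate_netcdf_file_for_C3_fraction": (["C3_Data"] + GRID, ["C3_fraction_data"]),
    "generate_netcdf_file_for_C4_fraction": (["C4_Data"] + GRID, ["C4_fraction_data"]),
    "generate_netcdf_file_for_Grass_fraction": (["Grass_variable"] + GRID, ["Grass_fraction_data"]),
}


def upstream(target, steps):
    """Return every step and data item that the target data depends on."""
    producers = {out: name for name, (_, outs) in steps.items() for out in outs}
    seen, queue = set(), deque([target])
    while queue:
        data = queue.popleft()
        step = producers.get(data)
        if step is None:  # a workflow input: no step produced it
            continue
        seen.add(step)
        for parent in steps[step][0]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("Grass_fraction_data", STEPS)))
```

Under these assumed edges, the lineage of Grass_fraction_data does not reach the temperature and precipitation inputs, while the lineage of C3_fraction_data does.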
Upstream	of	Grass_fraction_data!
C3_C4_map_present_NA
initialize_Grass_Matrix
Grass_variable
fetch_SYNMAP_land_cover_map_variable
lon_variable lat_variable lon_bnds_variable lat_bnds_variable
generate_netcdf_file_for_Grass_fraction
Grass_fraction_data
SYNMAP_land_cover_map_data
Hybrid	Provenance	Graph
C3_C4_map_present_NA
fetch_SYNMAP_land_cover_map_variable
lon_variable lat_variable lon_bnds_variable lat_bnds_variable
fetch_monthly_mean_air_temperature_data
Tair_Matrix
fetch_monthly_mean_precipitation_data
Rain_Matrix
initialize_Grass_Matrix
Grass_variable
examine_pixels_for_grass
C3_Data C4_Data
generate_netcdf_file_for_C3_fraction
[data19] C3_fraction_data
outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc
generate_netcdf_file_for_C4_fraction
[data20] C4_fraction_data
outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc
generate_netcdf_file_for_Grass_fraction
[data21] Grass_fraction_data
outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc
[data7] SYNMAP_land_cover_map_data
inputs/land_cover/SYNMAP_NA_QD.nc
[data12] mean_airtemp
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.4.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.8.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.12.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.5.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.9.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.2.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.1.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.6.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.10.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.3.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.7.nc
inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.11.nc
[data14] mean_precip
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.10.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.3.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.7.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.11.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.4.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.8.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.1.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.12.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.5.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.9.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.2.nc
inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.6.nc
Hybrid Provenance Graph: upstream of Grass_fraction_data file!
[Figure: hybrid provenance graph of the C3_C4_map_present_NA workflow, restricted to the upstream lineage of the Grass_fraction_data file.]
• Steps: fetch_SYNMAP_land_cover_map_variable, initialize_Grass_Matrix, generate_netcdf_file_for_Grass_fraction
• Intermediate data: lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable, Grass_variable
• Data with resolved files:
  – [data7] SYNMAP_land_cover_map_data: inputs/land_cover/SYNMAP_NA_QD.nc
  – [data21] Grass_fraction_data: outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc
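A hybrid query joins the workflow model with the files observed for a run, for example "which input files was this output file derived from?". The following is a minimal Python sketch under stated assumptions. `DERIVED_FROM`, `FILES` and `input_files_of` are hypothetical names. The dependencies are collapsed to input and output data only (intermediate variables omitted) and are assumed from the lineage views. The file bindings are the ones listed on the hybrid graph slides.

```python
# Assumed data-level dependencies: output data -> workflow input data.
INPUTS_FOR_C3_C4 = ["SYNMAP_land_cover_map_data", "mean_airtemp", "mean_precip"]
DERIVED_FROM = {
    "C3_fraction_data": INPUTS_FOR_C3_C4,
    "C4_fraction_data": INPUTS_FOR_C3_C4,
    "Grass_fraction_data": ["SYNMAP_land_cover_map_data"],
}

# Files bound to each data node, as listed on the hybrid provenance slides.
FILES = {
    "SYNMAP_land_cover_map_data": ["inputs/land_cover/SYNMAP_NA_QD.nc"],
    "mean_airtemp": [
        f"inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.{m}.nc"
        for m in range(1, 13)
    ],
    "mean_precip": [
        f"inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.{m}.nc"
        for m in range(1, 13)
    ],
    "C3_fraction_data": ["outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc"],
    "C4_fraction_data": ["outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc"],
    "Grass_fraction_data": ["outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc"],
}


def input_files_of(output_path):
    """List the input files that the given output file was derived from."""
    # Retrospective side: find the data node the file is bound to.
    node = next((n for n, paths in FILES.items() if output_path in paths), None)
    if node is None:
        return []
    # Prospective side: follow the model's dependencies, then map each
    # upstream data node back to its files.
    result = []
    for dep in DERIVED_FROM.get(node, []):
        result.extend(FILES[dep])
    return sorted(result)


grass_inputs = input_files_of("outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc")
c3_inputs = input_files_of("outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc")
print(grass_inputs)
print(len(c3_inputs))
```

With these bindings the Grass fraction file traces back to the single land cover file, while the C3 fraction file also depends on the monthly temperature and precipitation files.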
Demo	Time
