Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

Like this presentation? Why not share!

3 kishor



Accessibility is a complex property of information, a composite of several different indicators that measure the ease with which one can get to and work with information from another source. These ...

Accessibility is a complex property of information, a composite of several different indicators that measure the ease with which one can get to and work with information from another source. These different indicators of accessibility scale differently with the size and scope of information. As the datasets get bigger, some indicators remain constant, some scale linearly, while others may scale in a more accelerated manner. At the University of Wisconsin Carbon Modeling group, we are building a very large, spatially accessible U.S. national climate dataset for ecosystem process modeling that is bringing up unique challenges and lessons for accessibility in the multi-Terabyte scale datasets. This paper describes the problem, and the information accessibility challenges and lessons learned while putting together its solution.



Total Views
Views on SlideShare
Embed Views



0 Embeds 0

No embeds



Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

3 kishor 3 kishor Presentation Transcript

  • Accessibility Considerations for Very Large Datasets Puneet Kishor University of Wisconsin-Madison and Creative Commons Monday, October 25, 2010
  • Acknowledgments to CODATA for inviting me, Creative Commons for funding my trip, University of Wisconsin-Madison for paying my salary, and most importantly, the US Federal Goverment for making all the data available to anyone, anywhere without any pre-conditions Monday, October 25, 2010
  • Research context: ecosystem process modeling of very large terrestrial ecosystems Monday, October 25, 2010 View slide
  • Information by numbers Monday, October 25, 2010 View slide
  • ax ad cp ve in yl d tm da tm vp pr sr ta 7 daily variables Monday, October 25, 2010
  • ax ad cp ve in yl d tm da tm vp pr sr ta 1 1 km km 2 cell 1 km Monday, October 25, 2010
  • 13 4587 .25 million cells 2889 Monday, October 25, 2010
  • 8401 days 8400 Monday, October 25, 2010
  • ax ad cp ve in yl d tm da tm vp pr sr ta 111 .32 billion septets 111.32 b Monday, October 25, 2010
  • 725 .78 raw gigabytes Monday, October 25, 2010
  • 10 times as much in a database Monday, October 25, 2010
  • 84 GB of NetCDF format in tar gzipped archives Monday, October 25, 2010
  • 1 2 3 4 5 6 7 8 9 10 11 12 13 14 4˚ square chunks Monday, October 25, 2010
  • “½” incomplete documentation Monday, October 25, 2010
  • 0 ways to query X ? the data Monday, October 25, 2010
  • 1. Acquire NetCDF file of lat/lon values for each cell from the weather data 1 km2 estimates 2. Dump lat/lon values to CSV with Panoply 3. Import into ArcMap as XY data 4. Export as shapefile 5. Assign WGS84 datum to shapefile in ArcCatalog 6. Reproject to Lambert Spherical (“US National “10” Atlas Equal Area”) 7. Separate by 2x2 degree tile using "tile_num" attribute (so grid will match the netCDF met files) using defination query in ArcMap and exporting to individual shapefiles (256 tiles) as "mask". 8. Open lambert points in qGIS and make 1km grid (shapefile) for each 2x2 tile 9. Assign projection to output (EPSG:2163) 10. Add each new grid shapefile (one at a time) to ArcMap with 2x2 Grid as separate layer times the work to 11. Select by location (select from grid x that intersect mask x) 12. Export selected features of grid x (now will be unpack the data numbered sequentially by record in a way that matches the met NetCDF “ncells”) 13. Clean up: delete extra fields from qGIS (ID,MAXX,MINX,MAXY,MINY) add ncell_id (FID +1) block_id, block_name Monday, October 25, 2010
  • Many kinds of queries f<variable> <location> <point in time> avg(srad) at x,y on Dec 2, 2001 tmin for area on May 19, 1992 tmax at x,y on May 19, 1992 f<variable> <point location> <duration of time> tave at x,y during the first quarter of 1983 sum(vpd) at x,y during the last week of Mar, 2003 Monday, October 25, 2010
  • accessible adjective 1 (of a place) able to be reached or entered : the town is accessible by bus | the building has been made accessible to disabled people. • (of an object, service, or facility) able to be easily obtained or used : making learning opportunities more accessible to adults. • easily understood : his Latin grammar is lucid and accessible. • able to be reached or entered by people in wheelchairs : it provides specialized features such as nonslip floors and accessible entrances. 2 (of a person, typically one in a position of authority or importance) friendly and easy to talk to; approachable : he is more accessible than most tycoons. Monday, October 25, 2010
  • Accessible information is easy to: find, determine what one can do with it, acquire, and use Monday, October 25, 2010
  • Factors that affect accessibility: law; technology; culture; semantics; and economics Monday, October 25, 2010
  • Law makes sharing permissible; technology makes it possible; culture makes it acceptable; semantics make it understandable; and economics affordable Monday, October 25, 2010
  • It is permissible, acceptable, and affordable to access public sector information, but not necessarily possible or understandable Monday, October 25, 2010
  • Goals of the new storage: make the information technologically and semantically accessible Monday, October 25, 2010
  • Allow access by providing user- interface, application programming interface and documentation Monday, October 25, 2010