3 kishor


Accessibility is a complex property of information, a composite of several indicators that measure how easily one can reach and work with information from another source. These indicators scale differently with the size and scope of the information: as datasets grow, some indicators remain constant, some scale linearly, and others scale faster than linearly. At the University of Wisconsin Carbon Modeling group, we are building a very large, spatially accessible U.S. national climate dataset for ecosystem process modeling, which raises unique accessibility challenges and lessons at the multi-terabyte scale. This paper describes the problem, the information accessibility challenges, and the lessons learned while putting together a solution.


  1. 1. Accessibility Considerations for Very Large Datasets. Puneet Kishor, University of Wisconsin-Madison and Creative Commons. Monday, October 25, 2010
  2. 2. Acknowledgments: to CODATA for inviting me, Creative Commons for funding my trip, University of Wisconsin-Madison for paying my salary, and most importantly, the US Federal Government for making all the data available to anyone, anywhere, without any pre-conditions
  3. 3. Research context: ecosystem process modeling of very large terrestrial ecosystems
  4. 4. Information by numbers
  5. 5. 7 daily variables: tmax, tmin, tave, prcp, srad, vpd, dayl
  6. 6. 1 km² cell (1 km × 1 km): tmax, tmin, tave, prcp, srad, vpd, dayl
  7. 7. 13.25 million cells (4587 × 2889)
  8. 8. 8401 days (8400)
  9. 9. 111.32 billion septets (tmax, tmin, tave, prcp, srad, vpd, dayl)
  10. 10. 725.78 raw gigabytes
  11. 11. 10 times as much in a database
  12. 12. 84 GB in NetCDF format, in tar-gzipped archives
  13. 13. 4° square chunks (map tiles numbered 1 to 14)
  14. 14. “½” incomplete documentation
  15. 15. 0 ways to query the data
  16. 16. “10” times the work to unpack the data:
      1. Acquire NetCDF file of lat/lon values for each cell from the weather data 1 km² estimates
      2. Dump lat/lon values to CSV with Panoply
      3. Import into ArcMap as XY data
      4. Export as shapefile
      5. Assign WGS84 datum to shapefile in ArcCatalog
      6. Reproject to Lambert Spherical (“US National Atlas Equal Area”)
      7. Separate by 2x2 degree tile using the "tile_num" attribute (so the grid will match the NetCDF met files), using a definition query in ArcMap and exporting to individual shapefiles (256 tiles) as "mask"
      8. Open Lambert points in QGIS and make a 1 km grid (shapefile) for each 2x2 tile
      9. Assign projection to output (EPSG:2163)
      10. Add each new grid shapefile (one at a time) to ArcMap with the 2x2 grid as a separate layer
      11. Select by location (select from grid x that intersect mask x)
      12. Export selected features of grid x (now numbered sequentially by record in a way that matches the met NetCDF “ncells”)
      13. Clean up: delete the extra fields from QGIS (ID, MAXX, MINX, MAXY, MINY); add ncell_id (FID + 1), block_id, block_name
  17. 17. Many kinds of queries
      f<variable> <location> <point in time>
        avg(srad) at x,y on Dec 2, 2001
        tmin for area on May 19, 1992
        tmax at x,y on May 19, 1992
      f<variable> <point location> <duration of time>
        tave at x,y during the first quarter of 1983
        sum(vpd) at x,y during the last week of Mar, 2003
  18. 18. accessible ¦aksesəbəl¦ adjective 1 (of a place) able to be reached or entered: the town is accessible by bus | the building has been made accessible to disabled people. • (of an object, service, or facility) able to be easily obtained or used: making learning opportunities more accessible to adults. • easily understood: his Latin grammar is lucid and accessible. • able to be reached or entered by people in wheelchairs: it provides specialized features such as nonslip floors and accessible entrances. 2 (of a person, typically one in a position of authority or importance) friendly and easy to talk to; approachable: he is more accessible than most tycoons.
  19. 19. Accessible information is easy to: find, determine what one can do with it, acquire, and use
  20. 20. Factors that affect accessibility: law; technology; culture; semantics; and economics
  21. 21. Law makes sharing permissible; technology makes it possible; culture makes it acceptable; semantics make it understandable; and economics makes it affordable
  22. 22. It is permissible, acceptable, and affordable to access public sector information, but not necessarily possible or understandable
  23. 23. Goals of the new storage: make the information technologically and semantically accessible
  24. 24. Allow access by providing a user interface, an application programming interface, and documentation
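
The counts on slides 7 to 10 can be cross-checked with a few lines of arithmetic. The sketch below multiplies the 4587 by 2889 grid by the day counts shown on slide 8. The function names are illustrative, and the one-byte-per-value storage size is an assumption that the slides do not state; it is used here only because, together with 8401 days, it comes out near the 725.78 figure on slide 10. With 8400 days, the septet count comes out near the 111.32 billion on slide 9.

```python
# Grid dimensions and variable count as shown on slides 5 and 7.
COLS, ROWS = 4587, 2889
VARIABLES = 7          # tmax, tmin, tave, prcp, srad, vpd, dayl
BYTES_PER_VALUE = 1    # ASSUMPTION: not stated anywhere in the slides


def cell_count(cols=COLS, rows=ROWS):
    """Number of 1 km x 1 km cells in the grid."""
    return cols * rows


def septet_count(days):
    """Number of (cell, day) records, each holding seven variables."""
    return cell_count() * days


def raw_gib(days, bytes_per_value=BYTES_PER_VALUE):
    """Raw size in binary gigabytes (2**30 bytes) under the assumed value size."""
    return septet_count(days) * VARIABLES * bytes_per_value / 2**30


if __name__ == "__main__":
    print(f"cells:   {cell_count():,}")
    print(f"septets: {septet_count(8400):,} (8400 days)")
    print(f"raw:     {raw_gib(8401):.2f} GiB (8401 days, 1 byte per value)")
```

Changing `BYTES_PER_VALUE` shows how quickly the raw size grows with a wider value type, which is the scaling concern raised in the abstract.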
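
The bookkeeping behind steps 7, 12, and 13 on slide 16 (group cells by 2x2 degree tile, then number them sequentially so that ncell_id = FID + 1) can be sketched without any GIS software. This is only an illustration of the numbering idea, not the ArcMap and QGIS workflow itself: the slides do not give the actual `tile_num` scheme, so the `tile_key` used here (floor of longitude and latitude divided by the tile size) is a hypothetical stand-in, and the function names are made up for this example.

```python
import math
from collections import defaultdict

TILE_DEG = 2.0  # 2x2 degree tiles, as on slide 16


def tile_key(lon, lat, size=TILE_DEG):
    """Hypothetical tile identifier: the slides' real tile_num scheme is not given."""
    return (math.floor(lon / size), math.floor(lat / size))


def assign_ncell_ids(points):
    """Number cells sequentially within each tile, in input order.

    points: iterable of (lon, lat) pairs in the order they appear in the file.
    Returns a list of dicts holding the tile key, a zero-based fid, and
    ncell_id = fid + 1 (the rule given in step 13).
    """
    next_fid = defaultdict(int)  # next free zero-based record number per tile
    records = []
    for lon, lat in points:
        key = tile_key(lon, lat)
        fid = next_fid[key]
        next_fid[key] += 1
        records.append({"lon": lon, "lat": lat, "tile": key,
                        "fid": fid, "ncell_id": fid + 1})
    return records
```

The numbering only matches the NetCDF “ncells” order if the points are read in the same order as the cells appear in the met files, which is why the sketch preserves input order instead of sorting.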
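
The two query shapes on slide 17 (a function of a variable over a location at a point in time, and over a point location for a duration) suggest one small interface that an application programming interface could expose. The sketch below is an assumption about what such an interface might look like, not the project's actual API: the in-memory `store` layout, the `query` function, and its parameters are all hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical in-memory layout: {(x, y): {date: {variable_name: value}}}


def values(store, variable, cells, start, end):
    """Yield one variable's values for the given cells from start to end, inclusive."""
    for cell in cells:
        for day, record in store.get(cell, {}).items():
            if start <= day <= end and variable in record:
                yield record[variable]


def query(store, variable, cells, start, end=None, agg=mean):
    """Apply agg to a variable over a set of cells and a date range.

    A single cell with end=None covers "<variable> at x,y on <date>";
    several cells cover "<variable> for area"; a start/end pair covers
    "<variable> at x,y during <duration>". Returns None if nothing matches.
    """
    if end is None:
        end = start
    found = list(values(store, variable, cells, start, end))
    return agg(found) if found else None
```

Under these assumptions, "tmax at x,y on May 19, 1992" would be `query(store, "tmax", [(x, y)], date(1992, 5, 19), agg=max)`, and "sum(vpd) at x,y during the last week of Mar, 2003" would be `query(store, "vpd", [(x, y)], date(2003, 3, 25), date(2003, 3, 31), agg=sum)`. A scan like this is only workable for toy data; at the scale described on slides 7 to 10, the same interface would need to sit on top of indexed storage.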