The need for Interoperability in Office and GIS formats

Free GIS and Interoperability: The need for Interoperability in Office and GIS formats

GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione

[GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]

Published in: Technology

  1. 1. Free GIS and Interoperability
     GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione
     [GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
     GFOSS'04, ITC-irst, 16 Nov 2004 (last revised 10 2005)
     M. Neteler, neteler at itc it, http://mpa.itc.it
     ITC-irst, Povo (Trento), Italy
  2. 2. The need for Interoperability
     The problem
     • Nowadays data have to be exchanged across groups that are often very heterogeneous
     • The personal choice of application software or operating system should not affect the data exchange
     • Data exchange standards are available
     • There is limited awareness of the need for interoperability
     • Interoperability is implemented only to a limited extent in processes and software
     • Commonly used file formats lead users to believe in interoperability: “false friends”
  8. 8. What are Standardization & Interoperability?
     Standardization versus Interoperability
     Standardization: a written, published document describing data formats, models etc.
     - Example Office standards: ASCII, HTML, XML, ...
     - Example GIS standards: GML, ISO 8211, ISO/IEC 15444-1, WMS etc.
     - Only published standards are acceptable.
     Interoperability: more than the application of a standard; it also comprises the interpretation of the standard (definitions are sometimes incomplete)
  9. 9. Interoperability?
     The two dimensions of Interoperability
     Longitudinal Interoperability: time - long-term storage
     Data shall be readable over time (years, decades, ...). This is of particular interest for data of public administration and long-term projects.
     Transversal Interoperability: sharing data between users
     Data shall be readable across user communities, independent of the software or operating system used (freedom of software choice). Again, this is of particular interest for data of public administration and long-term projects.
  10. 10. Part I: Office Interoperability
  11. 11. Example: MS-Word .DOC format
      Are WORD.doc files suitable for data exchange?
      • The format is undocumented; to some extent it has been reverse-engineered -> does not support transversal interoperability
      • The format is changed regularly (Word 1, 2, 95, 97, NT, 2000, XP, ... also named WinWORD 6, 8, 10, ...) -> does not support longitudinal interoperability
      • Prone to MS-Windows macro viruses
      • Severe security/privacy issues (example on the next slide):
        - DOC files contain sensitive information about the user (unrelated to the contents)
        - deleted text may still be legible outside of MS-Word -> contents cannot be completely verified
  15. 15. Example: MS-Word .DOC format - security/privacy issues
      Descrambling a WORD.doc file
      What you may find:
      • Your unique MS-Windows user ID (or similar): PID_GUIDäAN{714738E3-FF4C-11D3-ZD7C-00E0281D67A7}
        This makes your (anonymous) document traceable.
      • Sometimes deleted text is still visible (think of re-using an existing WORD file)
      A famous example: In February 2003, the British government of Tony Blair published a dossier on Iraq's security and intelligence organizations. This dossier was cited by Colin Powell in his address to the United Nations the same month. Dr. Glen Rangwala, a lecturer in politics at Cambridge University, quickly discovered that much of the material in the dossier was actually plagiarized from a U.S. researcher on Iraq.
      http://www.computerbytesman.com/privacy/blair.htm
  17. 17. Example: MS-Word .DOC format - security/privacy issues
      Descrambling a WORD.doc file: The British Iraq dossier 2003 1/2
      http://nytimes.com
  18. 18. Example: MS-Word .DOC format - security/privacy issues
      Descrambling a WORD.doc file: The British Iraq dossier 2003 2/2
      [neteler@dandre2 gfoss04]$ tr -d [:cntrl:] < blair.doc
      ÐÏࡱá>þÿz|þÿÿÿyÿ [...]
      -xxxxí-o#o#{'?^,k6®äí-* RûuËÂG (É-$IRAQ ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONThis report draws upon a number of sources, including intelligence material, and shows how the Iraqi regime is constructed to have, and to keep, WMD, and is now engaged in a campaign of obstruction of the United Nations Weapons Inspectors. [...]
      [`azbhh§h»h?h-i/isjÿÿ
      cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd
      cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd
      cic22 JC:DOCUME~1 phamill LOCALS~1TempAutoRecovery save of Iraq - security.asd
      JPratt C:TEMPIraq - security.doc
      JPratt A:Iraq - security.doc
      ablackshaw!C: ABlackshaw Iraq - security.doc
      ablackshaw#C: ABlackshaw A;Iraq - security.doc
      ablackshaw A:Iraq - security.doc
      MKhan C:TEMPIraq - security.doc
      MKhan (C:WINNTProfilesmkhanDesktopIraq.docþÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ PjÿzXVÿ*uzLl_ÿbêzLl_ [...]
      jP@GTimes New Roman5SymbolG&ArialHelveticaA&Arial Narrow?&Arial Black"qÐh_r&Òr&aõq#JV,?RVW,º!¥À??20døi?fÿÿCIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONdefaultMKhanþÿàòùOh«+'³Ù0? ìø 4DPlx?¬?äDIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONraqdefaultefaefaNormal.dotN MKhan .d4ha Microsoft Word 8.0 C@ÒIk@n)§ÈÂ@"ZöfËÂ@døèuËÂ#JVþÿÕÍÕ [...]
      [Slide annotation: Weapons of mass destruction]
      http://www.computerbytesman.com/privacy/blair.htm
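The `tr` call above strips control characters so that readable text embedded in the binary file becomes visible. A rough Python counterpart is sketched here. It is only an illustration: the function name `printable_runs` and the minimum run length of 6 are choices made for this example, and the sketch assumes that embedded text may be stored either as single-byte characters or as two-byte (UTF-16LE) characters.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 6) -> List[str]:
    """Return runs of printable ASCII text hidden in binary data.

    Two layouts are searched (an assumption of this sketch):
    plain single-byte text, and UTF-16LE text in which every
    ASCII character is followed by a zero byte.
    """
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found


# Synthetic demo data (not taken from any real document).
sample = (
    b"\x00\x01"
    + b"Iraq - security.doc"
    + b"\xff\xfe"
    + "JPratt A:draft".encode("utf-16-le")
    + b"\x00\x02"
)
hidden_text = printable_runs(sample)
```

Running such a scan over a document before publishing it is one way to notice leftover user names, file paths or deleted text.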
  19. 19. Example: MS-Excel .XLS format
      Are EXCEL.xls files suitable for data exchange?
      • The format is undocumented; to some extent it has been reverse-engineered -> does not support transversal interoperability
      • The format is changed regularly (Excel 95, 97, NT, 2000, ...) -> does not support longitudinal interoperability
      • Prone to MS-Windows viruses
      • Limitation: max. 65,536 rows in a table (2^16)
      • The auto-conversion feature is risky: some fields/columns are automatically changed to date-time format (see the example on the next slides) -> high risk of accidental data damage
  24. 24. Example: MS-Excel .XLS format – accidental data damage
      The “Human Genome Project” case 1/3
      • In 2004 scientists discovered that some gene names were being changed inadvertently to non-gene names.
      Citation: “A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3] was being converted to '1-DEC.'”
      Cited after: B.R. Zeeberg, J. Riss, D.W. Kane, K.J. Bussey, E. Uchio, W.M. Linehan, J.C. Barrett and J.N. Weinstein, BMC Bioinformatics 2004, 5:80
      http://dx.doi.org/10.1186/1471-2105-5-80
  25. 25. Example: MS-Excel .XLS format – accidental data damage
      The “Human Genome Project” case 2/3
      http://dx.doi.org/10.1186/1471-2105-5-80
  26. 26. Example: MS-Excel .XLS format – accidental data damage
      The “Human Genome Project” case 3/3
      http://dx.doi.org/10.1186/1471-2105-5-80
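The conversion problem described in these slides can be checked for before a table is ever opened in a spreadsheet. The following is a minimal sketch under stated assumptions: the two regular expressions are heuristics written for this example and do not reproduce Excel's actual conversion rules, and `conversion_risk` is a hypothetical helper name. `DEC1` is the example from the cited paper; the other identifiers are illustrative.

```python
import re
from typing import Optional

# Month-like prefixes; an assumption of this sketch, not Excel's real rule set.
_MONTHS = (
    "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC"
    "|MARCH|APRIL|JUNE|JULY"
)
# e.g. DEC1, SEPT2, MARCH1: month name followed by one or two digits
_DATE_LIKE = re.compile(rf"^(?:{_MONTHS})-?\d{{1,2}}$", re.IGNORECASE)
# e.g. 2310009E13: digits, the letter E, digits (reads like scientific notation)
_FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def conversion_risk(identifier: str) -> Optional[str]:
    """Return 'date' or 'float' if the identifier looks convertible, else None."""
    text = identifier.strip()
    if _DATE_LIKE.match(text):
        return "date"
    if _FLOAT_LIKE.match(text):
        return "float"
    return None


# Flag risky identifiers in a column of gene names.
genes = ["DEC1", "TP53", "SEPT2", "2310009E13", "BRCA1"]
flagged = {g: conversion_risk(g) for g in genes if conversion_risk(g)}
```

A check like this only warns; the safer fix is to keep identifiers in a plain-text format and import them explicitly as text.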
  27. 27. Suggestions for “Office” data interoperability
      • Text files: ASCII, HTML, RTF, XML, LaTeX; PostScript/PDF for read-only documents
      • Tables: CSV, xBase (dBase), XML
      • Databases: SQL92-ASCII
      • Bibliography: BibTeX
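As an illustration of the CSV suggestion for tables, the sketch below writes a small table as CSV text and reads it back with Python's standard `csv` module. The helper names `table_to_csv` and `csv_to_table` and the sample rows are made up for this example. Every value is kept as a plain string, so nothing is reinterpreted as a date or a number on the way.

```python
import csv
import io
from typing import List


def table_to_csv(header: List[str], rows: List[List[str]]) -> str:
    """Serialise a table to CSV text; fields are quoted only when needed."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_table(text: str) -> List[List[str]]:
    """Parse CSV text back into rows of strings (no type guessing)."""
    return list(csv.reader(io.StringIO(text, newline="")))


header = ["gene", "note"]
rows = [
    ["DEC1", "Deleted in Esophageal Cancer 1"],
    ["X, Y", 'contains a comma and a "quote"'],
]
csv_text = table_to_csv(header, rows)
round_trip = csv_to_table(csv_text)
```

Because the format is plain text, the same file can be read by any tool or operating system, which is the point of the recommendation.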
  31. 31. Suggestions for “Office” data interoperability
      Automated conversion tools can be used to provide all formats
      • Text files: ASCII, HTML, RTF, XML, PostScript/PDF
      • Tables: CSV, xBase (dBase), XML
      • Databases: SQL92-ASCII
      • Bibliography: BibTeX
      Converters (examples):
      • OpenOffice.org [1]
      • wvWare [2]
      • OpenOffice.org, xbase2pg [3]
      • ODBC, xbase2pg
      • Bibutils [4]
      • Bibtex2html [5], (Endnote)
      [1] http://OpenOffice.org (itself uses XML as its own standard format)
      [2] http://wvware.sourceforge.net/
      [3] http://www.klaban.torun.pl/prog/pg2xbase/
      [4] http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html
      [5] http://www.lri.fr/~filliatr/bibtex2html/
  40. 40. OASIS: “Office” data interoperability
      Promotion of Open Document Exchange Format
      • Proposed and implemented new open standard format: OASIS OpenDocument XML format
      • The OASIS OpenDocument format [1] is a vendor- and implementation-independent file format which guarantees freedom and independence
      • E.g., OpenOffice.org uses OASIS OpenDocument as its default format from version 2.0 onwards, as do KOffice, StarOffice and software from other vendors
      The OASIS OpenDocument file format is one of the file formats recommended by the European Commission [2]
      [1] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
      [2] http://europa.eu.int/idabc/en/document/3439
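One practical consequence of a documented format is that a file can be read without the application that wrote it. An OpenDocument text file is a ZIP package whose main body is stored as XML in `content.xml`, so the Python standard library is enough to pull out the paragraphs. This is a minimal sketch: `odt_paragraphs` is a hypothetical helper name, the demo package is built in memory for illustration and omits the parts a real file would carry (manifest, styles, mimetype entry).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of OpenDocument text elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(source) -> list:
    """Return the text of every <text:p> paragraph in an OpenDocument package.

    `source` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Build a tiny in-memory package for demonstration purposes only.
_content = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    f' xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello <text:span>open</text:span> world</text:p>"
    "<text:p>Second paragraph</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as demo:
    demo.writestr("content.xml", _content)
_buffer.seek(0)
demo_paragraphs = odt_paragraphs(_buffer)
```

The same approach applies to spreadsheets and presentations in the format, since they share the package layout.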
  43. 44. GIS Standards and Organizations
      GIS data sets are more than geometry:
      Metadata
      - geographic reference
      - colors, display attributes etc.
      - history of data modifications
      [Figure: timeline with the years 1990, 1992, 1994, 1997 and 2004]
      http://www.opengeospatial.org
  44. 45. GIS Interoperability: GDAL and OGR libraries
      Data abstraction: GDAL (http://www.gdal.org)
      Abstraction layer (40 formats), e.g.: ENVI, GeoTIFF, SAR, GRASS, ECW, HDF4, JPEG2000, MrSID, ArcGRID
      Metadata:
      - Number of bands
      - Color table
      - ...
      - Coordinate system
      - Projection
      EPSG codes, PROJ.4
  45. 46. GIS Interoperability: GDAL and OGR libraries
      Data abstraction: OGR (http://www.gdal.org/ogr/)
      Abstraction layer (20 formats), e.g.: ArcCover, MITAB, Oracle, SHAPE, PostGIS, Geodatabase, DGN
      Metadata:
      - Coordinate system
      - Projection
      EPSG codes
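The abstraction-layer idea on these two slides can be illustrated with a toy example: the application calls one function, and format-specific drivers behind it recognise the data and return the same metadata structure. The sketch below is not GDAL's or OGR's API. All names (`RasterInfo`, `Driver`, `register`, `open_raster`) are hypothetical, and the two drivers read only simplified headers of Arc/Info ASCII grids and GRASS ASCII rasters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RasterInfo:
    """Format-independent metadata handed back to the application."""
    driver: str
    cols: int
    rows: int


@dataclass
class Driver:
    """One format plug-in: a recogniser plus a header reader."""
    name: str
    identify: Callable[[bytes], bool]
    read_info: Callable[[bytes], RasterInfo]


_DRIVERS: List[Driver] = []


def register(driver: Driver) -> None:
    _DRIVERS.append(driver)


def open_raster(data: bytes) -> RasterInfo:
    """Ask each registered driver in turn; the caller never names a format."""
    for driver in _DRIVERS:
        if driver.identify(data):
            return driver.read_info(data)
    raise ValueError("no registered driver recognises this data")


def _header(data: bytes, sep: Optional[str]) -> Dict[str, str]:
    """Collect leading 'key value' lines until the first non-header line."""
    fields: Dict[str, str] = {}
    for line in data.decode("ascii", errors="replace").splitlines():
        parts = line.split(sep, 1)
        if len(parts) != 2 or not parts[0][:1].isalpha():
            break
        fields[parts[0].strip().lower()] = parts[1].strip()
    return fields


def _read_arc_ascii(data: bytes) -> RasterInfo:
    h = _header(data, None)  # header lines look like "ncols 4"
    return RasterInfo("arc-ascii-grid", int(h["ncols"]), int(h["nrows"]))


def _read_grass_ascii(data: bytes) -> RasterInfo:
    h = _header(data, ":")  # header lines look like "cols: 4"
    return RasterInfo("grass-ascii", int(h["cols"]), int(h["rows"]))


register(Driver("arc-ascii-grid", lambda d: d[:5].lower() == b"ncols", _read_arc_ascii))
register(Driver("grass-ascii", lambda d: d[:6].lower() == b"north:", _read_grass_ascii))

arc_sample = b"ncols 4\nnrows 2\nxllcorner 0\nyllcorner 0\ncellsize 10\n1 2 3 4\n5 6 7 8\n"
grass_sample = b"north: 20\nsouth: 0\neast: 40\nwest: 0\nrows: 2\ncols: 4\n1 2 3 4\n5 6 7 8\n"
arc_info = open_raster(arc_sample)
grass_info = open_raster(grass_sample)
```

Supporting a further format then means registering one more driver; application code that calls `open_raster` stays unchanged.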
  46. 47. GIS Data formats and support question
      GDAL Development: Raster formats
      Direct funding:
      - Atlantis (ENVISAT, MFF, HKV Blobs)
      - eCognition Germany (FUJI BAS Format)
      - Los Alamos Nat. Labs (FITS)
      - OPeNDAP Inc. (OPeNDAP/DODS)
      - PeopleSoft (ERDAS LAN)
      - Safe Software (USGS SDTS, ISO8211 support)
      - Yukon Department of Environment (USGS DEM)
      Public formats/Open documents/Reverse engineered:
      - ERDAS Imagine (IMG)
      - ERMAPPER (ECW)
      - ESRI formats (ArcGrid)
      - GDAL Virtual Format
      - JasPer (JPEG2000); Kakadu (GeoJP2 interface for JPEG2000 = ISO/IEC 15444-1)
      - LizardTech (MrSID, JPEG2000)
      - NOAA (AVHRR data)
  47. 48. GIS Data formats and support question
      OGR Development: Vector formats
      Direct funding:
      - DM Solutions Group and GoMOOS (SQLite RDBMS, Comma Separated Values CSV)
      - OPeNDAP Inc. (OPeNDAP/DODS)
      - Safe Software (FMEObjects)
      - SRC, LLC (Oracle Spatial)
      Public formats/Open documents/Reverse engineered:
      - ESRI (SHAPE, ArcCoverage)
      - GML
      - IHO S-57
      - MapInfo (TAB and MIF/MID)
      - Microsoft (ODBC)
      - Microstation (DGN)
      - MySQL (non-spatial data)
      - OGDI Vectors (VMAP)
      - OGR Virtual Format
      - PostgreSQL/PostGIS
      - SDTS
      - UK Ordnance Survey (NTF)
      - U.S. Census (TIGER)
  48. 49. GIS formats
      Why so many formats? No big problem!
      Application-specific requirements, which partially contradict each other:
      • high compression rate
      • small runtime storage requirements
      • coding without information loss
      • fast decoding
      • easy access to pixels
      • simple algorithm
      • hardware/CPU independence
      “Good software” can handle numerous formats.
      Software patents and rights of third parties: future traps?!
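Two of the requirements listed above, a high compression rate and coding without information loss, can be tried out with Deflate as provided by Python's standard `zlib` module. This is a small sketch on synthetic data: the 256 x 256 byte array stands in for a single-band raster and is made up for the example, and `deflate_report` is a hypothetical helper name. Decoding speed, another requirement on the list, is not measured here.

```python
import zlib
from typing import Dict, Iterable


def deflate_report(raster: bytes, levels: Iterable[int] = (1, 6, 9)) -> Dict[int, int]:
    """Compress the data at several Deflate levels and report the sizes.

    Each result is decompressed again to confirm that no information is lost.
    """
    report: Dict[int, int] = {}
    for level in levels:
        packed = zlib.compress(raster, level)
        if zlib.decompress(packed) != raster:
            raise RuntimeError("round trip changed the data")
        report[level] = len(packed)
    return report


# Synthetic single-band "raster": 256 rows of 256 bytes with a smooth gradient.
raster = bytes((x // 8) % 256 for y in range(256) for x in range(256))
sizes = deflate_report(raster)
```

Higher levels spend more time searching for a smaller encoding, which is one example of how the requirements pull against each other.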
  56. 57. GIS formats and Software Patents
      How software patents affect GIS users
      LZW (Lempel Ziv Welch) Compression
      • Used in many raster formats (e.g. GIF)
      • Integrated into GRASS before it was patented, later replaced by Zlib Deflate
      • Unisys started to charge for usage after waiting some years
      MrSID (Multi-resolution Seamless Image Database)
      • wavelet-based image file format
      • three patents covering both the image compression and the on-the-fly image decompression technology
      • GDAL supports MrSID but requires a MrSID SDK license
      ECW (ERMAPPER Compressed Wavelets)
      • Patent pending
      • GPL-released source code available (of patented code?)
      JPEG 2000
      • Situation not very clear
  62. 63. Summary
      • The personal choice of application software or operating system should not affect the data exchange
      • Longitudinal and transversal interoperability must be guaranteed
      • Only documented formats may be used
      • There is no excuse: start to use interoperable formats today
      • GIS interoperability is in a better state than Office document interoperability
      • Interoperability awareness needs to be promoted: today and in future
  67. 68. License of this document
      Document home: http://mpa.itc.it/gfoss04/neteler_gfoss04_interoperability2005.pdf
      This work is licensed under a Creative Commons License. http://creativecommons.org/licenses/by-sa/2.0/deed.en
      “Free GIS and Interoperability”, © 2004-2005 Markus Neteler
      [OpenOffice SXI file available upon request: neteler at itc it, neteler at osgeo org]
      License details: Attribution-ShareAlike 2.0
      You are free:
      • to copy, distribute, display, and perform the work
      • to make derivative works
      • to make commercial use of the work
      Under the following conditions:
      Attribution. You must give the original author credit.
      Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
      For any reuse or distribution, you must make clear to others the license terms of this work.
      Any of these conditions can be waived if you get permission from the copyright holder.
      Your fair use and other rights are in no way affected by the above.
