Free GIS and Interoperability
GIS Open Source, interoperabilità e cultura del dato nei SIAT della Pubblica Amministrazione [GIS Open Source, interoperability and the 'culture of data' in the spatial data warehouses of the Public Administration]
GFOSS'04, ITC-irst, 16 Nov 2004 (last revised 10/2005)
M. Neteler, neteler at itc it, http://mpa.itc.it, ITC-irst, Povo (Trento), Italy
The need for Interoperability. The problem: nowadays, data have to be exchanged across often very heterogeneous groups
the personal choice of application software/operating system should not affect the data exchange
data exchange standards are available
limited awareness of the need for interoperability
limited implementation of interoperability in processes and software
commonly used file formats lead users to believe in interoperability: “false friends”
What are Standardization & Interoperability? Standardization versus Interoperability
Standardization: a written/published document describing data formats, models etc.
Example Office standards: ASCII, HTML, XML, ...
Example GIS standards: GML, ISO 8211, ISO/IEC 15444-1, WMS etc.
Only published standards are acceptable.
Interoperability: more than the application of standardization; it also comprises the interpretation of the standard (sometimes definitions are incomplete)
Interoperability? The two dimensions of Interoperability
Longitudinal interoperability (time, long-term storage): data shall be readable over time (years, decades, ...). This is of particular interest for data of public administration and long-term projects.
Transversal interoperability (sharing data between users): data shall be readable across user communities, independent of the software or operating system used (freedom of software choice). Again, this is of particular interest for data of public administration and long-term projects.
Part I: Office Interoperability
Example: MS-Word .DOC format. Are WORD.doc files suitable for data exchange? The format is undocumented; to some extent it was reverse-engineered -> does not support transversal interoperability
the format is regularly changed (Word 1, 2, 95, 97, NT, 2000, XP, ... also named WinWORD 6, 8, 10, ...) -> does not support longitudinal interoperability
Prone to MS-Windows macro viruses
severe security/privacy issues (example next slide):
- DOC files contain sensitive information about the user (unrelated to the contents)
- deleted text may still be legible outside of MS-Word
-> contents cannot be completely verified
Example: MS-Word .DOC format - security/privacy issues. Descrambling a WORD.doc file
Your unique MS-Windows user ID (or similar): PID_GUIDäAN{714738E3-FF4C-11D3-ZD7C-00E0281D67A7}
This makes your (anonymous) document traceable.
Sometimes deleted text is still visible (think of re-using an existing WORD file).
A famous example: In February 2003, the British government of Tony Blair published a dossier on Iraq's security and intelligence organizations. This dossier was cited by Colin Powell in his address to the United Nations the same month. Dr. Glen Rangwala, a lecturer in politics at Cambridge University, quickly discovered that much of the material in the dossier was actually plagiarized from a U.S. researcher on Iraq. http://www.computerbytesman.com/privacy/blair.htm
What you may find:
Example: MS-Word .DOC format - security/privacy issues. Descrambling a WORD.doc file: The British Iraq dossier 2003 (1/2) http://nytimes.com
Example: MS-Word .DOC format - security/privacy issues. Descrambling a WORD.doc file: The British Iraq dossier 2003 (2/2)
[neteler@dandre2 gfoss04]$ tr -d '[:cntrl:]' < blair.doc
ÐÏࡱá>þÿz|þÿÿÿyÿ  [...] -xxxxí-o#o#{'?^,k6®äí-* RûuËÂG (É-$IRAQ  ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONThis report draws upon a number of sources, including intelligence material, and shows how the Iraqi regime is constructed to have,  and to keep, WMD, and is now engaged in a campaign of obstruction of the  United Nations Weapons Inspectors. [...] [`azbhh§h»h?h-i/isjÿÿ cic22 JC:\DOCUME~1\ phamill \LOCALS~1\Temp\AutoRecovery save of Iraq - security.asd cic22 JC:\DOCUME~1\ phamill \LOCALS~1\Temp\AutoRecovery save of Iraq -  security.asd cic22 JC:\DOCUME~1\ phamill \LOCALS~1\Temp\AutoRecovery save of Iraq -  security.asd JPratt C:\TEMP\Iraq - security.doc JPratt A:\Iraq - security.doc ablackshaw!C:\ ABlackshaw \Iraq - security.docablackshaw#C:\ ABlackshaw \A;Iraq - security.doc ablackshaw A:\Iraq - security.doc MKhan C:\TEMP\Iraq - security.doc MKhan (C:\WINNT\Profiles\mkhan\Desktop\Iraq.docþÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ PjÿzXVÿ*uzLl_ÿbêzLl_ [...] jP@GTimes New Roman5SymbolG&ArialHelveticaA&Arial Narrow?&Arial Black"qÐh_r&Òr&aõq#JV,?RVW,º!¥À??20døi?fÿÿCIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONdefaultMKhanþÿàòùOh«+'³Ù0? ìø 4DPlx?¬?äDIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION AND INTIMIDATIONraqdefaultefaefaNormal.dotN MKhan .d4ha Microsoft Word 8.0 C@ÒIk@n)§ÈÂ@"ZöfËÂ@døèuËÂ#JVþÿÕÍÕ [...]
(WMD: weapons of mass destruction)
http://www.computerbytesman.com/privacy/blair.htm
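The same inspection can be sketched in Python for readers without a Unix shell. This is a minimal sketch, not the method used for the slide: it mimics `tr -d '[:cntrl:]'` by dropping control bytes and then keeps only longer runs of printable ASCII. The function names, the `min_len` threshold and the sample bytes are assumptions made for illustration; the sample is synthetic, not taken from the real dossier file.

```python
from __future__ import annotations

import re


def printable_runs(data: bytes, min_len: int = 6) -> list[str]:
    """Drop control bytes (as tr -d '[:cntrl:]' does), then return the
    runs of printable ASCII that are at least min_len characters long."""
    stripped = bytes(b for b in data if b >= 0x20 and b != 0x7F)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(stripped)]


def leaked_paths(data: bytes) -> list[str]:
    """Keep only the runs that contain a Windows drive path such as C:\\TEMP."""
    return [s for s in printable_runs(data) if re.search(r"[A-Za-z]:\\", s)]


# Synthetic stand-in for a few bytes of a .doc file (hypothetical content).
sample = (
    b"\xd0\xcf\x11\xe0\x00\x01JPratt\x00\x1f"
    b"C:\\TEMP\\Iraq - security.doc\x00\xfe\xffMKhan\x00\x02"
)

if __name__ == "__main__":
    for hit in leaked_paths(sample):
        print(hit)
```

For a real file one would pass `open("some.doc", "rb").read()` instead of `sample`; user names and save paths then show up next to each other, much as in the output above.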
Example: MS-Excel .XLS format. Are EXCEL.xls files suitable for data exchange? The format is undocumented; to some extent it was reverse-engineered -> does not support transversal interoperability
the format is regularly changed (Excel 95, 97, NT, 2000, ...) -> does not support longitudinal interoperability
Prone to MS-Windows viruses
Limitation: max. 65,536 rows per worksheet (2^16)
Auto-conversion feature is risky: some fields/columns are automatically changed to date-time format (see example on the next slides) -> high risk of accidental data damage
Example: MS-Excel .XLS format – accidental data damage. The “Human Genome Project” case 1/3
In 2004 scientists discovered that some gene names were being changed inadvertently to non-gene names.
Citation: “A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3] was being converted to '1-DEC.'”
Cited from: B.R. Zeeberg, J. Riss, D.W. Kane, K.J. Bussey, E. Uchio, W.M. Linehan, J.C. Barrett and J.N. Weinstein, BMC Bioinformatics 2004, 5:80, http://dx.doi.org/10.1186/1471-2105-5-80
The “Human Genome Project” case 2/3 Example: MS-Excel .XLS format – accidental data damage http://dx.doi.org/10.1186/1471-2105-5-80
The “Human Genome Project” case 3/3 Example: MS-Excel .XLS format – accidental data damage http://dx.doi.org/10.1186/1471-2105-5-80
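The conversions described in the citation can be screened for before a table is opened in a spreadsheet. The following is a rough heuristic sketch, not a reproduction of Excel's actual parsing rules: it flags identifiers that look like a month name followed by a day number (such as DEC1) and identifiers of the form digits-E-digits, which resemble floating-point notation. The function name `excel_risk`, the month list and the identifier `2310009E13` are assumptions chosen for illustration.

```python
from __future__ import annotations

import re

# Month names and abbreviations that a spreadsheet may read as a date part.
_MONTHS = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)
_DATE_LIKE = re.compile(r"^(%s)[-/ ]?\d{1,2}$" % "|".join(_MONTHS), re.IGNORECASE)
_FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def excel_risk(identifier: str) -> str | None:
    """Return 'date' or 'float' if the identifier is likely to be
    auto-converted by a spreadsheet, otherwise None (heuristic only)."""
    text = identifier.strip()
    if _DATE_LIKE.match(text):
        return "date"
    if _FLOAT_LIKE.match(text):
        return "float"
    return None


if __name__ == "__main__":
    for name in ["DEC1", "SEPT2", "2310009E13", "TP53"]:
        print(name, excel_risk(name))
```

Identifiers that are flagged are best kept in a plain-text format and imported explicitly as text columns.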

The need of Interoperability in Office and GIS formats

  • 27.
    Suggestions for “Office” data interoperability. Text files: ASCII, HTML, RTF, XML, LaTeX; PostScript/PDF for read-only documents
  • 28.
    Tables: CSV, xBase (dBase), XML
  • 31.
    Suggestions for “Office” data interoperability. Automated conversion tools can be used to provide all formats. Text files: ASCII, HTML, RTF, XML; PostScript/PDF
  • 32.
    Tables: CSV, xBase (dBase), XML
  • 34.
    Bibliography: BibTeX. Converters (examples): OpenOffice.org [1]
  • 39.
    bibtex2html [5], (EndNote)
    [1] http://OpenOffice.org (itself uses XML as its own standard format)
    [2] http://wvware.sourceforge.net/
    [3] http://www.klaban.torun.pl/prog/pg2xbase/
    [4] http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html
    [5] http://www.lri.fr/~filliatr/bibtex2html/
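To illustrate the suggestion of exchanging tables as CSV, here is a minimal sketch using Python's standard `csv` module. It writes a small table to CSV text and reads it back, with every field kept as a plain string so that nothing is silently reinterpreted. The helper names and the sample values are made up for illustration.

```python
from __future__ import annotations

import csv
import io


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text; quoting is handled by the csv module."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text: str) -> list[list[str]]:
    """Parse CSV text back into rows; every field stays a string."""
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


# Hypothetical sample table; the values are invented for this sketch.
table = [
    ["gene", "note"],
    ["DEC1", "tumor suppressor, see text"],
    ["TP53", 'quoted "value"'],
]

if __name__ == "__main__":
    text = to_csv(table)
    print(text)
    print(from_csv(text) == table)
```

Because the reader returns strings only, any conversion to dates or numbers becomes an explicit decision of the receiving program.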
  • 40.
    OASIS: “Office” data interoperability. Promotion of an open document exchange format. Proposed and implemented new open standard format: OASIS OpenDocument XML format
  • 41.
    The OASIS OpenDocument format [1] is a vendor- and implementation-independent file format which guarantees freedom and independence
  • 42.
    E.g., OpenOffice.org uses OpenDocument as its default format from version 2.0 onwards, as do KOffice, StarOffice and software from other vendors. The OASIS OpenDocument file format is one of the file formats recommended by the European Commission [2]
    [1] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
    [2] http://europa.eu.int/idabc/en/document/3439
  • 44.
    GIS Standards and Organizations. GIS data sets are more than geometry:
    - metadata
    - geographic reference
    - colors, display attributes etc.
    - history of data modifications
    [Timeline figure: 1990, 1992, 1994, 1997, 2004] http://www.opengeospatial.org
  • 45.
    GIS Interoperability: GDAL and OGR libraries. Data abstraction with GDAL (http://www.gdal.org)
    Abstraction layer over about 40 raster formats, e.g. ENVI, GeoTIFF, SAR, GRASS, ECW, HDF4, JPEG2000, MrSID, ArcGRID
    Metadata: number of bands, color table, ...; coordinate system and projection (EPSG codes, PROJ.4)
  • 46.
    GIS Interoperability: GDAL and OGR libraries. Data abstraction with OGR (http://www.gdal.org/ogr/)
    Abstraction layer over about 20 vector formats, e.g. ArcCover, MITAB, Oracle, SHAPE, PostGIS, Geodatabase, DGN
    Metadata: coordinate system and projection (EPSG codes)
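The idea of an abstraction layer can be sketched with a toy example: each format registers a reader, and applications only see one common record with the number of bands and the EPSG code. This is a hypothetical illustration of the design, not GDAL's or OGR's actual API; the two "formats", their header layouts and all names below are invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RasterInfo:
    """Common metadata record that every reader must produce."""
    driver: str
    bands: int
    epsg: int


Reader = Callable[[str], RasterInfo]
_READERS: Dict[str, Reader] = {}


def register(extension: str, reader: Reader) -> None:
    """Register a reader for a (hypothetical) file extension."""
    _READERS[extension.lower()] = reader


def open_raster(name: str, header: str) -> RasterInfo:
    """Dispatch to the reader matching the file extension."""
    ext = name.rsplit(".", 1)[-1].lower()
    if ext not in _READERS:
        raise ValueError("no reader registered for ." + ext)
    return _READERS[ext](header)


def read_toy_a(header: str) -> RasterInfo:
    # Toy format A stores "key=value" pairs, e.g. "bands=3;epsg=4326".
    fields = dict(item.split("=") for item in header.split(";"))
    return RasterInfo("toy-a", int(fields["bands"]), int(fields["epsg"]))


def read_toy_b(header: str) -> RasterInfo:
    # Toy format B stores "<epsg> <bands>", e.g. "32632 1".
    epsg, bands = header.split()
    return RasterInfo("toy-b", int(bands), int(epsg))


register("tya", read_toy_a)
register("tyb", read_toy_b)

if __name__ == "__main__":
    print(open_raster("scene.tya", "bands=3;epsg=4326"))
    print(open_raster("dem.tyb", "32632 1"))
```

The application code calls `open_raster` in the same way for both files; supporting a further format means adding one reader, not changing the callers.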
  • 47.
    GIS data formats and the support question. GDAL development: raster formats
    Direct funding:
    - Atlantis (ENVISAT, MFF, HKV Blobs)
    - eCognition Germany (FUJI BAS Format)
    - Los Alamos Nat. Labs (FITS)
    - OPeNDAP Inc. (OPeNDAP/DODS)
    - PeopleSoft (ERDAS LAN)
    - Safe Software (USGS SDTS, ISO8211 support)
    - Yukon Department of Environment (USGS DEM)
    Public formats / open documents / reverse-engineered:
    - ERDAS Imagine (IMG)
    - ERMAPPER (ECW)
    - ESRI formats (ArcGrid)
    - GDAL Virtual Format
    - JasPer (JPEG2000); Kakadu (GeoJP2 interface for JPEG2000 = ISO/IEC 15444-1)
    - LizardTech (MrSID, JPEG2000)
    - NOAA (AVHRR data)
  • 48.
    GIS data formats and the support question. OGR development: vector formats
    Direct funding:
    - DM Solutions Group and GoMOOS (SQLite RDBMS, Comma Separated Values CSV)
    - OPeNDAP Inc. (OPeNDAP/DODS)
    - Safe Software (FMEObjects)
    - SRC, LLC (Oracle Spatial)
    Public formats / open documents / reverse-engineered:
    - ESRI (SHAPE, ArcCoverage)
    - GML
    - IHO S-57
    - MapInfo (TAB and MIF/MID)
    - Microsoft (ODBC OGR)
    - Microstation (DGN)
    - MySQL (non-spatial data) OGR
    - OGDI Vectors (VMAP)
    - OGR Virtual Format
    - PostgreSQL/PostGIS
    - SDTS
    - UK Ordnance Survey (NTF)
    - U.S. Census (TIGER)
  • 49.
    GIS formats: Why so many formats? No big problem! Application-specific requirements, which partially contradict each other: high compression rate
  • 56.
    Software patents and rights of third parties: future traps ?!
  • 57.
    GIS formats and Software Patents. How software patents affect GIS users. LZW (Lempel-Ziv-Welch) compression: used in many raster formats (e.g. GIF)
  • 58.
    Integrated into GRASS before it was patented, later replaced by zlib Deflate
  • 59.
    Unisys started to charge for usage after waiting some years. MrSID (Multi-resolution Seamless Image Database): wavelet-based image file format
  • 60.
    three patents covering both the image compression and the on-the-fly image decompression technology
  • 61.
    GDAL supports MrSID but requires a MrSID SDK license. ECW (ERMAPPER Compressed Wavelets): patent pending
  • 62.
    GPL-released source code available (of patented code?). JPEG 2000: situation not very clear
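As a small illustration of the Deflate alternative mentioned for GRASS, the sketch below compresses one row of raster cell values with Python's standard `zlib` module (which implements Deflate) and restores it losslessly. This is not GRASS's own code; the helper names and the sample row are assumptions made for illustration.

```python
from __future__ import annotations

import zlib


def pack_row(cells: bytes) -> bytes:
    """Compress one row of raster cell values with Deflate (zlib)."""
    return zlib.compress(cells, 9)


def unpack_row(packed: bytes) -> bytes:
    """Restore the original cell values; Deflate is lossless."""
    return zlib.decompress(packed)


# Hypothetical raster row: 4096 cells that all hold the same category value.
row = bytes([7]) * 4096

if __name__ == "__main__":
    packed = pack_row(row)
    print(len(row), "bytes ->", len(packed), "bytes")
    print(unpack_row(packed) == row)
```

Homogeneous rows like this one compress very well; noisy imagery compresses far less, which is one reason specialized formats exist alongside general-purpose compression.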
  • 63.
    Summary: The personal choice of application software/operating system should not affect the data exchange; longitudinal and transversal interoperability must be guaranteed
  • 64.
    Only documented formats may be used
  • 65.
    There is no excuse: start to use interoperable formats today
  • 66.
    GIS interoperability is in a better state than Office document interoperability
  • 67.
    Interoperability awareness needs to be promoted: today and in the future
  • 68.
    License of this document. Document home: http://mpa.itc.it/gfoss04/neteler_gfoss04_interoperability2005.pdf This work is licensed under a Creative Commons License. http://creativecommons.org/licenses/by-sa/2.0/deed.en “Free GIS and Interoperability”, © 2004-2005 Markus Neteler [OpenOffice SXI file available upon request: neteler at itc it, neteler at osgeo org] License details: Attribution-ShareAlike 2.0. You are free: to copy, distribute, display, and perform the work
  • 69.
    to make derivative works
  • 70.
    to make commercial use of the work
  • 71.
    Under the following conditions: Attribution. You must give the original author credit. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above.