
(Research) Data Management in Archaeology

Lecture at the three-week Summer School "Data Curation" for archaeologists from Sudan, Yemen, Libya, Palestine and Tunisia at the German Archaeological Institute in Berlin from 16 July to 5 August 2017.

The workshop was planned together with the Arab League Educational, Cultural and Scientific Organization (ALECSO) and the Sudanese National Corporation for Antiquities and Museums (NCAM).

  1. COORDINATION FUNDING. Maurice Heinrich: (Research) Data Management in Archaeology. Summer School ALECSO / NCAM, DAI Berlin, 27 July 2017
  2. AGENDA: 1. Research Data Center – IANUS; 2. Digital Research Data in Ancient Studies; 3. Data Formats; 4. Problems & Challenges; 5. Data Management; 6. Save – Back Up – Archive; 7. Best Practices
  3. 1. WHAT IANUS IS »» financed by the DFG (German Research Foundation) »» coordination for the community »» 2011–2014: requirements analysis, inspections, conception »» 2015–2017: implementation, test operation, start of archiving »» from 2018: regular operation »» 9 employees (4 FTE, 5 HTE) ›› project coordinators & public relations ›› data curators ›› software developers
  4. 1. WHO IANUS IS: Verband der Landesarchäologen in der Bundesrepublik Deutschland (Association of State Archaeologists in the Federal Republic of Germany)
  5. 1. WHOM IANUS ADDRESSES: exemplary disciplines in ancient studies in Germany
  6. 1. WHOM IANUS ADDRESSES: institutions in ancient studies in Germany
  7. 1. AIMS OF IANUS ›› create an infrastructure to archive existing data for the future ›› raise awareness of the reusability of (research) data ›› support the sciences by providing easy access to the data ›› enable researchers & projects to manage their data in a sustainable & sensible way ›› become a national point of contact for IT-related questions in ancient studies
  8. 1. CORE TASKS OF IANUS: future core tasks ›› long-term preservation ›› providing access ›› registry for archaeological resources ("German ArchSearch") ›› education & training ›› project support ›› IT recommendations
  9. 1. IANUS – OAIS (Open Archival Information System) WORKFLOW
  10. 1. IANUS – DATA ACCESS
  11. 1. IANUS – DATA ACCESS
  12. 1. IANUS – DATA ACCESS
  13. 2. DIGITAL DATA IN ANCIENT STUDIES
  14. 2. DIGITAL DATA IN ANCIENT STUDIES: variety of disciplines ›› archaeology ›› philology ›› ancient history ›› anthropology ›› archaeometry ›› construction history ›› material sciences ›› ... (Image: variety of disciplines in the virtual library Propylaeum, https://www.propylaeum.de/altertumswissenschaften/presse/)
  15. 2. DIGITAL DATA IN ANCIENT STUDIES: variety of methods and questions ›› documentation ›› excavation ›› survey & prospection ›› architectural documentation ›› sampling ›› conservation & restoration ›› mapping ›› ... (Images: CT scan of a mummy, https://news.usc.edu/files/2013/03/Mummy-CT-Scan.jpg; Napoleon in Egypt (1798–1801), http://www.ingolfo.de/800px-bonaparte-aux-pyramides_680_508.jpg)
  16. 2. DIGITAL DATA IN ANCIENT STUDIES: variety of data and documents ›› vector ›› CAD ›› databases ›› remote sensing / satellite ›› geophysics ›› GIS ›› laser scanning (Images: screenshot of a Harris matrix, https://www.cg.tuwien.ac.at/research/projects/LEOPOLD/Images/HMCScreenShot_02.jpg; GIS analysis from the project "Fürstensitze", http://www.fuerstensitze.de/5276_Laufende-Arbeiten-31508.html)
  17. 2. DIGITAL DATA IN ANCIENT STUDIES: variety of data and documents ›› mark-up text ›› photogrammetry ›› raster images ›› 3D / virtual reality ›› tables ›› statistics ›› ... (Images: reconstructions of the Satet temples on Elephantine, http://proceedings.caaconference.org/paper/42_ferschin_et_al_caa2007/; prehistoric stone axe with and without texture, https://www.culturartis.de/home/portfolio/3d-scan-und-druck/)
  18. 3. DATA FORMATS
  19. 3. DATA FORMATS: What are digital (research) data? ›› digitized analog data sets ›› born-digital data. Where are digital (research) data generated? ›› research & projects ›› management / administration ›› other work processes. What kinds are there? ›› unprocessed / primary (raw) data ›› processed / secondary data ›› published & unpublished final data (results)
  20. 3. DATA FORMATS: test data survey of 19 data collections »» live data, i.e. not prepared for archiving ›› no systematic data selection, format validation or labelling of files / folders ›› no complete documentation, metadata, licences, etc. ›› often only parts of larger data collections. Excerpt from the survey:

     | Project no. | Project name | Institution | Date | Data transfer | Metadata | Size (MB) | Number of files | Number of formats |
     |---|---|---|---|---|---|---|---|---|
     | 2013-001_TEST | Taganrog | DAI Head Office, Berlin | 23 May 2013 | copied from DAI Cloud after consultation | yes | 84,130 | 21,566 | 56 |
     | 2013-002_TEST | Milet, Faustina-Thermen | DAI Head Office, Berlin | 16 May 2013 | copied from DAI Cloud after consultation | no | 97,885 | 27,401 | 97 |
     | 2013-003_TEST | Pergamon | DAI Istanbul | 14 Jun 2013 | copied from DAI Cloud after consultation | yes | 89,472 | 30,139 | 229 |
     | 2013-004_TEST | Tell Zira'a | DAI Natural Sciences Unit, Berlin | 14 Feb 2013 | file server (DAI internal server) | yes | 99 | 42 | 5 |
     | 2013-005_TEST | Wendel | Neanderthal-Museum / NESPOS, Mettmann | 6 Feb 2013 | web portal (Dropbox) | yes | 2,008 | 2,192 | 4 |
     | 2013-006_TEST | Troja | Universität Tübingen | 27 Jun 2013 | hard drive by post | no | 302,060 | 134,228 | 82 |
     | 2013-007_TEST | Altägyptisches Wörterbuch | BBAW Berlin | 16 May 2013 | web portal (mydrive.ch) | no | 273 | 11 | 2 |
     | 2013-008_TEST | Aleppo, Virtual Archaeology | HTW Berlin | 15 Jul 2013 | hard drive by post | yes | 126,362 | 3,278 | 6 |
     | 2013-009_TEST | Archäometriedatenbank München | Prähistorische Sammlung München | 5 Mar 2013 | DVD by post | no | 1,100 | 8,571 | 107 |
     | 2013-010_TEST | Burgen im Rheinland | LVR Rheinland | 10 May 2013 | email | yes | 3 | 14 | 5 |
  21. 3. DATA FORMATS: quantities in total »» 684.9 GB disk space »» 237,403 files in 7,537 folders »» max. directory depth: 12 levels »» 462 file formats. Average of an archaeological project »» 38 GB disk space »» 12,425 files in 380 folders »» max. directory depth: 4 levels »» 40 file formats
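     Figures like these come from walking through each delivered collection. A minimal sketch of such a survey in Python, assuming that depth is counted as folder levels below the collection root and that a file's format can be approximated by its extension. Both are assumptions: the slide does not say how IANUS measured depth or counted formats, and extension counts are cruder than real format identification.

```python
import os
from collections import Counter
from pathlib import Path


def survey(root):
    """Summarise one data collection: size, files, folders, depth, formats.

    Assumptions (not from the slide): depth = folder levels below `root`
    (root itself = 0); format = lower-cased file extension.
    """
    root = Path(root)
    total_bytes = n_files = n_folders = max_depth = 0
    formats = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Number of path components between root and this folder
        depth = len(Path(dirpath).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        n_folders += len(dirnames)  # sub-folders found at this level
        for name in filenames:
            path = Path(dirpath) / name
            total_bytes += path.stat().st_size
            n_files += 1
            formats[path.suffix.lower() or "(no extension)"] += 1
    return {
        "bytes": total_bytes,
        "files": n_files,
        "folders": n_folders,
        "max_depth": max_depth,
        "formats": formats,  # extension -> number of files
    }
```

     Calling `survey()` once per collection and averaging the results would give figures of the same kind as the "average of an archaeological project" line.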
  22. 3. DATA FORMATS
  23. 3. DATA FORMATS: Reduce »» diversity and complexity through preferred & accepted file formats »» define significant properties with regard to content and technical characteristics »» favour non-proprietary, software-independent, open formats »» use formats relevant to the community → development of requirements / guidelines for producers / data providers, so that they submit data in a suitable form
  24. 3. DATA FORMATS: Data formats & data migration (May 2017), excerpt for PDF documents, texts/documents, raster graphics and graphics

     | Format | File extension (SIP – delivery format) | Status | AIP – archive format | DIP – presentation format |
     |---|---|---|---|---|
     | PDF DOCUMENTS | | | | |
     | PDF/A-1 | pdf | preferred | PDF/A-2 | PDF/A |
     | PDF/A-2 | pdf | preferred | PDF/A-2 | PDF/A |
     | PDF/A-3 | pdf | accepted | PDF/A-2 + additional files | PDF/A |
     | Other PDF variants | pdf | accepted | PDF/A-2 | PDF/A |
     | TEXTS / DOCUMENTS | | | | |
     | Portable Document Format (PDF/A) | pdf | preferred | PDF/A | PDF/A |
     | Other PDF variants | pdf | accepted | PDF/A-2 | PDF/A |
     | OpenDocument Format | odt | preferred | odt + PDF/A | odt, PDF/A |
     | Microsoft Office XML | docx | preferred | docx + PDF/A | docx, PDF/A |
     | Microsoft Word | doc | accepted | docx + PDF/A | docx, PDF/A |
     | Rich Text Format | rtf | accepted | docx + PDF/A | docx, PDF/A |
     | OpenOffice XML | sxw | accepted | odt + PDF/A | odt, PDF/A |
     | Plain text | txt | preferred | txt | txt |
     | Structured text, markup | xml, sgml, html, etc. + dtd, xsd, etc. | preferred | xml, sgml, html, etc. + dtd, xsd, etc. | xml, sgml, html, etc. + dtd, xsd, etc. |
     | RASTER GRAPHICS | | | | |
     | Baseline TIFF v.6, uncompressed | tiff, tif | preferred | TIFF (uncompressed v.6) | jpeg |
     | Adobe Digital Negative | dng | preferred | dng | dng, jpeg |
     | Portable Network Graphic | png | accepted | TIFF (uncompressed v.6) | png |
     | Joint Photographic Experts Group | jpeg, jpg | accepted | TIFF (uncompressed v.6) | jpeg |
     | Graphics Interchange Format | gif | accepted | TIFF (uncompressed v.6) | png |
     | Windows Bitmap | bmp | accepted | TIFF (uncompressed v.6) | png |
     | Photoshop (Adobe) | psd | accepted | TIFF (uncompressed v.6) | png, jpeg |
     | CorelPaint | cpt | accepted | TIFF (uncompressed v.6) | png, jpeg |
     | JPEG2000 | jp2, jpx | accepted | TIFF (uncompressed v.6) | jp2, jpx, jpeg |
     | RAW image format | nef, crw, etc. | accepted | dng | jpeg |
     | GRAPHICS (VECTOR) | | | | |
     | Scalable Vector Graphics 1.1, uncompressed | svg | preferred | svg | svg |
     | Computer Graphics Metafile | cgm | accepted | svg | svg |
     | WebCGM | cgm | accepted | svg | svg |
     | Drawing Interchange Format (Autodesk) | dxf | accepted | dxf (2010 AC1024) | dxf |
     | Drawing (Autodesk) | dwg | accepted | dxf (2010 AC1024) | dxf |
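     A submission can be checked against such a format policy before delivery. The sketch below encodes a hypothetical excerpt of the table as a Python dictionary; `FORMAT_POLICY`, `classify` and `summarise` are illustrative names, not an IANUS tool. PDF is deliberately left out, because PDF/A-1, PDF/A-2 and other PDF variants share the `.pdf` extension and can only be told apart by inspecting the file itself.

```python
from collections import Counter
from pathlib import Path

TIFF = "TIFF (uncompressed v.6)"

# Hypothetical excerpt of the format table: extension -> (status, AIP, DIP).
# .pdf is omitted on purpose: the extension alone cannot tell PDF/A from other PDFs.
FORMAT_POLICY = {
    ".odt": ("preferred", "odt + PDF/A", "odt, PDF/A"),
    ".docx": ("preferred", "docx + PDF/A", "docx, PDF/A"),
    ".doc": ("accepted", "docx + PDF/A", "docx, PDF/A"),
    ".rtf": ("accepted", "docx + PDF/A", "docx, PDF/A"),
    ".txt": ("preferred", "txt", "txt"),
    ".tif": ("preferred", TIFF, "jpeg"),
    ".tiff": ("preferred", TIFF, "jpeg"),
    ".dng": ("preferred", "dng", "dng, jpeg"),
    ".png": ("accepted", TIFF, "png"),
    ".jpg": ("accepted", TIFF, "jpeg"),
    ".jpeg": ("accepted", TIFF, "jpeg"),
    ".bmp": ("accepted", TIFF, "png"),
    ".svg": ("preferred", "svg", "svg"),
    ".dxf": ("accepted", "dxf (2010 AC1024)", "dxf"),
    ".dwg": ("accepted", "dxf (2010 AC1024)", "dxf"),
}


def classify(filename):
    """Look up a file's status and migration targets by its extension."""
    ext = Path(filename).suffix.lower()
    status, aip, dip = FORMAT_POLICY.get(ext, ("not listed", None, None))
    return {"file": filename, "status": status, "archive_as": aip, "present_as": dip}


def summarise(filenames):
    """Count files per status, e.g. to estimate how much needs migration."""
    return Counter(classify(name)["status"] for name in filenames)
```

     Files reported as "not listed" are the ones that would need a decision with the archive before submission.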
  25. 4. PROBLEMS & CHALLENGES
  26. 4. PROBLEMS & CHALLENGES: "Digital information lasts forever – or for five years, whichever comes first." Jeff Rothenberg, RAND Corp., 1997
  27. 4. PROBLEMS & CHALLENGES: technical readability »» aging of storage media (Image: compilation of different storage media by the Archaeology Data Service in York, UK)
  28. 4. PROBLEMS & CHALLENGES: technical readability »» outdated file formats / software, data corruption (Image: https://commons.wikimedia.org/wiki/File:Data_loss_of_image_file.JPG)
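     One cheap first check against corrupted or mislabelled files is to compare a file's leading bytes (its "magic number") with what its extension promises. The sketch below covers only a handful of well-known signatures and is an illustration under that limitation: a matching header does not prove the rest of the file is intact, and proper format identification and validation is better left to dedicated tools such as DROID or JHOVE. `SIGNATURES` and `check_signature` are hypothetical names.

```python
from pathlib import Path

# A few well-known file signatures found at byte offset 0.
SIGNATURES = {
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".jpeg": [b"\xff\xd8\xff"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF-"],
    ".tif": [b"II*\x00", b"MM\x00*"],   # little- and big-endian TIFF
    ".tiff": [b"II*\x00", b"MM\x00*"],
}


def check_signature(path):
    """True/False if the header matches the extension; None if the extension is unknown."""
    path = Path(path)
    expected = SIGNATURES.get(path.suffix.lower())
    if expected is None:
        return None
    with path.open("rb") as f:
        header = f.read(8)  # the longest signature above is 8 bytes
    return any(header.startswith(sig) for sig in expected)
```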
  29. 4. PROBLEMS & CHALLENGES: content comprehensibility »» answers to questions like: who, what, when, how and why? »» incomplete documentation »» missing or unstructured metadata »» implicit & explicit information / meanings (Image: "Implicit semantics – tagging – structured metadata", using the example of a key; http://dokmagazin.de/ueber-die-bedeutung-semantischer-metadaten-und-warum-ihre-generierung-nicht-einfach-maschinen-und-algorithmen-ueberlassen-werden-sollte/)
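     Answering who, what, when, how and why in a structured, machine-readable way is the opposite of leaving that information implicit. A minimal sketch, assuming a hypothetical five-field record (`FileMetadata`) stored as a JSON sidecar next to each file; in practice a community schema such as Dublin Core would be the better target.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class FileMetadata:
    """Hypothetical minimal record answering who, what, when, how and why."""
    who: str = ""   # creator / responsible person
    what: str = ""  # content, e.g. object or context documented
    when: str = ""  # date of creation, ISO 8601 (YYYY-MM-DD)
    how: str = ""   # method, device, software
    why: str = ""   # purpose / research question

    def missing(self):
        """Names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_json(self):
        """Serialise for a sidecar file; keeps non-ASCII characters readable."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```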
  30. 4. PROBLEMS & CHALLENGES: content readability »» different structures & naming
  31. 4. PROBLEMS & CHALLENGES: Conclusions. Scientific data in ancient studies are »» unique, because they describe individual, non-reproducible objects and contexts »» durable, because their high scientific relevance extends beyond the limits of individual projects »» distributed and disparate, because the actors involved and the uses in administration, tourism, science and education differ widely »» heterogeneous in content and form (different disciplines) »» at risk, because specialized concepts and infrastructures for the sustainable management of digital data are missing »» sustainably reusable only if they are structured, described (metadata) and documented in a standardized manner
  32. 5. DATA MANAGEMENT
  33. 5. DATA MANAGEMENT: What is data management? »» Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets over time. Why should you care? »» To ensure that stored / archived digital data can be used, understood and applied not only today, but also tomorrow.
  34. 5. DATA MANAGEMENT: Aims of (research) data management »» development and implementation of methods, procedures, guidelines and best practices »» clear and appropriate responsibilities, sustainable data documentation »» uniform, person-independent organization of the data »» efficient handling of one's own and third-party data »» minimizing the risk of data loss »» cross-institutional data usage
  35. 5. DATA MANAGEMENT: Benefits and value »» Transfer of knowledge to others, independent of individuals, projects and institutions »» Preservation of primary and secondary data for the future, not only through publications »» Reuse of data for new tasks, questions and methods »» Lower costs for generating new data and avoidance of redundant data collection »» More efficient work through better interoperability and exchange »» Compliance with legal requirements, such as retention obligations »» Greater relevance of one's own work through increased visibility
  36. 5. DATA MANAGEMENT: Checklist for a Data Management Plan, v4.0. Please cite as: DCC. (2013). Checklist for a Data Management Plan. v.4.0. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/data-management-plans

     | DCC Checklist | DCC Guidance and questions to consider |
     |---|---|
     | ADMINISTRATIVE DATA | |
     | ID | A pertinent ID as determined by the funder and/or institution. |
     | Funder | State research funder if relevant. |
     | Grant Reference Number | Enter grant reference number if applicable [POST-AWARD DMPs ONLY]. |
     | Project Name | If applying for funding, state the name exactly as in the grant proposal. |
     | Project Description | Questions to consider: What is the nature of your research project? What research questions are you addressing? For what purpose are the data being collected or created? Guidance: Briefly summarise the type of study (or studies) to help others understand the purposes for which the data are being collected or created. |
     | PI / Researcher | Name of Principal Investigator(s) or main researcher(s) on the project. |
     | PI / Researcher ID | E.g. ORCID, http://orcid.org/ |
     | Project Data Contact | Name (if different to above), telephone and email contact details. |
     | Date of First Version | Date the first version of the DMP was completed. |
     | Date of Last Update | Date the DMP was last changed. |
     | Related Policies | Questions to consider: Are there any existing procedures that you will base your approach on? Does your department/group have data management guidelines? Does your institution have a data protection or security policy that you will follow? Does your institution have a Research Data Management (RDM) policy? Does your funder have a Research Data Management policy? Are there any formal standards that you will adopt? Guidance: List any other relevant funder, institutional, departmental or group policies on data management, data sharing and data security. Some of the information you give in the … |
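     The administrative section of the checklist maps naturally onto a small record that can be checked for gaps. The sketch below uses the DCC headings as field names; which fields count as optional (funder, grant reference before the award, PI ID, separate data contact, related policies) is an assumption loosely drawn from the "if relevant / if applicable" wording, and `DMPAdministrativeData` and `review` are hypothetical names.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class DMPAdministrativeData:
    """Administrative section of a DMP, named after the DCC checklist v4.0 headings."""
    dmp_id: str = ""
    funder: str = ""
    grant_reference: str = ""       # post-award DMPs only
    project_name: str = ""
    project_description: str = ""
    pi_researcher: str = ""
    pi_researcher_id: str = ""      # e.g. an ORCID iD
    project_data_contact: str = ""  # only if different from the PI
    first_version: Optional[date] = None
    last_update: Optional[date] = None
    related_policies: list = field(default_factory=list)


def review(dmp, post_award=False):
    """List gaps and inconsistencies. The optional set is an assumption."""
    optional = {"funder", "pi_researcher_id", "project_data_contact", "related_policies"}
    if not post_award:
        optional.add("grant_reference")
    problems = [f"missing: {f.name}" for f in fields(dmp)
                if f.name not in optional and not getattr(dmp, f.name)]
    if dmp.first_version and dmp.last_update and dmp.last_update < dmp.first_version:
        problems.append("last update precedes first version")
    return problems
```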
  37. 5. DATA MANAGEMENT: Categories of (research) data management plans »» framework and administrative information ›› conditions, objectives, project sponsors, etc. »» responsibilities ›› ensuring conditions, backups, permissions, data integrity, etc. »» legal aspects ›› data covered by copyright / data protection, how this is documented, requirements for publishing the data, which licence for third parties, etc. »» methods ›› methods used, guidelines / requirements, which documentation method, does the method affect the amount of data, etc.
  38. 5. DATA MANAGEMENT: »» specifications, guidelines and standards ›› check laws, regulations, infrastructure, standards, data quality, etc. »» costs ›› personnel / storage / infrastructure / tools / electricity; for reproducible data: storage vs. re-creation, etc. »» external partners or service providers ›› cooperation with whom, implications, exchange, rights to the data, etc. »» hardware and software ›› what is available, special needs, fulfilment of requirements, check whether paid software can be replaced by open source, etc.
  39. 5. DATA MANAGEMENT: »» data types & data formats ›› methods – types – formats, data requirements (archive, reuse), open / proprietary, implications for hardware / software, etc. »» reuse of existing data ›› existing data from own / third parties, access / reuse options, etc. »» creation of new data ›› decide whether data are unique / reproducible, sensitive / in need of protection, ... »» amount of data ›› expected volume, versioning, consequences for storage / backup / archive
  40. 5. DATA MANAGEMENT: »» file storage / file backup ›› necessary actions, where (hard disk, server), number of redundant copies, up-to-date anti-virus software ›› backup intervals: by whom / how / how often, responsibility, overwrite protection (read-only), checks of data integrity / completeness ›› disaster management: has recovery been rehearsed? »» file management ›› how files are ordered / named / versioned, naming rules, handling of different file versions, is the repository structure documented, etc.
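     Naming rules are easiest to keep when they are checked automatically. The convention below (`PROJECT_YYYY-MM-DD_description_vNN.ext`, ASCII letters, digits and hyphens only, lower-case extension) is a hypothetical example, not a rule from the slide; the sketch validates names against it and picks the latest version of each file.

```python
import re

# Hypothetical naming rule: PROJECT_YYYY-MM-DD_description_vNN.ext
# - each part uses only ASCII letters, digits and hyphens (no spaces, no umlauts)
# - ISO 8601-shaped date, two-digit version number, lower-case extension
NAME_RULE = re.compile(
    r"^(?P<project>[A-Za-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<description>[A-Za-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return the name's parts as a dict, or None if it breaks the rule."""
    m = NAME_RULE.match(filename)
    return m.groupdict() if m else None


def latest_versions(filenames):
    """Keep only the highest version of each project/date/description/ext group."""
    best = {}  # group key -> (version number, filename)
    for name in filenames:
        parts = check_name(name)
        if parts is None:
            continue  # does not follow the rule; report such files separately
        key = (parts["project"], parts["date"], parts["description"], parts["ext"])
        version = int(parts["version"])
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())
```

     The date pattern only checks the shape `NNNN-NN-NN`, not whether the date exists; `datetime.date.fromisoformat` could be added for that.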
  41. 5. DATA MANAGEMENT: »» documentation ›› understandable description of the data for the short / long term, kind of information, timing, requirements, changes & updates, how to store / save / archive metadata, exceptions, support tools, provenance, etc. »» quality assurance ›› criteria from existing standards, are the data accurate / consistent / authentic / complete, clearly documented (who did what for what purpose), checklists, measures against accidental deletion / manipulation of data, etc. »» data exchange ›› between whom and how, requirements, rights / restrictions, technical infrastructure, access policy, rights of use, exchange formats, etc.
  42. 5. DATA MANAGEMENT: »» medium-term data storage ›› reasons for keeping data, requirements on time / location, how, selection of what must / should be kept or deleted, access rights, how long, where, responsibility for keeping the data, costs, etc. »» long-term data storage (archiving) ›› selection, selection criteria, suitable archive solution, contact with an existing archive, who does what, etc. »» accessibility & reuse ›› how the data should be made accessible, what additional information is needed to understand the data, who may use them, which licence, are there restrictions, etc.
  43. 5. DATA MANAGEMENT: Conclusion »» document your ›› methods, terms, systems and questions »» use common standards and define working rules »» make your data explicit, not implicit »» implement (research) data management plans »» structure your data in a comprehensible way »» involve all relevant actors and describe workflows → the higher the data quality, the easier the data can be archived for the future and the better they can be reused by anyone
  44. 6. SAVE – BACKUP – ARCHIVE
  45. 6. SAVE – BACKUP – ARCHIVE: differentiation of terms / concepts »» different storage concepts ›› save ›› backup ›› (long-term) archiving
  46. 6. SAVE – BACKUP – ARCHIVE: differentiation of terms / concepts »» different storage concepts ›› save: transferring data from a program's working memory or a computer's RAM to a disk drive (mainly computer-internal)
  47. 6. SAVE – BACKUP – ARCHIVE: differentiation of terms / concepts »» different storage concepts ›› backup: a copy of saved data (synced to a second, redundant instance) for disaster-recovery purposes (mainly on an external drive / network)
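     A backup in this sense is a second, redundant copy that is brought up to date regularly. A minimal one-way sync sketch, assuming that size plus modification time is an acceptable change indicator (the same "quick check" heuristic rsync uses by default); `backup` is a hypothetical helper, not a replacement for a tested backup tool.

```python
import shutil
from pathlib import Path


def backup(source, target):
    """One-way sync: copy new or changed files from source to target.

    "Changed" means different size or modification time (an assumption).
    Files deleted from source are deliberately NOT deleted from target.
    Returns the relative paths that were copied, sorted.
    """
    source, target = Path(source), Path(target)
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            # Whole seconds only, to tolerate coarser timestamps on the target
            if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                continue  # unchanged since the last run
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(src.relative_to(source).as_posix())
    return sorted(copied)
```

     Keeping files that were deleted in the source means an accidental deletion is not propagated to the backup; the price is that the backup has to be cleaned up deliberately.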
  48. 6. SAVE – BACKUP – ARCHIVE: differentiation of terms / concepts »» different storage concepts ›› (long-term) archiving: preservation of digital information to enable / guarantee long-term accessibility for the reuse of data, including bitstream preservation, i.e. the physical conservation of a given bit sequence
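     Bitstream preservation is usually verified with checksums: a manifest of hashes is recorded when data enter the archive and compared again at intervals. A minimal sketch with SHA-256; the function names are hypothetical, and packaging formats such as BagIt store comparable manifests.

```python
import hashlib
from pathlib import Path


def sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root, manifest):
    """Compare the current state of root with a previously stored manifest."""
    current = make_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
    }
```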
  49. 6. SAVE – BACKUP – ARCHIVE
  50. 7. BEST PRACTICES: IT recommendations (IT-Empfehlungen)
  51. 7. BEST PRACTICES: Guides to Good Practice »» published by ›› Archaeology Data Service (ADS), United Kingdom ›› The Digital Archaeological Record (tDAR), USA »» central web portal with information about ›› the application of IT in archaeology ›› all phases of the data lifecycle ›› collecting, curating and promoting existing standards, including practical help in applying them (e.g. tutorials, templates, tools, best-practice examples) ›› a wiki enabling collaborative development of the standards and guides
  52. 7. BEST PRACTICES
  53. 7. BEST PRACTICES: Data Management Plans »» published by ›› DMPonline (DCC), United Kingdom ›› DMPTool, University of California, USA
  54. 7. BEST PRACTICES: Digital Preservation »» published by the Digital Preservation Coalition (DPC), UK ›› information about ›› tools ›› preservation strategies ›› technical solutions ›› ...
  55. FURTHER INFORMATION: IT recommendations (only in German) »» https://www.ianus-fdz.de/it-empfehlungen. Guides to Good Practice »» http://guides.archaeologydataservice.ac.uk/. Data Management Plans »» DMP → http://www.dcc.ac.uk/resources/data-management-plans »» Data Management Planning Tool → https://dmptool.org/ »» Data Management Plan Online → https://dmponline.dcc.ac.uk/. Digital Preservation Coalition »» http://www.dpconline.org/knowledge-base
  56. THANK YOU! https://www.ianus-fdz.de. Forschungsdatenzentrum Archäologie & Altertumswissenschaften (Research Data Centre for Archaeology & Ancient Studies). Keywords: exchange, digital data, research, reuse, archiving, planning, data preservation, metadata, documentation, IT recommendations. IANUS c/o Deutsches Archäologisches Institut, Podbielskiallee 69-71, D-14195 Berlin, Tel.: +49-(0)30-187711-359. Project leaders: Prof. Dr. Friederike Fless, Prof. Dr. Ortwin Dally. Project coordinators: Maurice Heinrich, Dr. Felix F. Schäfer. Further information: homepage https://www.ianus-fdz.de; Twitter @Ianus_fdz; Facebook IANUS-Forschungsdatenzentrum; YouTube IANUS-Forschungsdatenzentrum
