Case Study: Publishing to the “Web of Data” in Archaeology

      Quality and Workflows



                              Eric Kansa
                UC Berkeley / OpenContext.org



      Unless otherwise indicated, this work is licensed under a Creative Commons
         Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
“Small Science” data sharing is hard:
(1) Complexity
(2) Scalability
(3) Ethics, cultural property claims, IP
(4) Incentives
(5) Preservation
Image Credit: “Grand Canyon NPS” via Flickr (CC-By)
  http://www.flickr.com/photos/grand_canyon_nps/5975537378/
Thousand Flowers

● Open Context: open access, openly licensed data for archaeology
● Archiving by California Digital Library
● Persistent identifiers (DOIs, ARKs)
● Web services
● NSF/NEH links for data management plans
Thousand Flowers

Fills a Gap:
Most data sources are institutional. Open Context publishes individual, small-group contributions.

Challenge:
Diverse contributions, needing lots of work to clean up and “link” to the Web of Data
• 3-year project, Oct 2010 – Sep 2013
• Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse”
• Ixchel Faniel, PI & Elizabeth Yakel, Co-PI

http://www.dipir.org
DIPIR Collaboration
The Big DIPIR Questions
Research Questions
1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites?
2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
Open Context Interviewees
• 22 Ph.D. holders or graduate students interviewed
  – 13 men
  – 9 women
• Novices / experts
  – 19 experts
  – 3 novices
• Interviewees who were curators, or professors who also had a curatorial role: 6
Raw Data is Unappetizing?
Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)


                                          A long way to go before we
                                          get usable, intelligible data
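The quote shows meaning locked inside a private code book (“we all know that a 14 is a sheep”). Below is a minimal sketch of making such codes explicit before publication. It assumes a spreadsheet column named `taxon_code` and uses a hypothetical code book: only 14 → “sheep” comes from the quote, and the other entries are invented for illustration. Unknown codes are reported, not guessed, so the contributor can resolve them.

```python
# Hypothetical code book: 14 -> "sheep" comes from the interview quote;
# the other entries are invented for illustration only.
CODE_BOOK = {
    14: "sheep",
    15: "goat",
    16: "cattle",
}


def decode_taxa(rows, code_book=CODE_BOOK):
    """Add an explicit taxon label next to each numeric code.

    Returns (decoded_rows, unknown_codes). Codes missing from the code
    book get a label of None and are reported rather than guessed.
    """
    decoded = []
    unknown = set()
    for row in rows:
        code = row["taxon_code"]
        label = code_book.get(code)
        if label is None:
            unknown.add(code)
        # Keep the original code so the published data stays traceable.
        decoded.append({**row, "taxon_label": label})
    return decoded, sorted(unknown)


rows = [
    {"specimen": "A-1", "taxon_code": 14},
    {"specimen": "A-2", "taxon_code": 99},
]
decoded, unknown = decode_taxa(rows)
print(decoded[0]["taxon_label"])  # sheep
print(unknown)  # [99]
```

Keeping the original code next to the new label means an editor can always trace a published value back to the contributor's spreadsheet.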
Sometimes data is better served cooked.
Thousand Flowers

● Clean up and document contributed data
● Map to ArchaeoML (general ontology)
● Mint URIs for entities (potsherds, projects, contexts, people)
● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life)
● Working on CIDOC-CRM (RDF) representations (not straightforward)
Open Context: Record
● XHTML + RDFa (Dublin Core, Open Annotation, etc.)
● XML (ArchaeoML)
● Atom
● RDF (draft CIDOC)
● Link to GitHub versioned file
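The first serialization listed, XHTML + RDFa with Dublin Core, embeds machine-readable statements in the human-readable record page. Below is a minimal sketch of such a fragment. It assumes RDFa 1.1 `prefix` syntax. The record values, the placeholder URI, and the helper `rdfa_snippet` are hypothetical, not Open Context's actual markup.

```python
from html import escape

# Hypothetical record values; the URI is a placeholder, not a real
# Open Context identifier.
RECORD = {
    "uri": "https://example.org/subjects/example-sherd",
    "title": "Sherd 101 & lid fragment",
    "creator": "Example Contributor",
}


def rdfa_snippet(uri: str, title: str, creator: str) -> str:
    """Render a record as an XHTML fragment with Dublin Core RDFa.

    `about` names the subject resource; each `property` attribute turns
    the element's text into a dcterms literal about that subject.
    Values are HTML-escaped so characters like "&" stay well-formed.
    """
    return (
        '<div prefix="dcterms: http://purl.org/dc/terms/" '
        f'about="{escape(uri)}">\n'
        f'  <span property="dcterms:title">{escape(title)}</span>\n'
        f'  <span property="dcterms:creator">{escape(creator)}</span>\n'
        "</div>"
    )


snippet = rdfa_snippet(**RECORD)
print(snippet)
```

An RDFa-aware parser reading this fragment should extract two triples about the placeholder URI: a `dcterms:title` and a `dcterms:creator`.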
Open Context: Visualization of Data Linked to the EOL
My Precious Data

Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright
Data sharing as publication
Data Publishing
Publishing

Data Quality and Standards Alignment
(1) Check consistency
(2) Edit functions
(3) Align to common standards (“Linked Data” if applicable)
(4) Issue tracking, version control
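The first and third steps, checking consistency and aligning to common standards, can start with something as simple as the sketch below. It flags missing values and values outside an agreed vocabulary as discrete issues that an editor could then log in an issue tracker. The `Issue` record, the column name `element`, and the vocabulary are hypothetical examples. The sketch does not use any tracker's API.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One problem found in a contributed table, ready to be logged."""
    row: int
    column: str
    value: str
    problem: str


def check_column(rows, column, allowed):
    """Flag missing values and values outside an agreed vocabulary.

    Data rows are numbered from 1 in the order they were supplied.
    """
    issues = []
    for number, row in enumerate(rows, start=1):
        value = row.get(column, "")
        if not value.strip():
            issues.append(Issue(number, column, value, "missing value"))
        elif value not in allowed:
            issues.append(Issue(number, column, value, "not in vocabulary"))
    return issues


# Hypothetical vocabulary and rows, for illustration only.
ALLOWED_ELEMENTS = {"femur", "tibia", "humerus"}
rows = [
    {"element": "femur"},
    {"element": "Femur "},
    {"element": ""},
]
for issue in check_column(rows, "element", ALLOWED_ELEMENTS):
    print(issue)
```

The check reports problems without fixing them. Editing stays a separate, reviewable step, which matches the “coproduction” between contributors and editors.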
Publishing

Tools of the Trade
(1) Google Refine (check, edit, consistency)
(2) Mantis (issue tracker, coordinate edits, metadata creation)
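Google Refine's consistency checks include clustering variant spellings by “key collision”. Below is a minimal re-implementation in the spirit of its fingerprint method, written for illustration. It is not Refine's own code, and the sample values are hypothetical. Each value is trimmed, lowercased, folded to ASCII, and stripped of punctuation. Its tokens are then sorted and de-duplicated, and values with the same key are grouped.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Key-collision fingerprint in the spirit of Google Refine's method."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII, e.g. "é" -> "e".
    value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Drop punctuation, then sort and de-duplicate whitespace-separated tokens.
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))


def clusters(values):
    """Group distinct spellings that share a fingerprint."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


values = ["Sheep, goat", "goat sheep", "Goat  Sheep.", "Cattle", "cattle ", "Pig"]
for group in clusters(values):
    print(group)
```

Clusters are suggestions for a human editor, not automatic merges. For example, removing punctuation turns “sheep/goat” into the single token “sheepgoat”, so it would not cluster with “sheep goat”.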
Publishing

Tools of the Trade
(1) Domain scientists (Editorial Board) check data
(2) Iterative “coproduction” between contributors and editors
Publishing




               Project Metadata


             Column Descriptions
Web of Data (2011)

Main contributors:
● Institutions (esp. government)
● Thematic collections / projects
Publishing

Entity Reconciliation
(1) With Google Refine
(2) Implemented for EOL and Pleiades (gazetteer)
(3) Use existing mappings to improve future reconciliation
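Point (3), reusing existing mappings, can be sketched as a lookup table of reconciliations an editor has already confirmed. Labels that match it are linked automatically, and the rest are queued for review. This is a hypothetical illustration. The URIs are `example.org` placeholders, not real EOL or Pleiades identifiers, and `reconcile` and `confirm` are invented helper names.

```python
# Hypothetical store of reconciliations an editor has already confirmed.
# The URIs are placeholders, not real EOL or Pleiades identifiers.
confirmed = {
    "sheep": "https://example.org/eol/ovis-aries",
}


def normalize(label: str) -> str:
    """Case- and whitespace-insensitive key for a local label."""
    return " ".join(label.lower().split())


def reconcile(labels, confirmed):
    """Split labels into already-mapped ones and ones needing review."""
    matched, needs_review = {}, []
    for label in labels:
        uri = confirmed.get(normalize(label))
        if uri is None:
            needs_review.append(label)
        else:
            matched[label] = uri
    return matched, needs_review


def confirm(confirmed, label, uri):
    """Record an editor's decision so later datasets match automatically."""
    confirmed[normalize(label)] = uri


matched, needs_review = reconcile(["Sheep", "Red  deer"], confirmed)
print(needs_review)  # ['Red  deer']
confirm(confirmed, "red deer", "https://example.org/eol/cervus-elaphus")
print(reconcile(["RED DEER"], confirmed)[1])  # []
```

Each confirmed match shrinks the review queue for the next contributed dataset that uses the same local labels.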
● CDL Archiving Service
● EZID for persistent identifiers: DOIs (aggregate resources) and ARKs (granular resources); Merritt Repository
● Helps build trust in the community
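ARKs identify granular resources with strings of the form `ark:/NAAN/Name`. Below is a minimal sketch of splitting an ARK into its parts and building an actionable URL. It assumes the n2t.net resolver, handles only numeric NAANs, and uses an identifier under the `99999` test namespace as a placeholder. It does not call the EZID API.

```python
import re

# Matches the common "ark:/NAAN/Name" form (the slash after "ark:" is
# optional here). Only numeric NAANs are handled in this sketch.
ARK_PATTERN = re.compile(r"^ark:/?(?P<naan>\d{5,})/(?P<name>\S+)$")


def parse_ark(ark: str) -> tuple[str, str]:
    """Split an ARK into its NAAN and the assigned name."""
    match = ARK_PATTERN.match(ark.strip())
    if match is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return match.group("naan"), match.group("name")


def resolver_url(ark: str, resolver: str = "https://n2t.net/") -> str:
    """Build an actionable URL for an ARK via a resolver (assumed n2t.net)."""
    naan, name = parse_ark(ark)
    return f"{resolver}ark:/{naan}/{name}"


print(resolver_url("ark:/99999/fk4example"))
# https://n2t.net/ark:/99999/fk4example
```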
CDL as Infrastructure

● Platform / services that disciplinary communities can use for “data publishing”
● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc.

Diagram labels: “University of California (System) Repository, all disciplines (UC-funded library, grants)”; “Future data publisher” (×2)
eScholarship: UC’s OA Publishing Platform
Platform for traditional publishing
Also supports new genres
Summary

Outcomes of Publishing Data:
(1) Communicate and set expectations about content and quality
(2) Organize workflows to improve data quality and usability
(3) Make “datasets” first-class citizens in the world of scholarly communications
Final Thoughts

Publication needs to evolve!

(1) Participating in Linked Data is a great goal, but far removed from most everyday practice
(2) Researchers need help.
(3) 19th-century publication norms are poorly suited to 21st-century methods, research, and public goals
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

IASSIT Kansa Presentation

  • 1. Case-Study: Publishing to the “Web of Data” in Archaeology Quality and Workflows Eric Kansa UC Berkeley / OpenContext.org Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
  • 2. “Small Science” data sharing is hard: (1) Complexity (2) Scalability (3) Ethics, cultural property claims, IP (4) Incentives (5) Preservation Image Credit: “Grand Canyon NPS” via Flickr (CC-By) http://www.flickr.com/photos/grand_canyon_nps/5975537378/
  • 3. Thousand Flowers ● Open Context: Open access, open licensed data for archaeology ● Archiving by California Digital Library ● Persistent Identifiers (DOIs, ARKs) ● Web services ● NSF/NEH links for data management plans
  • 4–5. Thousand Flowers Fills a Gap: Most data sources are institutional. Open Context publishes individual, small group contributions. Challenge: Diverse contributions, needing lots of work to clean up and “link” to the Web of Data
  • 6. 3-year project Oct 2010 – Sep 2013 • Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse” • Ixchel Faniel, PI & Elizabeth Yakel, Co-PI http://www.dipir.org
  • 8. The Big DIPIR Questions Research Questions 1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites? 2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
  • 9. Open Context Interviewees • 22 Ph.D. or graduate students interviewed – 13 men – 9 women • Novices / Experts – 19 experts – 3 novices • Interviewees who were curators, or professors who also had a curatorial role: 6
  • 10. Raw Data is Unappetizing?
  • 11–12. Data Documentation Practices “I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’… I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13) A long way to go before we get usable, intelligible data
  • 13. Sometimes data is better served cooked.
  • 14. Thousand Flowers ● Clean-up and document contributed data ● Map to ArchaeoML (general ontology) ● Mint URIs to entities (potsherds, projects, contexts, people) ● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life) ● Working on CIDOC-CRM (RDF) representations (not straightforward)
  • 16. Open Context: Record ● XHTML + RDFa (Dublin Core, Open Annotation, etc.) ● XML (ArchaeoML) ● Atom ● RDF (draft CIDOC) ● Link to GitHub versioned file
  • 19. Open Context: Visualization of Data Linked to the EOL
  • 20. My Precious Data Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright
  • 21. Data sharing as publication
  • 23. Publishing Data Quality and Standards Alignment (1) Check consistency (2) Edit functions (3) Align to common standards (“Linked Data” if applicable) (4) Issue tracking, version control
  • 24. Publishing Tools of the Trade (1) Google Refine (check, edit, consistency) (2) Mantis (issue-tracker, coordinate edits, metadata creation)
  • 25. Publishing Tools of the Trade (1) Domain scientists (Editorial Board) check data (2) Iterative “coproduction” between contributors and editors
  • 26. Publishing Project Metadata Column Descriptions
  • 27. Web of Data (2011) Main Contributors: ● Institutions (esp. government) ● Thematic collections / projects
  • 28. Publishing Entity Reconciliation (1) With Google Refine (2) Implemented, EOL and Pleiades (gazetteer) (3) Use existing mappings to improve future reconciliation
  • 29. CDL Archiving Service ● EZID for persistent identifiers (DOIs for aggregate resources, ARKs for granular resources) and the Merritt Repository ● Helps build trust in the community
  • 30–31. CDL as Infrastructure ● Platform / services that disciplinary communities can use for “Data Publishing” ● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc. (Diagram: University of California (System) Repository, all disciplines (UC-funded library, grants), serving future data publishers)
  • 32. eScholarship: UC’s OA Publishing Platform
  • 35. Summary Outcomes of Publishing Data: (1) Communicate and set expectations about content and quality (2) Organize workflows to improve data quality and usability (3) Make “datasets” first-class citizens in the world of scholarly communications
  • 36. Final Thoughts Publication needs to evolve! (1) Participating in Linked Data is a great goal, but far removed from most everyday practice (2) Researchers need help. (3) 19th-century publication norms poorly suited to 21st-century methods, research, public goals
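Slides 11–12 describe a private numeric coding system (“we all know that a 14 is a sheep”) that means nothing to a reuser without the code book. A minimal Python sketch of why documenting such codes matters: the only mapping taken from the source is 14 → sheep. The function name `decode_codes`, the column names and the second code (22) are hypothetical, and undocumented codes are flagged rather than guessed.

```python
# Hypothetical code book: only "14 -> sheep" comes from the interview quote.
CODEBOOK = {14: "sheep"}


def decode_codes(rows, column, codebook):
    """Add a human-readable label column next to a numeric code column.

    Returns the decoded rows and a sorted list of codes that have no
    documented meaning (their label is left as None, never guessed).
    """
    decoded = []
    undocumented = set()
    for row in rows:
        code = row[column]
        label = codebook.get(code)
        if label is None:
            undocumented.add(code)
        decoded.append({**row, column + "_label": label})
    return decoded, sorted(undocumented)


if __name__ == "__main__":
    specimens = [
        {"specimen": "A-001", "taxon_code": 14},
        {"specimen": "A-002", "taxon_code": 22},  # hypothetical, undocumented
    ]
    rows, missing = decode_codes(specimens, "taxon_code", CODEBOOK)
    for r in rows:
        print(r)
    print("Undocumented codes:", missing)
```

Flagging unknown codes, instead of silently dropping them, gives editors a concrete list of questions to put back to the contributor.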
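Slide 14 lists minting URIs for entities (potsherds, projects, contexts, people) and linking them to outside vocabularies. A minimal sketch of the idea, assuming opaque UUID-based identifiers under a hypothetical base URI (`https://example.org/entities/`) and links stored as plain (subject, predicate, object) triples. Open Context's actual URI patterns and link predicates are not shown in the slides; the Dublin Core term `dcterms:spatial` used below is real, but choosing it here is only illustrative, and the gazetteer URI is a placeholder.

```python
import uuid

# Hypothetical base URI for illustration only.
BASE_URI = "https://example.org/entities/"


def mint_uri(entity_type: str) -> str:
    """Mint a new opaque URI, e.g. for a potsherd, project, context or person."""
    return f"{BASE_URI}{entity_type}/{uuid.uuid4()}"


def link(subject_uri: str, predicate_uri: str, object_uri: str) -> tuple[str, str, str]:
    """Record a link between two resources as a (subject, predicate, object) triple."""
    return (subject_uri, predicate_uri, object_uri)


if __name__ == "__main__":
    sherd = mint_uri("potsherd")
    # Placeholder gazetteer URI; a real record would point at an actual place entry.
    place = "https://example.org/gazetteer/place-0001"
    print(link(sherd, "http://purl.org/dc/terms/spatial", place))
```

Opaque identifiers avoid baking mutable facts (site names, field numbers) into the URI, so the identifier can stay stable while the record is cleaned up.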
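Slides 23–24 name consistency checking and editing with Google Refine (since renamed OpenRefine). One of its clustering methods is key collision with a “fingerprint” keying function, which groups spellings that differ only in case, punctuation or word order. This is a simplified Python version, not OpenRefine's own code: it omits steps such as folding accented characters to ASCII, and the example values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint key: trim, lowercase, strip punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group distinct spellings sharing a fingerprint; keep only groups with
    more than one variant, i.e. candidates for a consistency edit."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}


if __name__ == "__main__":
    taxa = ["Ovis aries", "ovis aries", "Aries, Ovis", "Capra hircus"]
    print(cluster(taxa))
```

The clusters are only suggestions: an editor still decides which spelling becomes canonical before the edit is applied.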
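Slide 28 describes entity reconciliation against EOL and Pleiades, and reusing existing mappings to improve future reconciliation. A minimal sketch of that loop, assuming a local vocabulary of label → ID pairs and a dictionary of previously confirmed mappings. It uses Python's `difflib` for fuzzy matching as a stand-in; this is not the matching method used by Google Refine's reconciliation services. The IDs (`vocab:0001`, `vocab:0002`) are placeholders, not real EOL or Pleiades identifiers.

```python
import difflib


def reconcile(label, vocabulary, confirmed, cutoff=0.8):
    """Match a contributed label to a vocabulary ID.

    Checks previously confirmed mappings first, then an exact label match,
    then the closest fuzzy match scoring at least `cutoff`.
    Returns (id_or_None, method).
    """
    if label in confirmed:
        return confirmed[label], "mapping"
    if label in vocabulary:
        return vocabulary[label], "exact"
    close = difflib.get_close_matches(label, list(vocabulary), n=1, cutoff=cutoff)
    if close:
        return vocabulary[close[0]], "fuzzy"
    return None, "none"


def confirm(label, vocab_id, confirmed):
    """Store an editor-approved match so later datasets reuse it directly."""
    confirmed[label] = vocab_id


if __name__ == "__main__":
    vocabulary = {"Ovis aries": "vocab:0001", "Capra hircus": "vocab:0002"}
    confirmed = {"sheep": "vocab:0001"}
    for raw in ["sheep", "Ovis aries", "Ovis arie", "goat"]:
        print(raw, reconcile(raw, vocabulary, confirmed))
    confirm("goat", "vocab:0002", confirmed)  # an editor resolves the miss
    print("goat", reconcile("goat", vocabulary, confirmed))
```

Each confirmed match turns a future fuzzy or failed lookup into a direct hit, which is one way prior mappings can improve later reconciliation.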
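Slide 29 notes that CDL's EZID issues DOIs for aggregate resources and ARKs for granular ones. Both can be made actionable through public resolvers: DOIs via `https://doi.org/` and ARKs via the N2T resolver at `https://n2t.net/`. A small sketch that turns either kind of identifier into a resolver URL; the function name and the sample identifiers are hypothetical, and the scheme detection is deliberately simple.

```python
def resolver_url(identifier: str) -> str:
    """Return an HTTP URL that resolves a DOI or ARK via a public resolver."""
    ident = identifier.strip()
    if ident.lower().startswith("doi:"):
        return "https://doi.org/" + ident[4:]
    if ident.startswith("10."):  # bare DOI, e.g. "10.1234/abc"
        return "https://doi.org/" + ident
    if ident.lower().startswith("ark:"):
        return "https://n2t.net/" + ident
    raise ValueError(f"Unrecognized identifier scheme: {identifier!r}")


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    for pid in ["doi:10.1234/example-dataset", "ark:/99999/fk4example"]:
        print(pid, "->", resolver_url(pid))
```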