Montemayor_AIMS_Inventory_Presentation_revised

Gabe Montemayor

PDS Imaging Node AIMS Inventory Component
GABRIEL MONTEMAYOR

Introduction
• Cal Poly Pomona Computer Science student
• Found out about the SIRI internship through a professor
• Interested in building something from the ground up

Problem
• PDS IMG houses over 700 TB of digital image archives
• Currently have only a loose understanding of the data we hold
• Need a clearer picture of the archive data
• Need to inventory and query the archive easily

Solution – AIMS Inventory Component
• Crawl through the archive to examine each file or directory
• Represent each item in the archive as an Archive Product
• Extract the file metadata for each Archive Product
• Track all information about each Archive Product to maintain the inventory
• Index and store each Archive Product
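
The crawl step in the list above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the Apache OODT crawler that the project actually uses; the function name crawl and the archive_root parameter are assumptions made for the example.

```python
import os
from typing import Iterator


def crawl(archive_root: str) -> Iterator[str]:
    """Yield the path of every directory and file under archive_root.

    Hypothetical helper: each yielded path is one item that the
    inventory would represent as an Archive Product.
    """
    for dirpath, dirnames, filenames in os.walk(archive_root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        yield dirpath  # the directory itself is an inventory item
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Directories are yielded as well as files because the slides state that each file or directory in the archive is examined.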

Archive Product
• Create an object to organize the metadata associated with each file/directory in the archive
• Metadata Extractor for Archive Product
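
As a rough picture of what such an object might look like, here is a minimal Python sketch. The class layout, the from_path constructor, and the metadata keys file_name and file_size are assumptions for illustration; the slides do not show the real Archive Product class, which belongs to a Java-based OODT setup.

```python
import os
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ArchiveProduct:
    """Hypothetical container for one archive item and its metadata."""

    path: str
    is_directory: bool
    metadata: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_path(cls, path: str) -> "ArchiveProduct":
        """Build a product from a file or directory path on disk."""
        product = cls(path=path, is_directory=os.path.isdir(path))
        # Assumed metadata keys; the real field list is not given in the slides.
        product.metadata["file_name"] = os.path.basename(path)
        product.metadata["file_size"] = str(os.path.getsize(path))
        return product
```

Keeping the metadata in a plain string-to-string mapping makes it straightforward to hand the same values to a catalog or search index later.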

File Manager
WHAT IT DOES
• Open-source software, part of Apache OODT CAS
• Goal is to collect, catalog, and store files
• Similar in idea to an iTunes library
• More powerful: can store any type of data
WHAT WE NEED
• Collect, catalog, and store Archive Products
• Extend and configure the software for use with Archive Products
• Use the Archive Product Metadata Extractor
• Alter a few XML files

Crawler
• Part of Apache OODT CAS
• Traverses the many directories and files within the data archive
• Pushes each Archive Product to the File Manager
• No additional extension of the crawler is needed
• Only minor configuration changes

Solr
• Use Apache Solr to index and store information on each Archive Product
• Allows users to query the indexed data
• Many possible add-ons
• Google-like search of PDS documentation
• Create a core where the archive will be stored
• Create a special configuration for PDS IMG Archive Products
• Change the solrconfig.xml file to allow the core to use a manually edited schema file
• Include a modified schema.xml file that indexes the metadata fields specific to Archive Products
• Modify the filemgr.properties file to integrate the File Manager with Solr
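
In the project, the File Manager hands products to Solr through its configuration, so no hand-written indexing code is implied by the slides. Purely to illustrate what gets indexed, the sketch below builds the JSON body that Solr's update handler accepts (a JSON array of documents). The function names and the use of the file path as the id value are assumptions; the real unique key and field names come from the project's schema.xml, which is not shown.

```python
import json
from typing import Dict, Iterable, Tuple


def to_solr_document(path: str, metadata: Dict[str, str]) -> Dict[str, str]:
    """Turn one product's metadata into a flat Solr document.

    Assumption: the path serves as the document id. A metadata key
    named "id" would override it.
    """
    document = {"id": path}
    document.update(metadata)
    return document


def build_update_body(products: Iterable[Tuple[str, Dict[str, str]]]) -> str:
    """Serialize (path, metadata) pairs as a JSON array of documents."""
    documents = [to_solr_document(path, metadata) for path, metadata in products]
    return json.dumps(documents)
```

A body like this would be sent with content type application/json to the update endpoint of the core; sending it is left out so the sketch runs without a Solr server.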

Documentation
• Added documentation to the Confluence wiki page for other PDS IMG developers
• Future extensions to the software will be easier
• Added instructions to extract more metadata
• Wiki Page

Future
• Utilize the Banana software, which runs on top of Solr
• Offers a richer, more flexible user interface
• Free search of PDS documentation

What I learned
• Used many different open-source projects
• Learned about the software creation process
• Learned more about the data systems field
• Opportunity to apply Computer Science knowledge

Editor's Notes

1. My name’s Gabe; I am a CPP CS student who will be graduating in the spring.
2. Found out about the SIRI internship through an email from the CPP advisor for this internship.
3. Chose computer science because I was always interested in creating something from the ground up: I wanted to design something, implement it, test it, and see others use it.
4. This project specifically seemed like a good opportunity to build something from the ground up.
Transition: The problem I have been trying to solve during this internship is…
Transition: The solution to this problem is…
The solution to this problem is the AIMS Inventory Component. This software performs the following tasks:
This architecture shows the various software packages we used to accomplish these tasks and how they are connected to each other.
1. The Crawler takes each file directly from the Data Archive.
2. Each file and its file metadata is represented as an Archive Product.
3. The File Manager extracts the metadata from the Archive Product.
4. Solr indexes and stores the Archive Product and the extracted metadata.
Started this project by creating the Archive Product to represent each file/directory within the archive. Also had to create methods to extract the metadata for each Archive Product. The diagram shows that the Inventory will consist of Archive Product objects. This is the metadata that each Archive Product will contain.
This is a screenshot of my code. These are the headers of the functions I created to extract the metadata for each Archive Product. Each function's name explains its task (for example, get MD5 checksum computes the checksum of each file). All of these functions are called by the doExtract method.
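
The screenshot itself is not part of this transcript, so the following is only a sketch of the pattern described: small, single-purpose extraction functions called from one entry point. It is written in Python for illustration; the names get_md5_checksum, get_file_size, and do_extract and the metadata keys are assumptions, and the sketch handles regular files only.

```python
import hashlib
import os
from typing import Dict


def get_md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for very large images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def get_file_size(path: str) -> int:
    """Return the file size in bytes."""
    return os.path.getsize(path)


def do_extract(path: str) -> Dict[str, str]:
    """Call each extraction helper and collect the results (assumed keys)."""
    return {
        "file_size": str(get_file_size(path)),
        "md5_checksum": get_md5_checksum(path),
    }
```

Adding another metadata field then means writing one more helper and adding one more entry in do_extract.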
Apache Object Oriented Data Technology (OODT) Catalog and Archive Services (CAS). iTunes library analogy: you want to store music, and you also want to store the music's metadata. If the product being stored is a song, the metadata would include Title, Album, Year, and Track Number. Chose this software because of its extensibility: it can be configured and extended to work on any type of data. We need to collect, catalog, and store Archive Products. Instead of song metadata like album and track number, we extract Archive Product metadata such as file size, checksum, and mission. To extend it for use with Archive Products, I made use of the metadata extractor code shown earlier and altered a few XML files.
Each file ingested by the File Manager is pushed to Solr for storage. While the File Manager could technically handle storage, this allows for storage and querying in a more user-friendly way. It also opens the possibility of Google-like search and many add-ons.
Example of a query, which will be demoed. It can display selected fields and can search by field.
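
The demoed query is not reproduced in this transcript. As a hedged illustration of searching by field and displaying only certain fields, the sketch below builds a URL for Solr's select handler using the standard q, fl, and wt parameters. The base URL, the core name archive_products, and the field names mission, file_name, and file_size are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_select_url(
    base_url: str,
    core: str,
    field: str,
    value: str,
    show_fields: Sequence[str],
) -> str:
    """Build a Solr select URL that searches one field and limits the output."""
    params = {
        "q": f'{field}:"{value}"',    # search by field
        "fl": ",".join(show_fields),  # display only these fields
        "wt": "json",                 # ask for a JSON response
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical usage: all Cassini products, showing name and size only.
url = build_select_url(
    "http://localhost:8983/solr",
    "archive_products",
    "mission",
    "Cassini",
    ["file_name", "file_size"],
)
```

The function only assembles the URL; fetching it requires a running Solr core with matching fields.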
This is the test archive I have been testing the project on. It consists of two missions: Cassini and MSL. Within each mission there are sample files, including img files and files that would be considered Old Volumes Data, Staged Data, Safed Data, or Extra.
Documented the whole process of creating this software on our wiki page. All modifications to the XML files mentioned earlier are listed there. One example of the wiki's usefulness: the instructions provided make it easier to extract other metadata in the future.
Open source: learned how to read documentation, implement software, and extend and configure it for specific needs. Also learned about some of the problems with open source: documentation is sparse and sometimes out of date.
Software creation process: designing, implementing, documenting, writing reports, presenting.
Learned a lot in my classes, but had not been able to apply it until this internship.
