SCAPE Webinar: Tools for uncovering preservation risks in large repositories

This presentation originates from a webinar presented by Luís Faria. The webinar presents Scout and C3PO, two tools developed by the SCAPE project, and demonstrates how to identify preservation risks in your content while sharing your content profile information with others to open new opportunities.

Scout, the preservation watch system, centralizes all the necessary knowledge on the same platform, cross-referencing this knowledge to uncover all preservation risks. Scout automatically fetches information from several sources to populate its knowledge base. For example, Scout integrates with C3PO to get large-scale characterization profiles of content. Furthermore, Scout aims to be a knowledge exchange platform, to allow the community to bring together all the necessary information into the system. The sharing of information opens new opportunities for joining forces against common problems.

The webinar was held 26 June 2014.

Published in: Technology, News & Politics

SCAPE Webinar: Tools for uncovering preservation risks in large repositories

  1. Tools for uncovering preservation risks in your large repositories
     SCAPE webinar, July 26, 2014
     Luis Faria (lfaria@keep.pt), KEEP SOLUTIONS, www.keep-solutions.com
  2. Why do we need monitoring?
     [Diagram: a repository surrounded by the following factors]
     • Format obsolescence
     • Emerging technology
     • Consumer trends
     • New standards
     • Organisation mission
     • Bit rot
     • Resource capability
     • System availability
     • Security breach
     • Economical limitations
     • Social and political factors
     • Producer trends
     • Organisation policies
  3. Why do we need monitoring?
     [Diagram: the same factors around the repository as on slide 2, now labelled "Risks" and "Opportunities"]
  4. Preservation monitoring survey (181 valid participants)
     What descriptions fit your organization?
     • Other: 5.41%
     • Data intensive industry: 0.77%
     • Non affiliated: 1.54%
     • Big data science: 1.93%
     • Digital preservation vendor: 2.32%
     • Research funder: 2.70%
     • Large enterprise: 2.70%
     • Publisher or content producer: 5.02%
     • Small or medium enterprise: 7.34%
     • Local government institution: 9.27%
     • National government institution: 15.83%
     • Memory institution or content holder: 26.64%
     • University: 28.57%
     This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
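  5. Preservation monitoring survey
     For each risk: importance (normalized mean), then the share of participants monitoring it, not monitoring it, and uncertain or giving no answer.
     • File corruption: importance 92%; monitoring 41%; not monitoring 18%; uncertain or no answer 41%
     • Backup failure: importance 89%; monitoring 51%; not monitoring 9%; uncertain or no answer 40%
     • Staff not enough or adequate: importance 78%; monitoring 41%; not monitoring 18%; uncertain or no answer 41%
     • Software platform obsolescence: importance 77%; monitoring 40%; not monitoring 13%; uncertain or no answer 46%
     • Hardware platform obsolescence: importance 76%; monitoring 44%; not monitoring 12%; uncertain or no answer 44%
     • Lack of context information: importance 76%; monitoring 23%; not monitoring 24%; uncertain or no answer 53%
     • Incorrect action results: importance 75%; monitoring 27%; not monitoring 22%; uncertain or no answer 51%
     • Consumers misalignment: importance 74%; monitoring 17%; not monitoring 25%; uncertain or no answer 58%
     • Outdated preservation plans: importance 69%; monitoring 28%; not monitoring 25%; uncertain or no answer 47%
     • Producers misalignment: importance 68%; monitoring 25%; not monitoring 19%; uncertain or no answer 55%
     • Content not aligned with policies: importance 64%; monitoring 30%; not monitoring 23%; uncertain or no answer 46%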
  6. Tools for uncovering preservation risks
     [Diagram: Content → FITS → C3PO → Scout]
     • FITS: FITS output (XML)
     • C3PO: file characteristics distribution (graphs and drill-down analysis)
     • Scout: file and world properties throughout time, and notifications
  7. FITS - File Information Tool Set
     • http://fitstool.org
     • Characterization
     • Identification
     • Feature extraction
     • Validation
     • Support for:
       • DROID
       • JHove
       • Apache Tika
       • ADL Tool
       • Exiftool
       • FFIdent
       • File Utility (windows port)
       • NLNZ Metadata Extractor
       • OIS Audio, File and XML Information
     • https://github.com/keeps/fits/tree/keeps
     • Developed by KEEPS
     • Added support for:
       • FIDO
       • Microsoft Office
       • Adobe Illustrator
       • Corel Draw
       • Email (EML)
       • Autocad (DWG)
       • Shapefile
       • RTF, TXT
       • Databases (DBML)
  8. FITS - File Information Tool Set
     • Demonstration
     • Download from http://fitstool.org
     • Execute for a file:
         ./fits.sh -i file.png
     • Execute for a directory:
         ./fits.sh -r -i source_directory/ -o output_directory/
  9. FITS performance
     • https://github.com/keeps/fits-testing
     • 3 to 6 seconds per file
     • 12 TB: a year
       • http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits
     • Other options for scalability:
       • Fido
       • Apache Tika
       • Nanite
  10. C3PO - Clever, Crafty Content Profile of Objects
      • http://ifs.tuwien.ac.at/imp/c3po
      • Web application
      • Content characteristics aggregation
      • Drill-down analysis
  11. C3PO install
      • Download binaries at: http://dl.bintray.com/peshkira/c3po/
      • Install MongoDB: http://www.mongodb.org/
      • Install Apache Tomcat: http://tomcat.apache.org/
      • Put the C3PO web app in Apache Tomcat
        • Remove the ROOT dir for webapps and rename the C3PO web app to ROOT.war
      • Start Apache Tomcat and connect to: http://localhost:8080/
      • Usage guide: https://github.com/peshkira/c3po/wiki/Usage-Guide
  12. C3PO performance
      Dataset: Statsbiblioteket (Denmark)
      • Size: 440M files (12 TB)
      • Process time: 388h (16 days) / 50h for XML report
      • Average time: 2.5s per 1000 files
      • Web application has a limit of 2.5 million FITS files
  13. Scout: a preservation watch system
      Monitors aspects of the world to detect preservation risks and opportunities
      [Diagram: Content, Policies, Web, Registries and Human knowledge → Scout → Risk notification]
  14. Information sources
      • Format registries & software catalogues
      • Digital repositories & web archives
      • Organizational objectives
      • Experiments
      • Simulation
      • Human knowledge
  15. Current information sources
      • Repository content and events
      • SCAPE Policy model
      • PRONOM
      • Web semantic extraction
      • Web page renderability experiments
  16. Define triggers
      • Notify me when there are tools that can render the format X.
  17. Define triggers
      • Simple query with templates
  18. Receive notifications
      • Email, HTTP Push API
      • Example notification: "There are tools that can render format X."
  19. Interfaces
      • Web page
      • REST API
  20. How to be a part of Scout
      • Check out:
        • Site: http://openplanets.github.io/scout/
        • Report: http://www.scape-project.eu/deliverable/d12-2-final-version-of-the-preservation-watch-component
        • Demo: http://scout.scape.keep.pt
      • Integrate your content
      • Contribute with information (soon)
      • Use Scout form for manual input of knowledge
  21. Roadmap
      • User support
      • More trigger templates
      • More adaptors
      • KrakeN / Propminer
      • Software catalogues
      • Other format registries
      • Other experiments information sources
      • Manual input (human knowledge)
      • Simulation
  22. Tools for uncovering preservation risks in large repositories
      SCAPE webinar, July 26, 2014
      Luis Faria (lfaria@keep.pt), KEEP SOLUTIONS, www.keep-solutions.com
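
The two FITS invocations from the demonstration slide (slide 8) could be wrapped in a small script for batch use. This is a minimal sketch and not part of the webinar: the function names `build_fits_command` and `run_fits` are hypothetical, and the only flags used are the `-i`, `-r` and `-o` options that appear on the slide.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_fits_command(source: str, output: Optional[str] = None,
                       fits_script: str = "./fits.sh") -> List[str]:
    """Build one of the two FITS command lines shown on slide 8.

    Without `output`, this is the single-file form: fits.sh -i <file>.
    With `output`, this is the directory form: fits.sh -r -i <dir> -o <dir>.
    (Hypothetical helper; only the flags come from the slide.)
    """
    if output is None:
        return [fits_script, "-i", source]
    return [fits_script, "-r", "-i", source, "-o", output]


def run_fits(source: str, output: Optional[str] = None) -> int:
    """Run FITS and return its exit code. Requires a local FITS download."""
    if output is not None:
        # Assumption: creating the output directory up front is harmless.
        Path(output).mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_fits_command(source, output)).returncode
```

Calling `run_fits("source_directory/", "output_directory/")` from the directory that contains `fits.sh` would issue the same command as the slide's directory example.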
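
The "3 to 6 seconds per file" figure from the FITS performance slide (slide 9) makes it easy to estimate how long a characterization run might take. The sketch below is only back-of-the-envelope arithmetic: the function name is hypothetical, and the `workers` parameter assumes perfect scaling across parallel processes, which the slides do not claim.

```python
from typing import Tuple


def estimate_fits_hours(n_files: int,
                        seconds_per_file: Tuple[float, float] = (3.0, 6.0),
                        workers: int = 1) -> Tuple[float, float]:
    """Return (low, high) wall-clock hours for characterizing n_files.

    The default per-file range is the one quoted on the slide. Dividing by
    `workers` is an assumption (ideal parallel speed-up), not a measurement.
    """
    if n_files < 0 or workers < 1:
        raise ValueError("n_files must be >= 0 and workers must be >= 1")
    low, high = seconds_per_file
    return (n_files * low / workers / 3600.0,
            n_files * high / workers / 3600.0)


if __name__ == "__main__":
    low, high = estimate_fits_hours(1_000_000)
    print(f"1,000,000 files on one worker: {low:.0f} to {high:.0f} hours")
```

Under these assumptions, one million files on a single worker come out at roughly 833 to 1,667 hours, or about 35 to 69 days, which is why the slide lists lighter-weight options for scalability.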
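
The Tomcat step from the C3PO install slide (slide 11), removing the ROOT directory under webapps and deploying the C3PO web app as ROOT.war, could be scripted roughly as follows. This is a sketch under stated assumptions: the function name and both paths in the usage note are hypothetical, and it assumes Tomcat is stopped while the files are changed.

```python
import shutil
from pathlib import Path


def deploy_as_root(war_file: Path, webapps_dir: Path) -> Path:
    """Install `war_file` as Tomcat's ROOT web application.

    Mirrors the slide: remove the existing ROOT directory under webapps,
    then place the web app there under the name ROOT.war.
    Hypothetical helper; run it only while Tomcat is stopped.
    """
    root_dir = webapps_dir / "ROOT"
    if root_dir.is_dir():
        # Remove the default (or previously unpacked) ROOT application.
        shutil.rmtree(root_dir)
    target = webapps_dir / "ROOT.war"
    # Copy rather than move, so the downloaded file is kept.
    shutil.copyfile(war_file, target)
    return target
```

For example, `deploy_as_root(Path("c3po.war"), Path("/opt/tomcat/webapps"))` uses an illustrative file name and install location; substitute the name of the downloaded web app and your own Tomcat directory. After starting Tomcat, the application should then answer at http://localhost:8080/ as the slide describes.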
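
The trigger idea on slides 16 to 18 ("Notify me when there are tools that can render the format X") can be illustrated with a toy model. Everything below is hypothetical and is not Scout's data model or API: the `ToolFact` and `Trigger` classes, the `tools_can_render` template and the in-memory knowledge base only show the pattern of a condition evaluated against collected knowledge, with a notification sent when it holds.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToolFact:
    """One piece of collected knowledge: a tool that renders a format."""
    tool: str
    renders_format: str


@dataclass
class Trigger:
    """A condition over the knowledge base plus a notification channel."""
    message: str
    condition: Callable[[List[ToolFact]], bool]
    notify: Callable[[str], None]

    def evaluate(self, knowledge_base: List[ToolFact]) -> bool:
        """Check the condition and notify if it holds. Returns the result."""
        fired = self.condition(knowledge_base)
        if fired:
            self.notify(self.message)
        return fired


def tools_can_render(fmt: str) -> Callable[[List[ToolFact]], bool]:
    """Template condition: true when any known tool renders `fmt`."""
    return lambda kb: any(fact.renders_format == fmt for fact in kb)


if __name__ == "__main__":
    trigger = Trigger(
        message="There are tools that can render format X.",
        condition=tools_can_render("X"),
        notify=print,  # stand-in for an email or HTTP push channel
    )
    trigger.evaluate([ToolFact(tool="example-viewer", renders_format="X")])
```

Keeping the condition as a reusable template function loosely mirrors the "simple query with templates" idea on slide 17: the same template can back many triggers that differ only in the format they watch.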
