
DSpace at ILRI: A semi-technical overview of “CGSpace”

  1. A semi-technical overview of “CGSpace”: DSpace at ILRI
     Alan Orth
     KAINET ‘Open Data and Open Science’ Workshop, Nairobi, Kenya, 18 June 2015
  2. History of DSpace at ILRI
     ● 2009: ILRI launches Mahider (“repository” in Amharic)
     ● 2010: Other CGIAR centers and programs join our platform and share hard/soft costs
     ● 2011: Rebranded as “CGSpace”
     ● 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month
  3. “CGSpace” in June, 2015
  4. How we use DSpace
     ● Content people embedded in each department help capture results (presentations, papers, brochures, etc.)
     ● Primary location for institutional outputs!
     ● No posting PDFs on the corporate website!
     ● Integrate with website and blogs via RSS feeds
     ● Direct ALL traffic to DSpace!
     ● For data sets, videos, etc., we make a metadata-only accession with a link to, e.g., YouTube
  5. DSpace hierarchies
     ● Communities, sub-communities, and collections
     ● Tempting to model after organization hierarchy! (we did)
     ● … but organization hierarchies change!
  6. Mostly organized by output type now...
  7. Metadata
     ● Standard Dublin Core is available
     ● No AGROVOC
     ● You can create custom controlled vocabularies in arbitrary namespaces, e.g., cg.subject.ilri
  8. Custom metadata in an ILRI report (not AGROVOC!)
  9. “Discovery” facets
     ● Context-aware metadata summaries
     ● Side effect: helps spot metadata inconsistencies!
     ● … Open Access, Open access, open Access, etc.
  10. Search engine optimization (SEO)
      Help Google Scholar consume your content!
      ● XML sitemaps
      ● Consistent domain name, e.g., cgspace.cgiar.org
      ● Persistent links for resources
      ● Website speed and HTTPS are both a plus
      ● Sign up for Google Webmaster Tools to submit sitemaps, control indexing, see stats, etc.
  11. Sitemap view in Google Webmaster Tools
  12. Importance of persistent links
      ● Website addresses change…
      ● mahider.ilri.org -> cgspace.cgiar.org
      ● But resources stay the same! http://hdl.handle.net/10568/67073
      ● “Handle” service from handle.net
      ● Everything under prefix 10568 is CGSpace
      ● Default DSpace handle prefix is 123456789!
  13. dc.identifier.uri specifies an item’s persistent Uniform Resource Identifier (URI)
  14. Getting data INTO DSpace
      ● Day-to-day submission is manual, by a small army of editors
      ● One-time batch uploads of items from other systems in CSV format (InMagic!)
      ● OAI-PMH for metadata only
      ● OAI-ORE for metadata + bitstreams (e.g., from another DSpace, SharePoint, etc.)
      ● SWORD (haven't tried)
      ● REST API (DSpace 5+, haven't tried)
  15. Getting data OUT OF DSpace
      ● REST API for structured JSON or XML
      ● OAI-PMH for metadata
      ● OAI-ORE for metadata + bitstreams (PDFs, etc.)
      ● RSS feeds for websites / blogs
      ● XML sitemaps for search engines*
      *Google discontinued the use of OAI for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
  16. CCAFS website, driven by Drupal + DSpace APIs
  17. “Latest outputs” on project blog populated via RSS, links to CGSpace
  18. Open source workflow on GitHub https://github.com/ilri/DSpace
  19. Skills needed in your organization
      Besides content people(!)...
      ● Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
      ● General: computer science background
      ● Web developers are a diverse bunch...
      ● Java development experience doesn't hurt
  20. Extra considerations
      ● Item mapping
      ● Maintenance tasks (background batch jobs)
      ● Backups of assetstore and PostgreSQL!
      ● Altmetrics tracks social media mentions
      ● Separate production / development environments
      ● CGSpace server is $80/month
      ● ~20GB of PDFs, ~8GB of Solr data
  21. Getting help
      ● “DSpace Tech” mailing list
      ● “dspace” tag on the StackOverflow website
      ● a.orth@cgiar.org
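Slide 9 points out that Discovery facets expose spelling variants such as “Open Access”, “Open access” and “open Access”. The same check can be run offline over exported metadata. A minimal Python sketch, assuming the values of one field have already been exported into a list (for example from a CSV export); `find_case_variants` is a hypothetical helper, not part of DSpace:

```python
from collections import defaultdict


def find_case_variants(values):
    """Group values that differ only by letter case or extra whitespace.

    Returns {normalized_key: [distinct spellings]} for keys that have more
    than one spelling, i.e. the inconsistencies worth cleaning up.
    """
    groups = defaultdict(set)
    for value in values:
        # Collapse runs of whitespace and ignore case when comparing.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


if __name__ == "__main__":
    # Hypothetical values exported from an access-rights field.
    rights = ["Open Access", "Open access", "open Access", "Limited Access", "Open Access"]
    for key, spellings in find_case_variants(rights).items():
        print(f"{key}: {spellings}")
```

With the sample list this prints `open access: ['Open Access', 'Open access', 'open Access']`; “Limited Access” is not reported because it has only one spelling.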
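Slide 10 recommends XML sitemaps for search engines. DSpace includes its own sitemap generator, so the following Python sketch only illustrates what a file following the sitemaps.org 0.9 protocol looks like. `build_sitemap` and the example URL are illustrative assumptions, not CGSpace's actual output:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemaps.org <urlset> document for (url, lastmod) pairs.

    lastmod should be a W3C date such as "2015-06-18", or None to omit it.
    ElementTree escapes characters like "&" in URLs automatically.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
        if lastmod:
            ET.SubElement(url_el, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical item page; a real sitemap lists every item in the repository.
    print(build_sitemap([("https://cgspace.cgiar.org/handle/10568/67073", "2015-06-18")]))
```

Once such a file is reachable on the site, it can be submitted through Google Webmaster Tools as the slide suggests.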
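Slide 12 explains that handles stay stable while hostnames change, that everything under prefix 10568 is CGSpace, and that 123456789 is DSpace's default prefix. A small Python sketch, assuming handle URIs of the form `http(s)://hdl.handle.net/<prefix>/<suffix>`, can split a URI and flag items that still carry the default prefix. The function names are hypothetical:

```python
from urllib.parse import urlparse

CGSPACE_PREFIX = "10568"             # slide 12: everything under 10568 is CGSpace
DEFAULT_DSPACE_PREFIX = "123456789"  # DSpace's default (placeholder) handle prefix


def parse_handle(uri):
    """Split a handle URI such as http://hdl.handle.net/10568/67073 into (prefix, suffix)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or parsed.hostname != "hdl.handle.net":
        raise ValueError(f"not a hdl.handle.net URI: {uri!r}")
    prefix, sep, suffix = parsed.path.lstrip("/").partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"expected <prefix>/<suffix> in {uri!r}")
    return prefix, suffix


def classify_handle(uri):
    """Return "cgspace", "default-prefix" or "other" for a handle URI."""
    prefix, _ = parse_handle(uri)
    if prefix == CGSPACE_PREFIX:
        return "cgspace"
    if prefix == DEFAULT_DSPACE_PREFIX:
        return "default-prefix"  # still using DSpace's placeholder prefix
    return "other"
```

For example, `classify_handle("http://hdl.handle.net/10568/67073")` returns `"cgspace"`. A hostname such as mahider.ilri.org is rejected, which reflects the point of the slide: only the handle form is persistent.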
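Slide 14 mentions one-time batch uploads from other systems (InMagic) in CSV format. The Python sketch below writes such a file. It assumes DSpace's batch metadata editing conventions: an `id` column set to `+` for new items, a `collection` column holding the target collection's handle, and `||` between repeated values. Verify these against the documentation for your DSpace version before importing. The record structure, field list and collection handle `10568/1` are hypothetical:

```python
import csv
import io


def records_to_dspace_csv(records, collection_handle, fields):
    """Write records as new-item rows for DSpace batch metadata import (assumed format).

    Assumptions to verify against your DSpace version's documentation:
      * "id" is "+" for items that do not exist yet,
      * "collection" holds the owning collection's handle,
      * repeated values within one field are joined with "||".
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "collection", *fields])
    for record in records:
        row = ["+", collection_handle]
        for field in fields:
            values = record.get(field, [])
            if isinstance(values, str):  # allow a single value without a list
                values = [values]
            row.append("||".join(values))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical records mapped from a legacy (e.g. InMagic) export.
    legacy = [
        {"dc.title": "Annual report 2014", "dc.contributor.author": ["Doe, J.", "Roe, R."]},
        {"dc.title": "Livestock brochure", "dc.type": "Brochure"},
    ]
    print(records_to_dspace_csv(legacy, "10568/1", ["dc.title", "dc.contributor.author", "dc.type"]))
```

Using the `csv` module rather than joining strings by hand matters here, because author names like “Doe, J.” contain commas and must be quoted.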
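Slide 15 lists OAI-PMH as one way to get metadata out of DSpace. A harvester issues `ListRecords` requests and follows each `resumptionToken` until the server stops returning one. The Python sketch below builds those request URLs and parses response pages with the standard library. The base URL in the example is an assumption (DSpace often serves OAI-PMH under `/oai/request`). Network access is injected as a `fetch` callable so the logic can be tested on canned responses:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; resumptionToken is an exclusive argument in OAI-PMH."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_bytes):
    """Return ([{"identifier", "titles"}, ...], resumption_token_or_None) for one page."""
    root = ET.fromstring(xml_bytes)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in record.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": identifier, "titles": titles})
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield records from every page; fetch(url) must return the response body as bytes."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break


if __name__ == "__main__":
    # Hypothetical endpoint; pass any HTTP client wrapper as `fetch`.
    print(list_records_url("https://cgspace.cgiar.org/oai/request"))
```

Keeping `fetch` separate means the same harvester works with `urllib.request`, a caching client, or test fixtures.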
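Slide 17 shows a project blog whose “Latest outputs” list is populated from a CGSpace RSS feed. A minimal Python sketch of that idea follows. It assumes an RSS 2.0 feed with standard `<item>`, `<title>` and `<link>` elements and leaves out fetching the feed and the actual feed URL. `latest_outputs` and `render_html_list` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET
from html import escape


def latest_outputs(rss_bytes, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 feed, in feed order."""
    channel = ET.fromstring(rss_bytes).find("channel")
    if channel is None:
        return []
    outputs = []
    for item in channel.findall("item"):
        if len(outputs) >= limit:
            break
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # skip items we cannot link back to
        title = (item.findtext("title") or link).strip()
        outputs.append((title, link))
    return outputs


def render_html_list(outputs):
    """Render (title, link) pairs as an HTML list, escaping feed content."""
    items = "".join(
        f'<li><a href="{escape(link)}">{escape(title)}</a></li>' for title, link in outputs
    )
    return f"<ul>{items}</ul>"
```

The links point back to the repository rather than to copies on the blog, in line with the “Direct ALL traffic to DSpace!” advice on slide 4.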
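Slide 20 stresses backing up both the assetstore (where DSpace keeps bitstream files) and the PostgreSQL database. The Python sketch below builds dated backup commands. It assumes `pg_dump` and `tar` are on the PATH and uses a hypothetical database name and hypothetical paths. It only prints the commands as a dry run and does not execute them:

```python
import datetime
import posixpath
import shlex


def backup_commands(db_name, assetstore_dir, dest_dir, today=None):
    """Build dated backup commands for the DSpace database and assetstore.

    assetstore_dir should be an absolute path. Returns argument lists suitable
    for subprocess.run(cmd, check=True). pg_dump's custom format (-Fc) is
    restorable with pg_restore.
    """
    stamp = (today or datetime.date.today()).isoformat()
    parent, name = posixpath.split(assetstore_dir.rstrip("/"))
    return [
        ["pg_dump", "-Fc", "-f", posixpath.join(dest_dir, f"{db_name}-{stamp}.dump"), db_name],
        # -C makes the archive contain "assetstore/..." rather than absolute paths
        ["tar", "-czf", posixpath.join(dest_dir, f"assetstore-{stamp}.tar.gz"), "-C", parent, name],
    ]


if __name__ == "__main__":
    # Hypothetical paths and database name; dry run only.
    for cmd in backup_commands("dspace", "/home/dspace/assetstore", "/backups"):
        print(shlex.join(cmd))
```

Scheduling (e.g. with cron) and copying the archives off the server are left out. A backup stored only on the production machine does not protect against losing that machine.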

Editor's Notes

  1. Introduce self as computer scientist, apologize for limited knowledge of library stuff. How we do things plus lessons learned.
  2. Mention search engine stumbling and parsing vs consuming structured content