GOKb and Refine (Kuali Days 2013)

Presented at Kuali Days 2013 by Kristin Antelman and David Kay

  • Speaker notes (Library E-Content Lifecycle slide):
    – Licensing: what are you buying, what are you paying, and how can you use it?
    – Purchasing: a combination of bundles and 1-n individual items
    – Managing access: how to get to it, and what is wrong when you can't
    – Managing changes: what you bought, and where you get it from, changes
    – Evaluating: measuring value (quantitative and qualitative measures)

    1. • The problem space • Tool Selection • Enhancements • Open Refine in Action • Features and Limitations • The Data Journey • So What?
       Kristin Antelman (North Carolina State University), David Kay (Sero Consulting, UK)
    2. Problem Space / Domain Requirement
       • Unstructured, messy data
         – Critical data is largely poorly controlled text strings (titles, publishers)
         – Data is sloppy: duplicate rows, blank rows, multiple values in a single column, incorrectly formatted dates
         – Standards and identifiers exist but have poor (or incorrect) adoption
       • Bad data
         – Titles associated with the wrong identifiers
         – Data is out of date (has changed)
         – Key data is missing
    3. Problem Space / Domain Requirement
    4. Library Book Lifecycle: Buy → Circulate → Preserve
    5. Library E-Content Lifecycle: Create bundle → License → Purchase → Manage access → Manage changes → Evaluate → Archival access
    6. The Data Improvement Workflow (diagram): Publisher Source Data → Ingest → Open Refine → Ingest → GOKb Database → API → Kuali OLE Library
    7. From Vision to Implementation, July 2012 to October 2013 • Straw Man • Feasibility Study • Iterative Development
    8. Aspiration (image: Lucas van Valckenborch (1535 or later–1597) [Public domain], via Wikimedia Commons)
    9. Tools Selection
    10. Feasibility Study: Knowledge Integration, Summer 2012
        Options • Open Rules • Drools • DIY • Google Refine
        Considerations • Open • Performance • Rule Syntax & Interface * • Rule Management * • Rule Precedence Support • Auditing • Deployment *
    11. Open Rules • Drools Expert • DIY
    12. Critical Factors
        • Geared to the main objective
        • Suited to the expected user skill sets
        • Ease of deployment
        • Scales in the ways we need
        • An open platform for integration and extensions
        • Supported by an active community
        Selection: Google Refine (since renamed Open Refine)
    13. Open Refine Extensions
    14. GOKb Open Refine Extensions in the current release (September 2013)
        • Server-side management
          – Projects
          – Check-out, check-in
          – Rules
        • Refine UI extensions geared to GOKb expectations
          – Pre-edit checks, e.g. new file? white space?
          – Authority validation, e.g. organisations
          – Feedback panel: errors and warnings
          – Access to quick resolutions involving stored transformations
          – Pre-processing impact assessment: what this will do to the database
          – Update options: incremental and replacement
        • Post-ingest support within GOKb
          – Audit trail, to-do checklist
    15. GOKb Open Refine Screen
    16. Why Open Refine is a good fit for us (and may be for you as well)
        • Extensible
        • Supports collaboration / a shared workspace
        • Supports users at multiple levels of expertise
          – A cross between a spreadsheet and a database for novices
          – GREL and JSON scripting
          – API calls to external data sets
        • But sometimes it's not the right tool…
    17. IMPORT: TSV, CSV, text, XML, RDF, Google spreadsheets
        EXPLORE: faceting, clustering
        FIX: normalization, common transformations
        EXPORT: TSV, CSV, Excel, HTML, templating exporter
    18. Round Trip Data Journey
    19. What's Next? The Round Trip: RESTful APIs Supporting JSON
        (diagram: OpenRefine, the GOKb database and target applications, e.g. OLE, linked by APIs and ingest)
        • Route 1: new project
        • Route 2: CRED user edits
        • Route 3: update project
        • Route 4: CRED delta ingest
    20. So what? Or … why might you be interested?
        The Application
        • Data cleansing / enhancement
        • Reuse … automation
        • Managing distributed activity
        • Leveraging Refine and Excel user skills
        • Note: GOKb extensions are open source
        The Meta Challenge
        • Kuali software and the evolving ecosystem
        • Tool selection
        • An example of community innovation
    21. Open Refine Resources
        • Tutorials, FAQs and the Open Refine wiki: http://openrefine.org/documentation
        • About GREL: https://github.com/OpenRefine/OpenRefine/wiki/Understanding-Expressions
        • Common formulas for editing with GREL: https://github.com/OpenRefine/OpenRefine/wiki/Recipes
        • Step-by-step tutorials: http://www.davidhuynh.net/spaces/nicar2011/tutorial.pdf, http://freeyourmetadata.org
        • Book by the freeyourmetadata authors: http://www.packtpub.com/openrefine-guide-for-data-analysis-and-linking-dataset-to-the-web/book
        • GOKb guidance on Open Refine: https://wiki.kuali.org/display/OLE/OpenRefine
        • Twitter: @OpenRefine
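
The Problem Space / Domain Requirement slide lists typical defects in publisher files:
- blank rows
- duplicate rows
- several values packed into one column
- dates in mixed formats

The Python below is a minimal sketch of those fixes outside Refine. It assumes:
- rows are read as dictionaries (for example by `csv.DictReader`);
- a hypothetical `issn` column holds `;`-separated values;
- a hypothetical `start_date` column uses one of a few assumed layouts.

It illustrates the cleanup steps only. It is not GOKb's or Open Refine's implementation.

```python
from datetime import datetime

# Assumed date layouts seen in incoming files. The first matching format wins,
# so an ambiguous value such as 01/02/2010 is read as day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y", "%B %d, %Y")


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows, multi_value_column="issn", separator=";", date_column="start_date"):
    """Trim cells, drop blank/duplicate rows, split packed values, normalise dates.

    The default column names are hypothetical examples, not a GOKb schema.
    """
    cleaned, seen = [], set()
    for row in rows:
        # Trim stray white space; treat missing cells (None) as empty strings.
        row = {col: (val or "").strip() for col, val in row.items()}
        # Skip rows in which every cell is empty.
        if not any(row.values()):
            continue
        # Normalise the date, but keep the original text if no format matches
        # so that a person can review it later.
        if row.get(date_column):
            row[date_column] = normalize_date(row[date_column]) or row[date_column]
        # Emit one row per value packed into the multi-valued cell.
        packed = row.get(multi_value_column, "")
        values = [v.strip() for v in packed.split(separator) if v.strip()]
        candidates = [{**row, multi_value_column: v} for v in values] or [row]
        for candidate in candidates:
            # Skip exact duplicates of rows already emitted.
            key = tuple(sorted(candidate.items()))
            if key not in seen:
                seen.add(key)
                cleaned.append(candidate)
    return cleaned
```

Splitting packed identifiers into separate rows mirrors Refine's "split multi-valued cells" operation. Whether one row per identifier is the right model for a knowledge base is a separate decision that this sketch does not settle.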
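
The GOKb Open Refine Extensions slide mentions three related features:
- pre-edit checks for white space;
- authority validation of organisations;
- a feedback panel of errors and warnings.

The sketch below suggests what such checks might compute. The `publisher` column, the authority list passed in as `known_orgs`, and the `Issue` record are all hypothetical illustrations. None of them is the GOKb extension's actual API.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    """One hypothetical feedback-panel entry: row index, severity and message."""
    row: int
    level: str  # "error" or "warning"
    message: str


def pre_ingest_checks(rows, known_orgs, org_column="publisher"):
    """Flag stray white space (warnings) and unknown organisations (errors)."""
    # Case-insensitive lookup set built from the (assumed) authority list.
    authority = {name.casefold() for name in known_orgs}
    issues = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # Leading/trailing spaces or doubled internal spaces.
            if value != value.strip() or "  " in value:
                issues.append(Issue(i, "warning", f"stray white space in {column!r}"))
        # Collapse internal white space before comparing with the authority list.
        org = " ".join(row.get(org_column, "").split())
        if not org:
            issues.append(Issue(i, "error", f"missing {org_column}"))
        elif org.casefold() not in authority:
            issues.append(Issue(i, "error", f"unknown organisation {org!r}"))
    return issues
```

Here white-space problems are treated as warnings, because they can be fixed automatically. An unmatched organisation is treated as an error, because a person has to decide whether it is a typo or a genuinely new publisher.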
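
Clustering, listed under EXPLORE on slide 17, groups variant spellings so they can be merged. Open Refine's default key-collision method first builds a "fingerprint" key for each value:
1. trim and lowercase the value;
2. strip punctuation;
3. fold accented characters to ASCII;
4. sort and de-duplicate the tokens.

Values whose keys collide are then clustered together. The Python below approximates that keyer for illustration. It is not Open Refine's own code.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Key-collision 'fingerprint' key, approximating Open Refine's default keyer."""
    value = value.strip().lower()
    # Remove punctuation and control characters (anything not a word char or space).
    value = re.sub(r"[^\w\s]", "", value)
    # Fold accented Latin characters to plain ASCII (é -> e).
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Sort the unique tokens so word order and repetition do not matter.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]
```

Values that differ only in case, punctuation or word order land in the same cluster. "Springer" and "Springer-Verlag" do not, which is why Refine also offers nearest-neighbour methods. Note that the ASCII folding step in this sketch silently drops characters from non-Latin scripts.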
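
Two slides touch on updates:
- the Round Trip slide describes a CRED delta ingest over RESTful APIs supporting JSON;
- the extensions slide offers both incremental and replacement updates.

The core idea of an incremental update can be sketched simply. Instead of resending a whole file, compare two snapshots and send only what was created, updated or deleted. In the sketch below, the `issn` key, the function names and the JSON shape are assumptions for illustration, not GOKb's actual API.

```python
import json


def compute_delta(old_rows, new_rows, key="issn"):
    """Compare two snapshots keyed on one (assumed) identifier column.

    Returns created and updated rows plus the keys of deleted rows.
    If a key appears more than once in a snapshot, the last row wins.
    """
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    return {
        "created": [new[k] for k in sorted(new.keys() - old.keys())],
        "updated": [new[k] for k in sorted(new.keys() & old.keys()) if new[k] != old[k]],
        "deleted": sorted(old.keys() - new.keys()),
    }


def delta_payload(old_rows, new_rows, key="issn"):
    """Serialise the delta as JSON, e.g. as the body of a hypothetical REST call."""
    return json.dumps(compute_delta(old_rows, new_rows, key), indent=2)
```

Keying on a single identifier is the weak point of this approach. The Problem Space slide warns that titles are sometimes associated with the wrong identifiers, so a production delta would need a more robust match.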
