Mahendra Mahey, manager of British Library Labs (BL Labs), will examine some of the BL's digital collections and data and discuss the challenges he has faced in making the BL's cultural heritage data available openly or onsite at the British Library.
Mahendra will invite delegates to explore data-sets at their leisure and will set a challenge for those who are interested and skilled in exploring data, finding patterns and grouping data. Delegates could become authors/creators of derived data-sets based on pre-existing digital collections/data provided on the day or already available on https://data.bl.uk.
The workshop will conclude with reflections from the delegates and possibly highlight a number of derived data-sets generated by participants on the day that could potentially be published on https://data.bl.uk. If selected, these new derived data-sets will be attributed with the creators' / authors' details and each will have its own citeable Digital Object Identifier (DOI). These new data-sets would then be available for reuse by any researcher in the world.
GUIDANCE FOR THIS WORKSHOP
We strongly recommend you come to this workshop with an appropriate device, such as a laptop, pre-installed with appropriate tools to analyse different kinds of data-sets, e.g. Microsoft Excel may work with smaller data-sets such as metadata (see other data exploration tools below). If you don't have a device and would still like to attend, please ask to 'pair up' with someone who has already signed up and is willing to share.
Other data exploration tools include: Notepad++ (e.g. for viewing text and XML); OpenRefine (e.g. for cleaning data); Tableau Public (e.g. for visualising data); Google Fusion Tables (e.g. for visualising geo-spatial data); spaCy (e.g. for text and data mining); RStudio (an open source statistical package); MATLAB (a data analysis tool); and NLTK (natural language processing).
Please note that this workshop is NOT about training you to use any of these tools; bring tools you are already familiar with to explore and find patterns in our data.
Datatypes you may be examining in this workshop could include: .ZIP, .PDF, .TXT, .CSV, .TSV, .XLS, .XLSX, RDF, .nt, XML (TEI, ALTO and bespoke), .JSON, .JPG, .JPEG, .TIFF and .WARC.
Please ensure you are able to read these files on your device before the workshop if you are interested in exploring them during our session.
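For delegates who happen to use Python (an assumption; the workshop does not prescribe any tool), one possible way to check a few of the text-based formats in advance is sketched below. It uses only the Python standard library and covers just .CSV, .TSV, .JSON, .XML, .ZIP and .TXT. The function name `check_readable` is hypothetical, and formats such as .XLSX, .TIFF or .WARC would need other tools.

```python
import csv
import json
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def check_readable(path):
    """Return a short summary if the file opens cleanly; raise otherwise.

    Hypothetical helper: it only covers a handful of text-based formats
    and assumes UTF-8 encoding, which may not hold for every data-set.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix in {".csv", ".tsv"}:
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as handle:
            # Count rows one at a time so large files are not held in memory.
            row_count = sum(1 for _ in csv.reader(handle, delimiter=delimiter))
        return f"{row_count} rows"

    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        return f"JSON {type(data).__name__}"

    if suffix == ".xml":
        # Parses the whole document; fails if the XML is not well-formed.
        root = ET.parse(str(path)).getroot()
        return f"XML root <{root.tag}>"

    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            return f"{len(archive.namelist())} archive members"

    if suffix == ".txt":
        return f"{len(path.read_text(encoding='utf-8'))} characters"

    raise ValueError(f"no check written for {suffix!r}")
```

Calling `check_readable` on a downloaded file either returns a one-line summary or raises an error, which suggests the file (or its encoding) needs a different tool.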
Slides for session: http://goo.gl/
URL for specific data: http://
Mahendra Mahey tweets at @BL_Labs & @mahendra_mahey