Stop making tools! Nobody likes them anyway...
1. DANS is an institute of KNAW and NWO
Data Archiving and Networked Services
Christophe Guéret (@cgueret)
New Trends in eHumanities
16 April 2015
3. What kind of tool?
● Could be
– An interactive web site
– An “app” for smartphones
– Stand-alone software
● The goal is always to let users consume the data for their needs
● The actual tooling will depend on the skills and preferences of the team member coding it!
4. Behaving scholars do a bit more
[Diagram labels: Data collection; Data cleaning and integration; Data processing; Tool; New data; Existing data; Happy users]
5. The myths of long-term use
● Data and software sent to a trusted digital repository will surely be re-used later
● Tools can be maintained after the project and further improved to fit new needs
● If the tool is not being used enough, it should be adapted to fit more user needs
6. In reality
● Data that is not easy to use is not used
● Tools are not maintained once the person who coded them has moved on to other things
● It is not possible to make everyone happy and fit all research questions with one tool
7. Data re-use: could you do it?
CEDAR is all open on GitHub: data, queries and scripts.
● Usage example:
– Download dumps
– Install triple store
– Load data & wait
– Recursively query for provenance
8. Data is the important thing
http://redmonk.com/jgovernor/2007/04/05/why-applications-are-like-fish-and-data-is-like-wine/
[Diagram labels: Data; Tool]
10. So what needs to be done?
Do not bake the data into the tool. Instead, build the tool on top of the data, and ensure others can do the same.
[Diagram labels: Data collection; Data cleaning and integration; Data processing; Data exposition; Tool 1, Tool 2, ...]
11. In fact, do not write any tool
● Focus on exposing the data
– Less time spent coding and less code
– Easier and cheaper to maintain
● To increase availability, expose your data on the Web
● Exposing != making a package and putting it somewhere
12. The magic keyword 1: “API”
● “In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications” - Wikipedia
● Regardless of data, all the software you use is a layered cake bound by software APIs
– Presentation software > GUI toolkit > Rendering System > Operating System > Hardware
13. Example (courtesy of Wikipedia)
● In this code “nextLine” and “close” are part of the API of “Scanner”
14. APIs can be on the Web too
● HTTP can be used as an API too.
● Get a specific record from a database
– http://example.com/api?action=show&id=500
● Delete a record in a database
– http://example.com/api?action=delete&id=500
● But don't do it that way! This abuses the “GET” method of HTTP, which is meant for retrieving data, not for changing it
15. Generic design for tool + API
● Tools consume the data provided by a set of APIs over the Web
● If you are coding tools
– Forget about server-side page rendering
– Learn JavaScript
[Diagram: Data (MySQL, R, ...) > API (HTTP, JSON, ...) > Tool]
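A minimal sketch of this layering, assuming made-up rows and hypothetical function names (`api_list_schools`, `tool_total_pupils`, `tool_city_list`). In a real setup the API layer would be reached over HTTP; here it is a plain function so the example stays self-contained. The point is only that both tools read the JSON the API returns and never touch the rows directly.

```python
import json

# Data layer: stands in for MySQL, R output, etc. The rows are made up.
_ROWS = [
    {"id": 1, "city": "Amsterdam", "pupils": 250},
    {"id": 2, "city": "Utrecht", "pupils": 180},
]


def api_list_schools():
    """API layer (hypothetical): the only way tools see the data, always as JSON text."""
    return json.dumps(_ROWS)


def tool_total_pupils():
    """Tool 1: aggregates over what the API returns."""
    return sum(row["pupils"] for row in json.loads(api_list_schools()))


def tool_city_list():
    """Tool 2: a different view built on the same API output."""
    return sorted(row["city"] for row in json.loads(api_list_schools()))
```

Because neither tool knows how the rows are stored, the data layer could be swapped without changing either tool, and a third tool could be added by someone else.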
16. The magic keyword 2: “REST”
● “Representational State Transfer (REST) is a software architecture style consisting of guidelines and best practices for creating scalable web services” - Wikipedia
● For example: instead of using GET to do a delete, just use the DELETE method from HTTP on the target resource
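As a rough illustration of using DELETE on the target resource, the sketch below runs a tiny local server with Python's standard library and sends it a GET, a DELETE, and another GET. The `/records/<id>` path, the in-memory `RECORDS` store, and the helper names are assumptions made for this example, not part of any real service.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RECORDS = {}  # hypothetical in-memory "database"

# Ignore any system proxy settings for this local demo.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class RecordHandler(BaseHTTPRequestHandler):
    def _record_id(self):
        # Expect paths of the form /records/<id>.
        parts = self.path.strip("/").split("/")
        return parts[1] if len(parts) == 2 and parts[0] == "records" else None

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET only reads: it never changes the stored data.
        record = RECORDS.get(self._record_id())
        if record is None:
            self._send_json(404, {"error": "not found"})
        else:
            self._send_json(200, record)

    def do_DELETE(self):
        # DELETE removes the resource identified by the URL.
        if RECORDS.pop(self._record_id(), None) is None:
            self._send_json(404, {"error": "not found"})
        else:
            self.send_response(204)  # success, no body
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def call(method, url):
    """Send a request with the given HTTP method and return the status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with OPENER.open(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def demo():
    RECORDS["500"] = {"id": "500", "title": "Example record"}  # reseed each run
    server = HTTPServer(("127.0.0.1", 0), RecordHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/records/500"
    try:
        return [call("GET", url), call("DELETE", url), call("GET", url)]
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the three status codes in order: the record is found, then deleted, then no longer found. The URL stays the same throughout; only the HTTP method says what should happen to the resource.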
17. The magic keyword 3: “JSON”
● “JSON (/ˈdʒeɪsən/ JAY-sən), or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs” - Wikipedia
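To make the attribute–value idea concrete, here is a small sketch using Python's standard `json` module. The record itself is made up for illustration.

```python
import json

# Hypothetical record, made up for illustration.
record = {"name": "Example School", "city": "Amsterdam", "pupils": 250}

# Python dict -> human-readable JSON text (keys sorted for a stable result).
text = json.dumps(record, sort_keys=True)

# JSON text -> Python dict again.
restored = json.loads(text)
```

Here `text` is an ordinary string that can be sent over HTTP, and `restored` is equal to the original `record`.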
18. A step further with JSON-LD
● JSON-LD is Linked Data expressed in JSON. It lets users follow links across datasets
● Example of JSON data that is not JSON-LD
OK, but what is the API call to get more information about the board?
● Need to figure it out in some way
● With LD you would get a link
Part of the result from http://api.openonderwijsdata.nl/api/v1/get_document/duo/po_school/2013-20YF
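The difference can be sketched with two made-up documents. Everything here is hypothetical: the `example.org` URLs, the field names, and the `linked_resources` helper, which only reads term definitions marked `"@type": "@id"` in the context and is not a full JSON-LD processor.

```python
# Plain JSON: the board is only an opaque identifier (hypothetical data).
plain_doc = {"name": "Example School", "board_id": 12345}

# JSON-LD: the context says the "board" value is a link (an IRI) to follow.
json_ld_doc = {
    "@context": {
        "name": "http://schema.org/name",
        "board": {"@id": "http://example.org/vocab/board", "@type": "@id"},
    },
    "@id": "http://example.org/schools/1",
    "name": "Example School",
    "board": "http://example.org/boards/12345",
}


def linked_resources(document):
    """Return {term: IRI} for terms whose context marks the value as a link.

    Simplified reading of JSON-LD: only inline contexts and term
    definitions with "@type": "@id" are considered.
    """
    context = document.get("@context", {})
    return {
        term: document[term]
        for term, definition in context.items()
        if isinstance(definition, dict)
        and definition.get("@type") == "@id"
        and term in document
    }
```

With the plain document a client has to guess how to turn `12345` into an API call; with the JSON-LD document it gets the address of the board directly.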
19. Web APIs
● There are a lot of them (> 12k) and their number is increasing rapidly. See: http://www.programmableweb.com/
● Some examples:
– https://dev.twitter.com/rest/public
– http://www.slideshare.net/developers/documentation
– http://developer.rottentomatoes.com/docs
– https://www.flickr.com/services/api/
21. Give less to share more
● Noticed something about the examples given in the previous slide on Web APIs?
● None of them would give you a copy of their dataset, yet they have an API to let you access the data!
● => APIs enable fine-grained access to data
22. Monetize a service, not a dataset
● APIs open up the opportunity to monetize the usage of the data instead of the data itself
● Users can be charged per API call
● Similar “download vs. API” approaches
– Paid game vs. free-to-play
– Music download vs. streaming music
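Charging per call could look roughly like the sketch below. The class name `MeteredApi`, the price, and the API keys are all hypothetical; a real service would also need authentication and persistent storage.

```python
from collections import Counter

PRICE_PER_CALL_CENTS = 2  # hypothetical price, in whole cents to avoid rounding issues


class MeteredApi:
    """Toy API wrapper that counts calls per API key (an assumption for this sketch)."""

    def __init__(self):
        self.calls = Counter()

    def get_record(self, api_key, record_id):
        # Every call is counted against the caller's key before data is served.
        self.calls[api_key] += 1
        return {"id": record_id}

    def invoice_cents(self, api_key):
        # Usage, not a copy of the dataset, is what gets billed.
        return self.calls[api_key] * PRICE_PER_CALL_CENTS
```

The same per-key counter also gives a fine-grained record of who uses the data and how often.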
23. Extra technical bonuses
● Most of the processing happens on the client side, so fewer resources are needed to serve the data
● Finer tracking of data usage
● Extra possibilities for caching, round-robin load balancing, CDNs, etc. => easier to scale
28. To summarise
● When your data is ready to be shared, first make an API for it. This will minimise friction in re-use.
● If you want or need to write an end-user tool, make it use your own API (and others!)
● Plan maintenance for the API to keep it running.