Invenio was born at CERN as a digital library software solution to run the CERN Document Server, which has managed over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. It has recently been completely rewritten, with Elasticsearch becoming a key element of the new version. Invenio is no longer limited to digital repository functionality and can be used for many purposes. This talk describes how Elasticsearch has been integrated into the new Invenio version.
1. Elasticsearch: a key element of Invenio 3
Elasticsearch Meetup
Johnny Mariéthoz
Lausanne, 2017/03/10
2. About Me
12 years as computer scientist in machine learning
7 years as Invenio developer and instance maintainer
bass and double bass player
newbie as analog camera photographer
Library Network of Western Switzerland 2 Lausanne, 2017/03/10
3. Library Network of Western Switzerland
4. RERO: Library Network of Western Switzerland
220 libraries
academic libraries, heritage libraries, public libraries, school libraries or specialized libraries
50’000 students
5 cantons: FR, GE, JU, NE, VS
280’000 registered patrons
3 academic universities
Geneva, Fribourg, Neuchâtel
1 University of Applied Sciences
central office in Martigny
19 employees
5. Typical Data Centered Web Application
[Diagram: Data Web Server. Components shown: data schema, persistent storage, search engine, PID store, external files, access rights, REST API, HTML web pages, GUI, file download / preview for users, other formats. Clients shown: search engines (Google, etc.), external services, browser apps, desktop apps.]
6. Common Needed Features
data management with versioning, validation and PID (Persistent Identifiers)
search engine
rights management (ACL, OAuth)
web page management with templates (search results and others, such as news, front page, etc.)
URL management (routing)
REST API generation
format conversion/migration
CLI utilities
data acquisition (HTML form-based editor)
7. Development
modular software architecture
easy new module creation
web assets management for Less, Sass, Node.js, etc.
asynchronous task management
unit testing and logging
i18n (translations)
web front-end and back-end
and many more...
9. History
digital library and document repository software
created by CERN
mature platform: first public release v0.0.9 in 2002
open source project
originated in high-energy physics
institutional repository: CERN Document Server
integrated library system: CERN Document Server
disciplinary repository: INSPIRE
open research data server: ZENODO
self-contained Python and MySQL web application until v1.x
transition with v2.x
completely rewritten in v3
10. Used Technologies
set of Python modules
includes module interaction mechanisms
delivered as a framework around state-of-the-art technologies
steep learning curve
12. Elasticsearch Integration
SQL only for persistent data, no more SQL queries during the HTTP request
data model using JSON, JSON-Schema and ES Mapping
sorting, facets and query configuration by type of object
uses the official Elasticsearch Python package: elasticsearch-dsl
CLI to create indexes, push mappings and index the data
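
As an illustration of "sorting, facets and query configuration by type of object", here is a minimal sketch in plain Python that builds the JSON body of an Elasticsearch search request from a per-index configuration. The configuration layout, the field names (language, type, created) and the helper build_search_body are assumptions made for this example, not Invenio's actual settings or API; in Invenio the request would be assembled through elasticsearch-dsl rather than by hand.

```python
import json

# Hypothetical per-index configuration: which fields are exposed as
# facets (terms aggregations) and which sort options are available.
SEARCH_CONFIG = {
    "records": {
        "facets": {"language": "language", "type": "type"},
        "sort": {
            "bestmatch": ["_score"],
            "newest": [{"created": {"order": "desc"}}],
        },
        "default_sort": "bestmatch",
    },
}


def build_search_body(index, query_string="", sort=None, size=10):
    """Return the JSON body of a search request for the given index."""
    config = SEARCH_CONFIG[index]

    # An empty query matches every document; otherwise use the
    # Lucene-style query_string query.
    if query_string:
        query = {"query_string": {"query": query_string}}
    else:
        query = {"match_all": {}}

    # One terms aggregation per configured facet.
    aggs = {
        name: {"terms": {"field": field}}
        for name, field in config["facets"].items()
    }

    # Fall back to the default sort when the requested one is unknown.
    sort_key = sort if sort in config["sort"] else config["default_sort"]

    return {
        "query": query,
        "aggs": aggs,
        "sort": config["sort"][sort_key],
        "size": size,
    }


if __name__ == "__main__":
    body = build_search_body("records", "title:higgs", sort="newest")
    print(json.dumps(body, indent=2))
```

Because everything the search page needs (hits, facet counts, ordering) comes back from this single request, no SQL query has to be issued while serving it.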
15. Schema and Mapping
JSON Schema: schemas/records/record-v1.0.0.json
should be included in the data ($schema)
used for data validation
is the documentation for humans
name is important: e.g. index_name: records-record-v1.0.0, document_type: record-v1.0.0 for record indexing
Mapping: mappings/records/record-v1.0.0.json
can be set using a CLI
name is important: index_name: records-record-v1.0.0 with alias=records, document_type: record-v1.0.0 during the index creation
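
The naming convention above can be sketched in a few lines of Python. The helpers schema_to_index and index_creation_body are hypothetical names written for this example and only mirror the rule stated on the slide; the title field in the mapping is likewise an assumption. The index-creation body uses the layout of Elasticsearch versions that still support document types.

```python
def schema_to_index(schema_path):
    """Derive (index_name, document_type) from a schema path.

    The path is relative to the schemas/ directory, for example
    "records/record-v1.0.0.json".
    """
    suffix = ".json"
    if not schema_path.endswith(suffix):
        raise ValueError("expected a .json schema path: %r" % schema_path)
    parts = schema_path[: -len(suffix)].split("/")
    # Index name joins all path components; the document type is the
    # file name without its extension.
    return "-".join(parts), parts[-1]


def index_creation_body(schema_path, properties):
    """Return (index_name, body) for an index-creation request.

    The alias is taken from the first path component, so that
    "records/record-v1.0.0.json" yields alias "records".
    """
    index_name, doc_type = schema_to_index(schema_path)
    alias = schema_path.split("/")[0]
    body = {
        "aliases": {alias: {}},
        "mappings": {doc_type: {"properties": properties}},
    }
    return index_name, body


if __name__ == "__main__":
    name, body = index_creation_body(
        "records/record-v1.0.0.json",
        {"title": {"type": "text"}},  # assumed example field
    )
    print(name)
    print(body)
```

Searching through the alias (records) rather than the versioned index name lets a new schema version get its own index without changing the queries.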
20. Conclusion
very generic and flexible tool
great open source community (many thanks to CERN)
easy to prototype and develop new applications and features
demands time to master (learning curve)
at the center of RERO’s future developments
Swiss open access research publications repository (SONAR)
new Integrated Library System (ILS) for public libraries (3-year project)
and many more projects...