
Let's do data research work: the creation of a portal with research information from Catalan Universities


  1. Let's do data research work: the creation of a portal with research information from Catalan Universities. Ramon Ros i Gorné, with Lluís M. Anglada i de Ferrer, Sandra Reoyo i Tudó and Ricard de la Vega i Sivera (CSUC). Open Repositories 2014, Helsinki, June 13th
  2. Outline
     1. Who we are
     2. What we have (DSpace repositories)
     3. The PRC project and first decisions
        • Identifiers
        • Software
        • Data mapping
        • Data flow
        • Data exchange format
     4. Current status
     5. Work to be done
  3. New merged consortium in 2014 for Catalan universities, with more services and projects
     • The current CBUC ones
     • The current CESCA ones
     • Joint purchases (electricity, printing, cleaning, facilities, etc.)
     • Common data center
     • Portal for the research output (PRC)
     • Electronic administrative procedures
     • Etc.
  5. CSUC's DSpace repositories
     • Coming soon in 2014
     • From 2001: www.tdx.cat
     • From 2009: www.mdx.cat
     • From 2012: repositori.filmoteca.cat
     • Coming soon in 2014
     • From 2005: www.recercat.cat
     • From 2010: calaix.gencat.cat
     • Pilot in 2012
     • From 2013: www.cirax.cat
  7. Situation in 2012
     – CBUC has promoted institutional repositories (IR) since 1999
     – Some universities (UPC & UPF) already have research portals
     – New standards and protocols help interoperability between IR and CRIS
     – Research output is becoming more important for university managers
  8. Decision in 2012
     What
     • To create a portal to find the research outputs of the Catalan research system
     Why
     • To increase the visibility of the research done in Catalonia
     • To foster OA
     • To increase interoperability between data
     How
     • Building on work previously done
       – In IR, CRIS and statistical data (Uneix)
     • The central idea: the work done for the portal will improve local IR and CRIS
     • Following international best practices
       – Narcis / Holland; HKU Scholars Hub / Hong Kong
  9. PRC building. First decisions
     • Identifiers → ORCID
     • Software → DSpace + CINECA CRIS
     • Data mapping
     • Data flow → from local CRIS systems
     • Data exchange format → CERIF XML
  10. ORCID as researcher identifier
      1. Selection of identifier
         – Decision based on a CBUC report: Sistemes d’identificació unívoca d’investigadors / Àngel Borrego
      2. Technical work
         – Modify all the local CRIS so that they can store the ORCID identifier
         – Promotion of ORCID iD in other working groups: repositories, CCUC, Mendeley…
      3. ORCID dissemination
         – We studied the ORCID API to create ORCID iDs automatically, but we decided not to use it
         – Merchandising, translations, videos, ‘good practices’ document...
         – UB (the biggest university) has a mandate for an ORCID iD in some processes related to research assessment
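Since every local CRIS has to store the ORCID identifier, a load step could reject malformed values before they reach the portal. The sketch below is not part of the PRC software; it only illustrates the publicly documented ORCID check character (ISO 7064 MOD 11-2). The function names are hypothetical, and the sample value is the example iD used in ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the value has 16 characters (hyphens ignored) and a matching checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this catches typing errors only; it does not confirm that the iD is registered or belongs to the researcher in question.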
  11. Evolution of ORCID registered researchers. Number of researchers registered with their university email (data provided by ORCID).

      | University | Oct-13 | Feb-14 | Apr-14 | Jun-14 | Total |
      |------------|--------|--------|--------|--------|-------|
      | UB         | 206    | 106    | 1263   | 128    | 1703  |
      | UAB        | 176    | 90     | 36     | 287    | 589   |
      | UPC        | 368    | 59     | 39     | 196    | 662   |
      | UPF        | 135    | 75     | 299    | 119    | 628   |
      | UdG        | 69     | 38     | 16     | 20     | 143   |
      | UdL        | 6      | 7      | 1      | 2      | 16    |
      | URV        | 102    | 48     | 42     | 25     | 217   |
      | UOC        | 43     | 11     | 11     | 14     | 79    |
      | UVic       | 18     | 150    | 2      | 24     | 194   |
      | UIC        | 11     | 2      | 5      | 41     | 59    |
      | URL        | 30     | 33     | 78     | 22     | 163   |
      | Total      | 1164   | 619    | 1792   | 878    | 4453  |
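The row totals on this slide equal the sum of the four period columns, which suggests that each column counts new registrations in that period rather than a running total. Under that assumption (it is an inference, not something the slide states), the cumulative number of registered researchers at each date can be derived as in this small sketch:

```python
from itertools import accumulate

periods = ["Oct-13", "Feb-14", "Apr-14", "Jun-14"]

# Figures copied from the slide; assumed to be new registrations per period.
new_registrations = {
    "UB": [206, 106, 1263, 128],
    "UAB": [176, 90, 36, 287],
    "UPC": [368, 59, 39, 196],
    "UPF": [135, 75, 299, 119],
    "UdG": [69, 38, 16, 20],
    "UdL": [6, 7, 1, 2],
    "URV": [102, 48, 42, 25],
    "UOC": [43, 11, 11, 14],
    "UVic": [18, 150, 2, 24],
    "UIC": [11, 2, 5, 41],
    "URL": [30, 33, 78, 22],
}

# Sum each period column across universities, then accumulate over time.
column_totals = [sum(column) for column in zip(*new_registrations.values())]
cumulative = list(accumulate(column_totals))

if __name__ == "__main__":
    for period, total in zip(periods, cumulative):
        print(period, total)
```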
  12. Software
      • Based on DSpace-CRIS of CINECA (like Hong Kong University)
      • Main challenges (to adapt/develop)
        – From one institution to multi-institution
        – From submitting content to harvesting from local CRIS instances
        – Massive import mechanisms are needed (XML-CERIF…)
  13. PRC entities
      • Universities
      • Departments & Institutes
      • Research groups
      • Researchers
      • Research projects
      • Publications (Articles + Books + ETDs)
  14. Lots of discussion on data mapping...
  15. DSpace with the CRIS module. Main entities
      • DSpace: Publication
      • CRIS module: Person, Organization, Project
  16. DSpace with the CRIS module. Detailed entities
      • DSpace: Publication
      • CRIS module: Person. Researcher; Organization. Research group; Project
      • Organization. University → communities
      • Organization. Department → collections
      • Author
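The mapping on these two slides can be written down as a small lookup table. This is one reading of the flattened slide text, not the project's actual configuration: the class and function names are hypothetical, "item" is the standard DSpace term for a deposited publication, and the "Author" label is left out because the slide text does not make its placement clear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMapping:
    prc_entity: str  # entity name as used in the PRC
    component: str   # "DSpace" or "CRIS module"
    target: str      # object type inside that component


ENTITY_MAPPINGS = (
    EntityMapping("Publication", "DSpace", "item"),
    EntityMapping("University", "DSpace", "community"),
    EntityMapping("Department", "DSpace", "collection"),
    EntityMapping("Researcher", "CRIS module", "Person"),
    EntityMapping("Research group", "CRIS module", "Organization"),
    EntityMapping("Research project", "CRIS module", "Project"),
)


def targets_in(component: str) -> dict[str, str]:
    """Return {PRC entity: target type} for one component."""
    return {m.prc_entity: m.target for m in ENTITY_MAPPINGS if m.component == component}


if __name__ == "__main__":
    print(targets_in("DSpace"))
    print(targets_in("CRIS module"))
```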
  17. Data flow, protocols, sources and formats. Three kinds of sources feed the PRC (based on DSpace + CINECA CRIS):
      • 12 university CRIS systems from 4 different vendors (DRAC, Universitas XXI, GREC, SIGMA, other). Protocol: OAI-PMH. Format: CERIF-XML
      • Local and consortia repositories, mainly DSpace. Protocol: OAI-PMH/SWORD. Format: DC
      • Catalan government data warehouse (UNEIX). Protocol: XLS files. Format: UNEIX defined
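To make the OAI-PMH leg of this data flow concrete, here is a minimal sketch of the two pieces a harvester needs: building a `ListRecords` request and reading record identifiers plus the `resumptionToken` from a response. The `verb`, `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH 2.0 protocol; the endpoint URL and the `cerif` prefix value are placeholders, since the slides do not say what the local CRIS systems expose.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords URL; a resumptionToken replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


if __name__ == "__main__":
    # Hypothetical endpoint and prefix, for illustration only.
    print(build_list_records_url("https://cris.example.edu/oai", "cerif"))
```

A full harvester would fetch each URL, call `parse_list_records`, and repeat with the returned token until it is `None`.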
  18. CERIF model: cfExpertiseAndSkills, cfEquipment, cfFunding, cfFacility, cfService, cfCitation, cfEvent, cfLanguage, cfCurrency, cfCountry, cfCurriculumVitae, cfPrize, cfQualification, cfGeographicBoundingBox, cfPostalAddress, cfElectronicAddress, cfPerson, cfProject, cfOrganisationUnit, cfResultPatent, cfResultPublication, cfResultProduct, cfIndicator, cfMeasurement, cfFederatedIdentifier
  19. Simplification of CERIF for PRC
  20. Simplified CERIF subset for PRC: cfPerson, cfProject, cfOrganisationUnit, cfResultPublication
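As a rough illustration of what a record limited to these four entities might carry, the sketch below serializes one researcher, one organisation unit, one project and one publication to XML. It is not CERIF-XML: the element names simply reuse the entity labels from the slide, and the `prcRecord` wrapper, the `orcid` attribute and all sample values are hypothetical. Real CERIF-XML has its own schema, element names and link entities.

```python
import xml.etree.ElementTree as ET


def build_record(person_name, orcid, org_unit, project, publication_title):
    """Serialize the four simplified entities into one illustrative XML record."""
    root = ET.Element("prcRecord")  # hypothetical wrapper element
    person = ET.SubElement(root, "cfPerson", {"orcid": orcid})
    person.text = person_name
    ET.SubElement(root, "cfOrganisationUnit").text = org_unit
    ET.SubElement(root, "cfProject").text = project
    ET.SubElement(root, "cfResultPublication").text = publication_title
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record(
        "Example Researcher",
        "0000-0002-1825-0097",  # sample iD from ORCID's documentation
        "Example Department",
        "Example Project",
        "Example Publication",
    ))
```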
  22. Main achievements
      • Good working team
      • People from different universities and different services
      • Agreement: to use ORCID for researchers
      • Already done
        – We succeeded in exporting 20 complete data records from 11 universities (using 5 different CRIS)
        – All the CRIS systems already have a field for ORCID
        – A good program selected
          • Adopted by euroCRIS as repository because of its CERIF compliance
  23. Implementation steps
      • Step 1: prototype. Sample data, manual entry
      • Step 2: first batch load. Data sample from all universities, CSV/XLS format
      • Step 3: full batch load. All data from all universities, CSV/XLS format
      • Step 4: CERIF-XML ingest. First manual CERIF-XML ingest
      • Step 5: OAI-PMH automatic ingest. Full synchronization with local CRIS systems
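The batch loads in steps 2 and 3 could start with a validation pass over each CSV export. The following is a minimal sketch under stated assumptions: the column names (`university`, `researcher`, `orcid`, `title`) are invented for illustration, since the slides do not describe the actual file layout, and the function only separates complete rows from incomplete ones.

```python
import csv
import io

# Hypothetical column names; the real export layout is not given in the slides.
REQUIRED_COLUMNS = ("university", "researcher", "orcid", "title")


def load_batch(csv_text):
    """Return (accepted rows, line numbers of rejected rows) for one CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    accepted, rejected = [], []
    # Line 1 is the header; numbering assumes no field contains a line break.
    for line_no, row in enumerate(reader, start=2):
        if all((row[c] or "").strip() for c in REQUIRED_COLUMNS):
            accepted.append({c: row[c].strip() for c in REQUIRED_COLUMNS})
        else:
            rejected.append(line_no)
    return accepted, rejected


if __name__ == "__main__":
    sample = (
        "university,researcher,orcid,title\n"
        "UB,Example Researcher,0000-0002-1825-0097,Sample title\n"
        "UPC,,0000-0002-1825-0097,Another title\n"
    )
    print(load_batch(sample))
```

Reporting rejected line numbers back to each university keeps the correction work at the source, in line with the idea that work done for the portal should also improve the local CRIS.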
  25. Work to be done & challenges
      • Organizational
        – More meetings with expert group
        – ORCID iD implementation
        – MoU for personal data
      • External adaptation
        – Local CRIS systems to adapt to XML-CERIF wrapping (export)
      • Portal implementation
        – Ingest the full data of all institutions
        – Design and build the user interfaces
        – Develop the CERIF-XML import mechanisms
        – Think about data cleansing & deduplication mechanisms
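Deduplication is listed here as open work, so the following is only one possible starting point, not the approach chosen for the PRC. It assumes publication records arrive as dictionaries with hypothetical `doi`, `title` and `year` fields, and it treats two records as the same publication when their DOIs match or, failing that, when their normalized titles and years match.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", unaccented.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI, or for each (title, year) if no DOI."""
    seen = {}
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            key = ("doi", doi)
        else:
            key = ("title", normalize_title(rec["title"]), rec.get("year"))
        seen.setdefault(key, rec)
    return list(seen.values())


if __name__ == "__main__":
    records = [
        {"title": "Investigació a Catalunya", "year": 2013, "source": "UB"},
        {"title": "investigacio  a catalunya!", "year": 2013, "source": "UPC"},
    ]
    print(deduplicate(records))
```

Exact-key matching like this misses near-duplicates (for example, subtitles present in one source only), and a record with a DOI will not match a copy without one, so a production mechanism would likely need fuzzier comparison and manual review.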
  26. Thanks! Ramon Ros i Gorné (CSUC), ramon.ros@csuc.cat, http://www.csuc.cat
