
WP3: overview of the progress of WP3 at the CLARIAH day


Overview by Sjef Barbiers of the progress of WP3: Linguistics

Published in: Data & Analytics


  1. Common Lab Research Infrastructure for the Arts and Humanities
  2. • WP3 as part of CLARIAH
       – Discipline: Linguistics
       – Data type: primarily textual data
     • WP3 as successor of CLARIN
     • WP3 ‘incorporates’ Nederlab (NWO-groot)
  3. • Linguistics
       – Support for the researcher in each stage of a research project
         • What is needed
         • What is available
         • What functionality must be created / improved
     • Cooperation projects with WP2, WP4 Soc Econ & WP5 Media Studies
  4. • Theme 1: Data and metadata
     • Theme 2: Interoperability
     • Theme 3: Enrichment and annotation
     • Theme 4: Search and research
  5. • New Resources
       – text corpora, crowd sourcing, survey tool, databases
     • Existing Resources
       – browsing & searching for data and tools and selecting them
     • Enriching resources
       – curation, linguistic annotations, transcription, named entities
     • Searching / analyzing (enriched) resources
     • Representation/visualization search results
     • Store new resources in CLARIAH
     • Make enhanced publications
  6. • Incorporate data / tools in CLARIAH
       – With proper metadata
       – With IPR/Ethical Issues properly dealt with
       – Archiving / Ingest functionality
       – Deployment Framework
         • How to run services efficiently
       – Required: standardization (input – output formats), metadata, interface elements
       – Interoperability (syntactic and semantic)
  7. • Interoperability
     • Linked Open Data
     • CMDI → RDF
     • Entities
     • Vocabularies
     • PICCL
  8. • Cooperation WP4 / WP5
       – Text -> structured data
       – WP4: e.g. detect strikes in newspapers of 1965, Athena
       – WP5: probably convert scanned and OCR’ed ‘filmladders’ into structured data
       – Speech -> text
  9. WP3 Demos
  10. • Search application for treebanks
        • LASSY, CGN
        • One’s own corpus
        • Special word relations interface, XPATH interface
      • New:
        • meta-data in the search query (period, sex, region, etc.)
        • results can be presented as aggregate or split by metadata
      • Illustrations:
        • CGN (Spoken Dutch Corpus) with metadata
        • Dutch CHILDES Corpora with metadata
        • http://zardoz.service.rug.nl:8067/
  11. • Search application for treebanks (LASSY, CGN, SONAR)
      • Example-based interface, XPATH interface
      • New: Uploading one’s own corpus
  12.
  13.
  14.
  15. • Meertens (Metadata, Search, Ingest, Interoperability)
      • RUN (Curate, Enrich)
      • VU (Interoperability, Text -> Structured)
      • INL (Search, Metadata, Interoperability)
      • RUG (Enrich, Search)
      • UU (Metadata, Search, Interoperability)
  16. WP scientific leader     Sjef Barbiers (Meertens)
      Technical coordinator    Daan Broeder (Meertens)
      WP3 advisor              Jan Odijk
      Leader RUN               Antal van den Bosch
      Leader VU                Piek Vossen
      Leader INL               Jan Theo Bakker
      Leader UU                Jan Odijk
      Leader RUG               Gertjan van Noord
      Leader Meertens          Marc Kemps Snijders
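
Slides 10 and 11 mention an XPATH interface to treebanks, metadata in the search query, and results presented as an aggregate or split by metadata. The sketch below illustrates that idea only; it is not the code of the demo applications. The sample XML is hypothetical and heavily simplified: the `node` elements with `cat`, `rel`, `pos` and `word` attributes are loosely modeled on Alpino-style treebank XML, while the `sentence` wrapper, its `region` and `sex` attributes, and the `search` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified treebank sample. Real treebank files carry many
# more attributes and store metadata differently; this layout is an assumption.
SAMPLE = """
<corpus>
  <sentence id="s1" region="north" sex="f">
    <node cat="smain">
      <node rel="su" cat="np">
        <node rel="det" pos="det" word="de"/>
        <node rel="hd" pos="noun" word="vrouw"/>
      </node>
      <node rel="hd" pos="verb" word="leest"/>
      <node rel="obj1" cat="np">
        <node rel="det" pos="det" word="een"/>
        <node rel="hd" pos="noun" word="boek"/>
      </node>
    </node>
  </sentence>
  <sentence id="s2" region="south" sex="m">
    <node cat="smain">
      <node rel="su" pos="pron" word="hij"/>
      <node rel="hd" pos="verb" word="slaapt"/>
    </node>
  </sentence>
  <sentence id="s3" region="south" sex="f">
    <node cat="smain">
      <node rel="su" cat="np">
        <node rel="det" pos="det" word="het"/>
        <node rel="hd" pos="noun" word="kind"/>
      </node>
      <node rel="hd" pos="verb" word="lacht"/>
    </node>
  </sentence>
</corpus>
"""


def search(root, xpath, split_by=None, **metadata):
    """Count XPath matches per sentence.

    Keyword arguments filter on sentence metadata (e.g. region="south").
    With split_by set, counts are grouped by that metadata attribute;
    otherwise a single aggregate count is returned under the key "all".
    """
    counts = Counter()
    for sentence in root.iter("sentence"):
        # Skip sentences whose metadata does not match the query.
        if any(sentence.get(key) != value for key, value in metadata.items()):
            continue
        hits = len(sentence.findall(xpath))
        group = sentence.get(split_by) if split_by else "all"
        counts[group] += hits
    return dict(counts)


root = ET.fromstring(SAMPLE)
NP = ".//node[@cat='np']"  # all noun-phrase nodes

total = search(root, NP)                         # aggregate over the corpus
by_region = search(root, NP, split_by="region")  # split by metadata
south_only = search(root, NP, region="south")    # metadata in the query

print(total, by_region, south_only)
```

The standard library's ElementTree supports only a subset of XPath, which is enough for attribute tests like the one above; a full XPath engine would be needed for richer queries.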
