Knowledge Discovery in Climate Change Domain
Pinar Öztürk, Erwin Marsi
Norwegian University of Science and Technology (NTNU), Norway
Natalia Manola
University of Athens, Greece
LIBER Conference 2015, Öztürk, Marsi, Manola
  
Outline
•  Introduction of the Ocean-Certain (OC) EU project
•  Knowledge Discovery in OC
•  Decisions underlying OC’s Knowledge Discovery system
–  Type of knowledge to focus on
–  Corpus
–  Text mining subtasks
–  Technology/tool
–  External sources
•  Some results (and examples) so far
•  Conclusions
  
EU Project “Ocean Certain”
•  Title: Ocean Food web Patrol – Climate Effects: Reducing Targeted Uncertainties with an Interactive Network
•  Work programme topic: FP7-ENV.2013.6.1-1
•  EU funding: 7.1 million euro
–  Our work package: 40 man-months
–  3 years, with start Nov. 2014
List of participants:
Partner no. * Participant organisation name Country
1 (Coordinator) Norwegian University of Science and Technology Norway
2 University of Bergen Norway
3 GEOMAR Helmholtz Centre for Ocean Research Kiel Germany
4 Vlaamse Instelling voor Technologisch Onderzoek Belgium
5 DEU-IMST Turkey
6 University of Gothenburg Sweden
7 Griffith University Australia
8 Universidad Austral de Chile Chile
9 National Research Council and Institute of Marine Sciences Italy
10 Centre for Environment, Fisheries & Aquaculture Science UK
11 World Ocean Council UK
12 Universidad de Concepción Chile
  
Overarching goal of Ocean-Certain
•  Identifying the interactions (impacts and feedbacks) between the climate-related oceanic processes, the food web and the biological pump
•  Determining qualitative and quantitative changes in the functionalities of the “food web” and estimating the efficiency of the “biological pump” in exporting carbon to the deep sea
  
Climate change domain
[Figure from Thingstad et al. (2008): food web and biological pump. Labels: diatoms, autotrophic flagellates, heterotrophic flagellates, heterotrophic bacteria, ciliates, mesozooplankton, B-DOC, refractory DOC, macro and micro nutrients, CO2, biological pump (slow/fast).]
  
Main obstacle to scientific discovery
•  is often not a lack of scientific research and its reporting, i.e., not a lack of knowledge
•  is the lack of ability to link various disciplines and make sense of the accumulated/documented knowledge across disciplines, i.e., to infer new knowledge from the existing knowledge
  
Why knowledge linking is challenging
•  Vast and growing amount of literature: information overload
•  Increased specialisation
•  Isolated research communities and literatures: research silos
•  Different conventions and terminology
StraT: Stratification
DOC: Dissolved organic carbon
EP: Carbon export (synonym: biological pump)
MOC: Meridional overturning circulation
NAO: North Atlantic Oscillation
[Figure: the terms MOC, NAO, DOC, EP and StraT placed across three disciplines: biogeochemistry, marine ecosystems, physical oceanography.]
  
Computational support for handling scientific text
•  Support the user in various ways
-  Search
-  Question answering
-  Citation analysis
-  Trend discovery
-  Hypothesis generation: literature-based knowledge discovery
  
Search
•  Literature search works reasonably well
–  ScienceDirect, Google Scholar, Medline/PubMed, ...
•  However, keyword search only returns articles
–  Who has time to sift through hundreds/thousands of abstracts or full papers?
  
Hypothesis generation
•  Two main cognitive tasks
–  Identifying important knowledge pieces
–  Inferring new knowledge from these pieces
•  Focus in this presentation: identification of knowledge pieces in scientific papers
•  Computational method:
–  Literature-based knowledge discovery (LBKD)
  
LBKD history: Swanson example
[Diagram: three linked concepts A, B, C]
1.  Relation of spreading depression to the visual scotomata of classical migraine
2.  Magnesium in the extracellular cerebral fluid can prevent or terminate spreading depression
3.  INFER: migraine ↔ magnesium deficiency
Inference:
A influences B
B influences C
Hence A influences C
(From Wikipedia)
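The A-B-C inference pattern above can be sketched in a few lines of Python. This is only an illustration, assuming relations are reduced to undirected links between terms; the function name `abc_hypotheses` and the simplified term labels are hypothetical and do not come from the OC system.

```python
from collections import defaultdict


def abc_hypotheses(relations):
    """Return (a, c, via) triples where a-b and b-c are stated but a-c is not."""
    # Build an undirected graph from the relations found in the literature.
    neighbours = defaultdict(set)
    for left, right in relations:
        neighbours[left].add(right)
        neighbours[right].add(left)

    hypotheses = []
    for a in sorted(neighbours):
        for b in sorted(neighbours[a]):
            for c in sorted(neighbours[b]):
                # Skip trivial loops and links that are already stated.
                if c == a or c in neighbours[a]:
                    continue
                if a < c:  # report each unordered pair only once
                    hypotheses.append((a, c, b))
    return hypotheses


# The two findings from the slide, simplified to plain links (an assumption).
stated = [
    ("migraine", "spreading depression"),
    ("spreading depression", "magnesium"),
]

for a, c, via in abc_hypotheses(stated):
    print(f"candidate link: {a} <-> {c} (via {via})")
```

The candidate link is only a hypothesis to be checked by a domain expert; the sketch does not judge whether it is plausible.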
  
Example: Hypothesis Generation in OC
Identify important knowledge pieces:
↑iron → ↑phytoplankton
↑phytoplankton → ↑photosynthesis
↑photosynthesis → ↓CO2
Infer new knowledge from these pieces:
↑iron → ↓CO2
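One possible way to chain such signed knowledge pieces is shown below. It is a sketch under stated assumptions, not the OC implementation: directions are encoded as +1 (increase) and -1 (decrease), the names `compose` and `show` are hypothetical, and the rule that a link flips its effect when its cause changes in the opposite direction is a simplifying assumption.

```python
ARROW = {+1: "↑", -1: "↓"}


def compose(chain):
    """Compose signed links (cause, cause_dir, effect, effect_dir) into one link."""
    first_cause, first_dir, effect, effect_dir = chain[0]
    for cause, cause_dir, next_effect, next_dir in chain[1:]:
        if cause != effect:
            raise ValueError(f"chain is broken between {effect!r} and {cause!r}")
        # Assumption: if the link was stated for the opposite direction of
        # change in its cause, its effect is flipped as well.
        if cause_dir != effect_dir:
            next_dir = -next_dir
        effect, effect_dir = next_effect, next_dir
    return first_cause, first_dir, effect, effect_dir


def show(link):
    """Render a link in the arrow notation used on the slide."""
    cause, cause_dir, effect, effect_dir = link
    return f"{ARROW[cause_dir]}{cause} → {ARROW[effect_dir]}{effect}"


# The three knowledge pieces from the slide.
pieces = [
    ("iron", +1, "phytoplankton", +1),
    ("phytoplankton", +1, "photosynthesis", +1),
    ("photosynthesis", +1, "CO2", -1),
]

print(show(compose(pieces)))
```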
  
Text mining for extraction of knowledge pieces
•  Text mining deals with identification and extraction of phrases/sentences of interest
•  Techniques: natural language processing, information retrieval, machine learning, information extraction, various statistics-based techniques
  
Design of a text mining system for OC - 1
1.  Decide what type of knowledge to attend to
–  Process: biological pump
–  Variables: pH, temperature, chemical compounds, biological species
–  Change events: increase/decrease/change
–  Relationships between events: causal/correlational
Example: “Gran (1933) was among the first to demonstrate that the addition of iron to seawater may stimulate the growth of phytoplankton.”
–  Change events: ↑iron, ↑phytoplankton
–  Relationship between events: ↑iron → ↑phytoplankton
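The knowledge types on this slide could be represented with a small data model such as the one below. This is a hypothetical sketch: the class names, the "Δ" symbol for an unspecified change, and the choice to label the Gran (1933) example as causal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INCREASE = "↑"
    DECREASE = "↓"
    CHANGE = "Δ"  # assumed symbol for a change without a stated direction


class RelationType(Enum):
    CAUSAL = "causal"
    CORRELATIONAL = "correlational"


@dataclass(frozen=True)
class ChangeEvent:
    variable: str  # e.g. a chemical compound, a species, pH, temperature
    direction: Direction

    def __str__(self):
        return f"{self.direction.value}{self.variable}"


@dataclass(frozen=True)
class Relation:
    source: ChangeEvent
    target: ChangeEvent
    kind: RelationType

    def __str__(self):
        return f"{self.source} → {self.target} ({self.kind.value})"


# The example sentence from the slide, encoded by hand.
iron_up = ChangeEvent("iron", Direction.INCREASE)
phyto_up = ChangeEvent("phytoplankton", Direction.INCREASE)
gran_1933 = Relation(iron_up, phyto_up, RelationType.CAUSAL)
print(gran_1933)
```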
  
Change events
“Gran (1933) was among the first to demonstrate that the addition of iron to seawater may stimulate the growth of phytoplankton.”
•  ↑iron
•  ↑phytoplankton
  
Event expressions in natural language
•  The same event may be expressed in various ways in natural language
•  E.g., “increase”:
–  “Rise in atmospheric CO2 levels…”
–  “…addition of iron…”
–  “elevated value of …”
•  E.g., “decrease”:
–  “…to slow down calcification in corals…”
–  “decreasing temperature…”
–  “…reduced pH value…”
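A very simple way to normalise such surface expressions is a trigger lexicon that maps phrases to an event direction. The sketch below is an assumption for illustration only: the lexicon contains just the phrasings listed above, the names `TRIGGERS` and `find_change_triggers` are hypothetical, and a real system would need a far larger lexicon or a trained model.

```python
import re

# Hypothetical trigger lexicon, seeded only with the phrasings listed above.
TRIGGERS = {
    "increase": ["rise in", "addition of", "elevated"],
    "decrease": ["slow down", "decreasing", "reduced"],
}

# One case-insensitive pattern per direction, matching whole words only.
PATTERNS = {
    direction: re.compile(
        r"\b(" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE
    )
    for direction, phrases in TRIGGERS.items()
}


def find_change_triggers(sentence):
    """Return (direction, matched phrase) pairs in order of appearance."""
    hits = []
    for direction, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            hits.append((match.start(), direction, match.group(0)))
    return [(direction, phrase) for _, direction, phrase in sorted(hits)]


print(find_change_triggers("Rise in atmospheric CO2 levels"))
print(find_change_triggers("the addition of iron led to reduced pH value"))
```

The sketch only finds the trigger; attaching it to the variable that changes (CO2, iron, pH) is a separate step.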
  
Design of a text mining system for OC - 2
2.  Design and construct corpus
–  Decide which disciplines, publishers, journals
–  Ensure sufficient coverage, i.e., number and variety of publications
–  Currently 10K papers from Nature
–  Problems with open access: text mining and sharing rights
3.  Determine text mining subtask(s):
–  Event extraction
–  Causal/correlational relationships between events
–  Recognizing entity mentions
–  Linking to ontologies and generalization of terms
  
Design of a text mining system for OC - 3
4.  Identify the tools to be used
5.  Decide the external sources to be used
  
Design of a text mining system for Climate Change domain - 4
•  Still preliminary
•  Our strategy: determine whether, and which of, the existing tools, methods, and external sources developed for other domains (e.g., biomedicine, news text, digital heritage) are relevant
•  Tools:
–  NLP tools and annotation tools, e.g., Stanford’s NLP, GATE, Brat annotation tool
•  External resources
–  Controlled vocabularies, terminologies, thesauri, ontologies, databases
–  Examples: DBpedia, Wiki, WordNet, ChEBI, Oscar, ChemSpot, Linnaeus2
  
Example: Identify tools and external resources for named-entity recognition (NER) in Ocean-Certain
•  Named entities are the entities of interest
–  Examples in news text: people names, organisations, places
–  Examples in Ocean-Certain: chemical compounds, biological species, locations
•  There are many NER systems, but most were built for other domains (e.g., news, humanities or biomedicine)
•  Check whether, and which, existing NER systems can be used for processing papers in the climate change domain
•  In particular, we are evaluating:
–  CoreNLP (for geographical locations)
–  Linnaeus2 (species)
–  Oscar3 (chemical compounds)
  
Evaluation of existing NER tools
[Diagram: a selected abstract corpus is annotated using Brat, giving a manually tagged corpus. A test abstract is also processed by the NER system (e.g. Oscar), giving a system-tagged abstract. An evaluation algorithm compares the two, leading to a judgment of the appropriateness of the NER system for the CC domain.]
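The slide does not say which measures the evaluation algorithm computes. A common choice for comparing system tags against a manually tagged gold standard is exact-match precision, recall and F1 over entity spans, sketched below as an assumption. The span offsets and labels are made-up illustration data, not results from the project.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical annotations for one abstract: (start offset, end offset, type).
gold_spans = [(0, 4, "CHEMICAL"), (20, 33, "SPECIES"), (40, 54, "LOCATION")]
system_spans = [(0, 4, "CHEMICAL"), (20, 33, "CHEMICAL")]

precision, recall, f1 = evaluate(gold_spans, system_spans)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact matching is strict: a span with the right boundaries but the wrong type counts as both a false positive and a false negative. Relaxed (overlap-based) matching is an alternative design choice.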
  
[Slide prepared by Sean Holloway, MSc student (NTNU), 2015]
  
NER candidates and the external resources
[Diagram: the CC corpus (abstracts) is processed by each candidate NER system to produce results.]
•  NER systems: Oscar3, ChemSpot, Linnaeus2, SPECIES, OrganismTagger, IllinoisNE, CoreNLP, OpenNLP
•  Entity types: species, chemical substance, location
Evaluation experiments are run by Sean Holloway, MSc student, 2015
  
Sharing extended resources?
•  Preprocessed scientific papers in machine-readable format
–  10K full papers from Nature, but we cannot share them
•  Annotated papers
–  Two types of annotations: for entity recognition, and for relation and event recognition
•  Currently crawling open access publications (PLOS first), aiming to prepare and share a large-volume corpus for the CC domain
  
Annotated gold standard (not shared)
  
Summary
•  Text mining as a support to scientific discovery
–  The preliminary results are promising for the extraction of entities/variables, change events, and relations between events
  
Conclusion
•  Some of the existing tools (general and specific to other domains) may be useful
•  However, we need to adapt and extend these for the CC domain
•  Corpus is an important problem
–  We cannot share the preprocessed and annotated corpus we create
–  We would not be able to use others’ resources, for the same reasons
–  Repetition of tasks (inefficient use of money and time)
–  Slows down our own work as well as the knowledge discovery research in CC domain, because of
  
Future work
•  Preparation of a corpus for the CC domain: larger volume and sharable
•  Currently working on automated crawling and preprocessing that adapts to variations across publishers
•  We need more annotation, meaning more people, more funding
•  Planning to apply to the EU and the Norwegian Research Council for funding
•  Organizing a workshop (in connection with the OC project) to gather people working in text mining in Earth science
•  Want/need collaboration with other people/universities
•  The work presented here is partly reported in:
Marsi, Erwin; Öztürk, Pinar; Aamot, Elias; Sizov, Gleb Valerjevich; Ardelan, Murat Van. (2014) Towards Text Mining in Climate Science: Extraction of Quantitative Variables and their Relations. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).
  
Demo
•  http://www.idi.ntnu.no/~emarsi/ocwp1/chavarex
  

More Related Content

Viewers also liked

Maxaudience - 700 Clients Served
Maxaudience - 700 Clients ServedMaxaudience - 700 Clients Served
Maxaudience - 700 Clients ServedLisa Cooney
 
How to select an engagement ring of premium quality
How to select an engagement ring of premium quality How to select an engagement ring of premium quality
How to select an engagement ring of premium quality Taula
 
2580H membrane press machine
2580H membrane press machine2580H membrane press machine
2580H membrane press machineFeida Wang
 
Liber presentation-london2015
Liber presentation-london2015Liber presentation-london2015
Liber presentation-london2015pinarozturk99
 
10 types infographics
10 types infographics10 types infographics
10 types infographicseldavis44
 
Fundamentals of Film Finance
Fundamentals of Film FinanceFundamentals of Film Finance
Fundamentals of Film FinanceRob Ogden
 

Viewers also liked (8)

German_Final_Project
German_Final_ProjectGerman_Final_Project
German_Final_Project
 
Maxaudience - 700 Clients Served
Maxaudience - 700 Clients ServedMaxaudience - 700 Clients Served
Maxaudience - 700 Clients Served
 
Example
ExampleExample
Example
 
How to select an engagement ring of premium quality
How to select an engagement ring of premium quality How to select an engagement ring of premium quality
How to select an engagement ring of premium quality
 
2580H membrane press machine
2580H membrane press machine2580H membrane press machine
2580H membrane press machine
 
Liber presentation-london2015
Liber presentation-london2015Liber presentation-london2015
Liber presentation-london2015
 
10 types infographics
10 types infographics10 types infographics
10 types infographics
 
Fundamentals of Film Finance
Fundamentals of Film FinanceFundamentals of Film Finance
Fundamentals of Film Finance
 

Similar to slides-LIBER2015

Egu2015 cornell don pico
Egu2015 cornell don picoEgu2015 cornell don pico
Egu2015 cornell don picoSarah Cornell
 
Open access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, librariesOpen access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, librariesIryna Kuchma
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
The role of biodiversity informatics in GBIF, 2021-05-18
The role of biodiversity informatics in GBIF, 2021-05-18The role of biodiversity informatics in GBIF, 2021-05-18
The role of biodiversity informatics in GBIF, 2021-05-18Dag Endresen
 
Paola De Castro. Critical introduction to scientific journals and the editori...
Paola De Castro. Critical introduction to scientific journals and the editori...Paola De Castro. Critical introduction to scientific journals and the editori...
Paola De Castro. Critical introduction to scientific journals and the editori...Paola De Castro
 
Open science in the context of transdisciplinary research
Open science in the context of transdisciplinary researchOpen science in the context of transdisciplinary research
Open science in the context of transdisciplinary researchYasuhisa Kondo
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesAmanda Whitmire
 
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)EarthCube
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Martin Donnelly
 
Biodiversity Virtual e-Laboratory (BioVeL)
Biodiversity Virtual e-Laboratory (BioVeL)Biodiversity Virtual e-Laboratory (BioVeL)
Biodiversity Virtual e-Laboratory (BioVeL)Alex Hardisty
 
Webinar: An overview and explanation of the creation of the communication res...
Webinar: An overview and explanation of the creation of the communication res...Webinar: An overview and explanation of the creation of the communication res...
Webinar: An overview and explanation of the creation of the communication res...Global CCS Institute
 
VERSO: Ecosystem Responses in the Southern Ocean
VERSO: Ecosystem Responses in the Southern OceanVERSO: Ecosystem Responses in the Southern Ocean
VERSO: Ecosystem Responses in the Southern OceanBruno Danis
 
GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021Dag Endresen
 
Open Research and Archaeology
Open Research and ArchaeologyOpen Research and Archaeology
Open Research and ArchaeologyCrossref
 

Similar to slides-LIBER2015 (20)

Egu2015 cornell don pico
Egu2015 cornell don picoEgu2015 cornell don pico
Egu2015 cornell don pico
 
PEPRS: Recording The Extent Preserved
PEPRS: Recording The Extent PreservedPEPRS: Recording The Extent Preserved
PEPRS: Recording The Extent Preserved
 
Open access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, librariesOpen access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, libraries
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
The role of biodiversity informatics in GBIF, 2021-05-18
The role of biodiversity informatics in GBIF, 2021-05-18The role of biodiversity informatics in GBIF, 2021-05-18
The role of biodiversity informatics in GBIF, 2021-05-18
 
Paola De Castro. Critical introduction to scientific journals and the editori...
Paola De Castro. Critical introduction to scientific journals and the editori...Paola De Castro. Critical introduction to scientific journals and the editori...
Paola De Castro. Critical introduction to scientific journals and the editori...
 
Open science in the context of transdisciplinary research
Open science in the context of transdisciplinary researchOpen science in the context of transdisciplinary research
Open science in the context of transdisciplinary research
 
A.onlineassignment
A.onlineassignmentA.onlineassignment
A.onlineassignment
 
Visibility and internationalization USARB Through Institutional Repository
Visibility and internationalization USARB Through Institutional Repository Visibility and internationalization USARB Through Institutional Repository
Visibility and internationalization USARB Through Institutional Repository
 
Developing data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universitiesDeveloping data services: a tale from two Oregon universities
Developing data services: a tale from two Oregon universities
 
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)
EarthCube's OceanLink - Project Overview and Presentation Updates (March 2014)
 
Ngsp
NgspNgsp
Ngsp
 
Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?Open Access and Open Data: what do I need to know (and do)?
Open Access and Open Data: what do I need to know (and do)?
 
Biodiversity Virtual e-Laboratory (BioVeL)
Biodiversity Virtual e-Laboratory (BioVeL)Biodiversity Virtual e-Laboratory (BioVeL)
Biodiversity Virtual e-Laboratory (BioVeL)
 
Webinar: An overview and explanation of the creation of the communication res...
Webinar: An overview and explanation of the creation of the communication res...Webinar: An overview and explanation of the creation of the communication res...
Webinar: An overview and explanation of the creation of the communication res...
 
VERSO: Ecosystem Responses in the Southern Ocean
VERSO: Ecosystem Responses in the Southern OceanVERSO: Ecosystem Responses in the Southern Ocean
VERSO: Ecosystem Responses in the Southern Ocean
 
GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021GBIF and Biodiversity informatics for museums, 15 March 2021
GBIF and Biodiversity informatics for museums, 15 March 2021
 
Open Research and Archaeology
Open Research and ArchaeologyOpen Research and Archaeology
Open Research and Archaeology
 

Recently uploaded

OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 

slides-LIBER2015

  • 1. Knowledge Discovery in Climate Change Domain
      Pinar Öztürk, Erwin Marsi, Norwegian University of Science and Technology (NTNU), Norway
      Natalia Manola, University of Athens, Greece
      LIBER Conference 2015, ÖztürkMarsiManola
  • 2. Outline
      •  Introduction of the Ocean-Certain (OC) EU project
      •  Knowledge Discovery in OC
      •  Decisions underlying OC’s Knowledge Discovery system
         –  Type of knowledge to focus on
         –  Corpus
         –  Text mining subtasks
         –  Technology/tool
         –  External sources
      •  Some results (and examples) so far
      •  Conclusions
  • 3. EU Project “Ocean Certain”
      •  Title: Ocean Food web Patrol – Climate Effects: Reducing Targeted Uncertainties with an Interactive Network
      •  Work programme topic: FP7-ENV.2013.6.1-1
      •  EU funding: 7.1 million euro
         –  Our work package: 40 man-months
         –  3 years, starting Nov. 2014
      List of participants:
      Partner no. *     Participant organisation name                                Country
      1 (Coordinator)   Norwegian University of Science and Technology               Norway
      2                 University of Bergen                                         Norway
      3                 GEOMAR Helmholtz Centre for Ocean Research Kiel              Germany
      4                 Vlaamse Instelling voor Technologisch Onderzoek              Belgium
      5                 DEU-IMST                                                     Turkey
      6                 University of Gothenburg                                     Sweden
      7                 Griffith University                                          Australia
      8                 Universidad Austral de Chile                                 Chile
      9                 National Research Council and Institute of Marine Sciences   Italy
      10                Centre for Environment, Fisheries & Aquaculture Science      UK
      11                World Ocean Council                                          UK
      12                Universidad de Concepción                                    Chile
  • 4. Overarching goal of Ocean-Certain
      •  Identifying the interactions (impacts and feedbacks) between the climate-related oceanic processes, the food web and the biological pump
      •  Determining qualitative and quantitative changes in the functionalities of the “food web” and estimating the efficiency of the “biological pump” in exporting carbon to the deep sea
  • 5. Climate change domain
      [Figure from Thingstad et al. (2008). Labels: Diatom; Autotrophic flagellates; Heterotrophic bacteria; Meso zooplankton; Heterotrophic flagellates; Ciliates; B-DOC; Refractory DOC; Biological Pump; Macro & micro nutrients; CO2; Slow; Fast]
  • 6. Main obstacle to scientific discovery
      •  is often not a lack of scientific research and its reporting, i.e., not a lack of knowledge
      •  is the inability to link various disciplines and make sense of the accumulated/documented knowledge across disciplines, i.e., to infer new knowledge from the existing knowledge
  • 7. Why knowledge linking is challenging
      •  Vast and growing amount of literature: info overload
      •  Increased specialisation
      •  Isolated research communities and literatures: research silos
      •  Different conventions and terminology
      Abbreviations:
         StraT: Stratification
         DOC: Dissolved organic carbon
         EP: Carbon export (synonym: biological pump)
         MOC: Meridional overturning circulation
         NAO: North Atlantic oscillation
      [Figure: diagram labelled with the terms MOC, NAO, DOC, EP and StraT and the disciplines Biogeochemistry, Marine ecosystems and Physical Oceanography]
  • 8. Computational support for handling scientific text
      •  Support the user in various ways
         –  Search
         –  Question-answering
         –  Citation analysis
         –  Trend discovery
         –  Hypothesis generation (literature-based knowledge discovery)
  • 9. Search
      •  Literature search works reasonably well
         –  ScienceDirect, Google Scholar, Medline/PubMed, ...
      •  However, keyword search only returns articles
         –  Who has time to sift through hundreds/thousands of abstracts or full papers?
  • 10. Hypothesis generation
      •  Two main cognitive tasks
         –  Identifying important knowledge pieces
         –  Inferring new knowledge from these pieces
      •  Focus in this presentation: identification of knowledge pieces in scientific papers
      •  Computational method:
         –  Literature-based knowledge discovery (LBKD)
  • 11. LBKD history – Swanson example
      1.  Relation of spreading depression to the visual scotomata of classical migraine
      2.  Magnesium in the extracellular cerebral fluid can prevent or terminate spreading depression
      3.  INFER: migraine ↔ magnesium deficiency
      Inference (over terms A, B, C):
         A influences B
         B influences C
         Hence A influences C
      (From Wikipedia)
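The A/B/C inference pattern on this slide can be sketched in a few lines of Python. This is only an illustration, not the system presented in the talk: the function name `abc_candidates` and the representation of each reported finding as a `(source, target)` pair are assumptions made for the example.

```python
from collections import defaultdict


def abc_candidates(relations):
    """Return (a, b, c) triples where a->b and b->c are reported but a->c is not."""
    influences = defaultdict(set)
    for source, target in relations:
        influences[source].add(target)
    known = set(relations)
    found = []
    for a in sorted(influences):
        for b in sorted(influences[a]):
            # .get() avoids creating new keys for terms that never act as a source
            for c in sorted(influences.get(b, ())):
                if c != a and (a, c) not in known:
                    found.append((a, b, c))
    return found


# The two findings from the Swanson example, written as "X influences Y" pairs.
literature = [
    ("magnesium", "spreading depression"),
    ("spreading depression", "migraine"),
]

for a, b, c in abc_candidates(literature):
    print(f"Hypothesis: {a} influences {c} (via {b})")
```

Pairs that are already reported directly are skipped, so only links that no single paper states are proposed as hypotheses.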
  • 12. Example: Hypothesis Generation in OC
      Identify important knowledge pieces:
         ↑iron → ↑phytoplankton
         ↑phytoplankton → ↑photosynthesis
         ↑photosynthesis → ↓CO2
      Infer new knowledge from these pieces:
         ↑iron → ↓CO2
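A minimal sketch of how such signed knowledge pieces could be chained, under stated assumptions: the `Relation` type, the `chain` and `show` helpers, and the +1/-1 encoding of increase/decrease are hypothetical names introduced here, and the rule that reversing a cause reverses its effect is a simplifying assumption, not something the slides claim.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    cause: str
    cause_dir: int   # +1 for increase, -1 for decrease
    effect: str
    effect_dir: int  # +1 for increase, -1 for decrease


def chain(relations):
    """Compose consecutive change relations into a single inferred relation."""
    first = relations[0]
    variable, direction = first.effect, first.effect_dir
    for rel in relations[1:]:
        if rel.cause != variable:
            raise ValueError(f"chain broken: {variable!r} is not the cause of {rel}")
        # Assumption: if the cause moves the opposite way, so does the effect.
        direction = rel.effect_dir if rel.cause_dir == direction else -rel.effect_dir
        variable = rel.effect
    return Relation(first.cause, first.cause_dir, variable, direction)


def show(rel):
    """Render a relation in the arrow notation used on the slide."""
    arrow = {1: "↑", -1: "↓"}
    return f"{arrow[rel.cause_dir]}{rel.cause} → {arrow[rel.effect_dir]}{rel.effect}"


pieces = [
    Relation("iron", 1, "phytoplankton", 1),
    Relation("phytoplankton", 1, "photosynthesis", 1),
    Relation("photosynthesis", 1, "CO2", -1),
]

print(show(chain(pieces)))
```

Applied to the three knowledge pieces above, the composition yields the inferred relation between iron and CO2 shown on the slide.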
  • 13. Text mining for extraction of knowledge pieces
      •  Text mining deals with identification and extraction of phrases/sentences of interest
      •  Techniques: natural language processing, information retrieval, machine learning, information extraction, various statistics-based techniques
  • 14. Design of a text mining system for OC - 1
      1.  Decide what type of knowledge to attend to
      •  Process: Biological Pump
      •  Variables: pH, temperature, chemical compounds, biological species
      •  Change events: increase/decrease/change (e.g., ↑iron, ↑phytoplankton)
      •  Relationships between events: causal/correlational (e.g., ↑iron → ↑phytoplankton)
      Example: “Gran (1933) was among the first to demonstrate that the addition of iron to seawater may stimulate the growth of phytoplankton.”
  • 15. Change events
      “Gran (1933) was among the first to demonstrate that the addition of iron to seawater may stimulate the growth of phytoplankton.”
      •  ↑iron
      •  ↑phytoplankton
  • 16. Event expressions in natural language
      •  The same event may be expressed in various ways in natural language
      •  E.g., “increase”:
         –  “Rise in atmospheric CO2 levels…”
         –  “…addition of iron…”
         –  “elevated value of …”
      •  E.g., “decrease”:
         –  “…to slow down calcification in corals…”
         –  “decreasing temperature…”
         –  “…reduced pH value…”
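One simple way to normalise such surface expressions is a small trigger lexicon. The sketch below is an assumption-laden illustration, not the OC system: the `PATTERNS` table and the `change_direction` function are hypothetical, and the lexicon contains only the trigger words quoted on this slide.

```python
import re

# Trigger lexicon built only from the expressions quoted on the slide.
PATTERNS = {
    "increase": re.compile(r"\b(rise|addition|elevated)\b", re.IGNORECASE),
    "decrease": re.compile(r"\b(slow down|decreasing|reduced)\b", re.IGNORECASE),
}


def change_direction(phrase):
    """Return 'increase', 'decrease', or None if no known trigger occurs."""
    for label, pattern in PATTERNS.items():
        if pattern.search(phrase):
            return label
    return None


examples = [
    "Rise in atmospheric CO2 levels",
    "addition of iron",
    "to slow down calcification in corals",
    "reduced pH value",
    "the growth of phytoplankton",  # not in the lexicon, so no label is assigned
]

for phrase in examples:
    print(f"{phrase!r}: {change_direction(phrase)}")
```

The last example shows the limit of a fixed lexicon: a trigger that was never listed goes undetected, which is one reason learned models are usually combined with such lists.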
  • 17. Design of a text mining system for OC - 2
      2.  Design and construct the corpus
         –  Decide which disciplines, publishers, journals
         –  Ensure sufficient coverage, i.e., number and variety of publications
         –  Currently 10 K papers from Nature
         –  Problems with open access: text mining and sharing rights
      3.  Determine text mining subtask(s):
         –  Event extraction
         –  Causal/correlational relationships between events
         –  Recognizing entity mentions
         –  Linking to ontologies and generalization of terms
  • 18. Design of a text mining system for OC - 3
      4.  Identify the tools to be used
      5.  Decide the external sources to be used
  • 19. Design of a text mining system for Climate Change domain - 4
      •  Still preliminary
      •  Our strategy: determine whether, and which of, the existing tools, methods, and external sources developed for other domains (e.g., biomedicine, news text, digital heritage) are relevant
      •  Tools:
         –  NLP tools & annotation tools, e.g., Stanford’s NLP, GATE, Brat annotation tool
      •  External resources
         –  Controlled vocabularies, terminologies, thesauri, ontologies, databases
         –  Examples: DBpedia, Wiki, WordNet, ChEBI, Oscar, ChemSpot, Linnaeus2
  • 20. Example: Identify tools & external resources for named-entity recognition (NER) in Ocean-Certain
      •  Named entities are the entities of interest
         –  Examples in news text: people names, organisations, places
         –  Examples in Ocean-Certain: chemical compounds, biological species, locations
      •  Many NER systems exist, but most were built for other domains (e.g., news, humanities or biomedicine)
      •  Check whether, and which, existing NER systems can be used for processing papers in the climate change domain
      •  In particular, we are evaluating:
         –  CoreNLP (for geographical locations)
         –  Linnaeus2 (species)
         –  Oscar3 (chemical compounds)
  • 21. Evaluation of existing NER tools
      [Flow diagram]
      •  Selected abstract-corpus → Annotation using Brat → Manually tagged corpus
      •  Test abstract → NER system (e.g. Oscar) → System-tagged abstract
      •  Manually tagged corpus + System-tagged abstract → Evaluation algorithm
      •  Result: judgment of appropriateness of the NER system to the CC domain
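The slides do not say which evaluation algorithm was used. As one common choice, the sketch below compares system-tagged entities against the manually tagged ones by exact span-and-type match and reports precision, recall and F1; the `evaluate` function, the `(start, end, type)` tuple format and the example annotations are all illustrative assumptions.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, type) entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one abstract: character offsets plus entity type.
gold = [(0, 4, "CHEMICAL"), (20, 33, "SPECIES"), (40, 48, "LOCATION")]
predicted = [(0, 4, "CHEMICAL"), (20, 33, "SPECIES"),
             (50, 55, "CHEMICAL"), (60, 62, "SPECIES")]

precision, recall, f1 = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is strict: a system span that overlaps a gold span but differs by one character counts as both a false positive and a false negative, so a relaxed overlap criterion may be preferable when entity boundaries are ambiguous.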
  • 22. NER candidates and the external resources
      This slide was prepared by Sean Holloway, MSc student (NTNU), 2015
  • 23. NER System Results
      [Diagram: NER systems applied to the CC corpus (abstracts), grouped by entity type]
      •  chemical substance: Oscar3, ChemSpot
      •  species: Linnaeus2, SPECIES, OrganismTagger
      •  location: IllinoisNE, CoreNLP, OpenNLP
      Evaluation experiments were run by Sean Holloway, MSc student, 2015
  • 24. Sharing extended resources?
      •  Preprocessed scientific papers in machine-readable format
         –  10 K full papers from Nature, but we cannot share them
      •  Annotated papers
         –  Two types of annotations
            •  For entity recognition
            •  For relation and event recognition
      •  Currently crawling open-access publications (PLOS first), aiming to prepare and share a large-volume corpus for the CC domain
  • 25. Annotated gold standard – not shared
  • 26. Summary
      •  Text mining as a support to scientific discovery
         –  The preliminary results are promising for extraction of
            •  entities/variables,
            •  change events and
            •  relations between events
  • 27. Conclusion
      •  Some of the existing tools (general and specific to other domains) may be useful
      •  However, we need to adapt and extend these for the CC domain
      •  The corpus is an important problem
         –  We cannot share the preprocessed and annotated corpus we create
         –  We could not use others’ resources, for the same reasons
         –  Repetition of tasks (inefficient use of money and time)
         –  Slows down our own work as well as the knowledge discovery research in CC domain, because of
  • 28. Future work
      •  Preparation of a corpus for the CC domain: larger volume and sharable
      •  Currently working on automated crawling and preprocessing that adapts to the variations across publishers
      •  We need more annotation, meaning more people and more funding
      •  Planning to apply to the EU and the Norwegian Research Council for funding
      •  Organizing a workshop (in connection with the OC project) to gather people working on text mining in Earth science
      •  Want/need collaboration with other people/universities
      •  The work presented here is partly reported in:
         Marsi, Erwin; Öztürk, Pinar; Aamot, Elias; Sizov, Gleb Valerjevich; Ardelan, Murat Van. (2014) Towards Text Mining in Climate Science: Extraction of Quantitative Variables and their Relations. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).
  • 29. Demo
      •  http://www.idi.ntnu.no/~emarsi/ocwp1/chavarex