3. CheTA: issues
Human annotations, especially in chemistry, are
• Accurate but labour-intensive
• Very useful but expensive
• Great benchmarks but time-consuming
7. CheTA: significance
Automate metadata generation for
chemistry using text-mining tools
Add chemistry to the largest
repository of text-mining tools (U-Compare)
Design chemistry-specific workflows
that are useful to scientists in
general
10. CheTA: audience
Annotators working on chemical
documents
Search engines wanting to incorporate
new data into their indices
Scientific community leaning towards
using text-mining tools
People responsible for building
corpora and evaluation standards
12. CheTA: currently…
Explore the relation between chemistry
and text mining using OSCAR
Examine new and efficient
machine-learning techniques
Design an effective evaluation framework
that can be ported to other domains as well
14. CheTA: currently…
Use U-Compare to
• Run applications
• Provide a user interface to compare different annotations
• Add chemistry to its repositories
• Implement the evaluation framework
A typical view of annotation in chemistry: identify each entity, then mark up the requisite labels and references.
It is quite tedious and requires expert knowledge to annotate efficiently.
Why do we need these annotations? Because they are accurate! A human expert's knowledge is likely to be very accurate.
They are useful for automation, such as training mathematical models for automatic named entity recognition (NER).
Once automation has been achieved, these human annotations can serve as great benchmarks!
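To make the benchmark idea concrete, here is a minimal sketch of how automatic annotations might be scored against human ones. It assumes exact-match scoring on (start, end, label) spans; the span offsets and the labels "CM" and "RN" are made-up illustrations, not CheTA's or OSCAR's actual data or label set.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """One annotated entity: character offsets plus a label (hypothetical schema)."""
    start: int
    end: int
    label: str


def score(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of predicted spans against gold spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data only: three human (gold) spans and four automatic ones.
gold = {Span(0, 7, "CM"), Span(12, 20, "RN"), Span(25, 32, "CM")}
predicted = {
    Span(0, 7, "CM"),    # correct
    Span(12, 20, "CM"),  # right offsets, wrong label
    Span(25, 32, "CM"),  # correct
    Span(40, 45, "CM"),  # spurious
}

result = score(gold, predicted)
```

Exact matching is the strictest choice; a real evaluation framework might also credit partial overlaps or correct offsets with a wrong label.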
Combine text, process and visualisation
CheTA aims to automate annotation.
Make chemistry accessible to other realms of science.
Design mechanisms so that scientists can use various workflows without needing programming expertise.
OSCAR reads a list of chemistry documents and returns a list of chemical entities such as chemical compounds, reactions, chemical adjectives, etc.
A Napoleon's-march-style graph of OSCAR. It describes how a document containing 4500 words is trimmed to a list of 467 entities.
Tailor OSCAR into U-Compare components; design workflows. Add a user interface to compare several NER tools.
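As a rough illustration of what comparing several NER tools could involve, the sketch below computes pairwise Jaccard overlap between the entity sets returned by different tools for the same document. The tool names and entity lists are hypothetical, and this is not how U-Compare itself implements comparison.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare(outputs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise overlap between every two tools, keyed by the sorted pair of names."""
    return {
        (x, y): jaccard(outputs[x], outputs[y])
        for x, y in combinations(sorted(outputs), 2)
    }


# Hypothetical outputs of three NER tools run on the same document.
outputs = {
    "ner_a": {"ethanol", "oxidation", "acetic acid"},
    "ner_b": {"ethanol", "acetic acid", "benzene"},
    "ner_c": {"ethanol"},
}

agreement = compare(outputs)
```

A low score for a pair flags the documents where the tools disagree most, which is where a side-by-side view of the annotations is most useful.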