Presentation on a project run in my Chemical Information Science course. Valuable referenced chemical data from 'reliable' static webpages was 'scraped', cleaned, and added to a database for search.
2. Outline
Motivation
Chemical Information Science: The Course
Syllabus
Final Project Outline
Expected Student Activities
Submitted Data Sets
Example Data
Data Modeling
Compound Data
Website
Future Plans
Conclusion
From http://www.embl.de/chemcore/chemcore_services/computational_chemistry/
3. Motivation
Students are not exposed to informatics in the
regular chemistry curriculum
Chemists have so much information to access and use
that they need to know how to deal with it
Giving students this exposure makes them more
competitive in graduate/professional school
We need professionals at the interface of chemistry
and information science
4. Chemical Information Science:
A UNF Elective Class
First taught as a Freshman Honors course in 2003
“Chemical Informatics”
Five iterations over the last 10 years
Now an upper-level three-credit elective class
Fall 2013 cohort – 21 students (three-credit lecture)
“Chemical Information Science”
5. Syllabus
What is information? What is data?
What is metadata? What types of data are there?
How and where does informatics fit in chemistry?
How is information organized, stored, related,
formatted, typed?
The object-oriented view of information
(objects, classes, methods)
The Semantic Web – What is it and why is it important?
Defining relationships between data, Concept maps
Controlled vocabularies, Thesauri, Ontologies
6. Syllabus
The eXtensible Markup Language (XML) and
Scientific Markup Languages
Understanding and using Web 2.0 technologies
for information retrieval
Generating Information and Metadata
Finding Chemical Information
Tools for Finding, Organizing and Using Chemical Information
Searching databases
Internet/browser software for Chemistry
Using Excel for searching and organizing scientific information
7. Final Project Outline
The ChemData Database
For your project you will gather chemical data from sources on
the Internet, organize/filter the data, add it to the Excel
spreadsheet provided, and then send your completed Excel
spreadsheet to Dr. Chalk by the deadline.
Requirements
600 pieces of metadata at minimum must be submitted
(excluding reference data)
The data must be correctly entered in the spreadsheet
(no extra spaces, loss of accuracy, etc.)
It must be referenced to its origin, and those references
included in the spreadsheet
For chemicals, the InChI must be part of the submitted
metadata for each chemical species
A minimum of six hours of time for this activity is expected
The Excel Spreadsheet to use is available on the course website.
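The requirements above lend themselves to an automated check before submission. The following is a minimal sketch only, not the checker used in the course: it assumes the spreadsheet has been exported to rows of text cells, and the column names "inchi" and "reference_id" are hypothetical. It only tests that an InChI cell starts with the "InChI=" prefix, not that the identifier is chemically valid.

```python
# Hypothetical column names; the real course spreadsheet may differ.
INCHI_COLUMN = "inchi"
REFERENCE_COLUMN = "reference_id"


def check_row(row):
    """Return a list of problems found in one row (a dict of text cells)."""
    problems = []
    for column, cell in row.items():
        # Leading, trailing, or doubled spaces count as "extra spaces".
        if cell != cell.strip() or "  " in cell:
            problems.append(f"{column}: extra spaces")
    # Every InChI string begins with the literal prefix "InChI=".
    if not row.get(INCHI_COLUMN, "").strip().startswith("InChI="):
        problems.append(f"{INCHI_COLUMN}: missing or malformed")
    # Each entry must point back to its original reference.
    if not row.get(REFERENCE_COLUMN, "").strip():
        problems.append(f"{REFERENCE_COLUMN}: missing")
    return problems


def count_metadata(rows, reference_columns=(REFERENCE_COLUMN,)):
    """Count non-empty cells, excluding reference data (assumed reading
    of the '600 pieces of metadata' requirement)."""
    return sum(
        1
        for row in rows
        for column, cell in row.items()
        if column not in reference_columns and cell.strip()
    )


water = {"name": "water", "value": "18.015",
         "inchi": "InChI=1S/H2O/h1H2", "reference_id": "ref1"}
print(check_row(water))         # no problems reported for this row
print(count_metadata([water]))  # counts name, value, and inchi
```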
8. Expected Student Activities
Find suitable data source (hand-coded web page) on
‘reputable’ site with original reference
Download webpage content to computer
‘Scrape’ data out of webpage
Perform any data normalization (e.g. scientific notation)
Get metadata about chemicals referenced
Get metadata about original reference (DOI)
Import data into Excel and organize
Assign unique ids and add ids to link data
Add units and other metadata
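The scrape, normalize, and assign-id steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure the students followed: it assumes the downloaded page holds its data in a plain HTML table, and the names TableScraper, normalize_sci, and assign_ids are hypothetical. Superscript tags are turned into "^" so that an exponent such as 10<sup>23</sup> is not silently read as 1023.

```python
import re
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "sup" and self._cell is not None:
            self._cell.append("^")  # keep exponents distinguishable

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so no extra spaces survive.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


SCI = re.compile(r"^([+-]?\d+(?:\.\d+)?)\s*[x*]\s*10\s*\^\s*([+-]?\d+)$")


def normalize_sci(text):
    """Rewrite 'a x 10^b' as 'aeb'; the mantissa digits are kept as text
    so no accuracy is lost. Other values are returned unchanged."""
    text = text.strip().replace("\u2212", "-").replace("\u00d7", "x")
    match = SCI.match(text)
    if not match:
        return text
    return f"{match.group(1)}e{int(match.group(2))}"


def assign_ids(rows, prefix="prop"):
    """Pair each row with a unique id such as 'prop0001'."""
    return [(f"{prefix}{n:04d}", row) for n, row in enumerate(rows, start=1)]


page = ("<table><tr><th>Constant</th><th>Value</th></tr>"
        "<tr><td>Avogadro constant</td><td>6.022 x 10<sup>23</sup></td></tr></table>")
scraper = TableScraper()
scraper.feed(page)
header, *data = scraper.rows
cleaned = [[normalize_sci(cell) for cell in row] for row in data]
print(assign_ids(cleaned))
```

Hand-coded pages often omit closing tags or use layouts other than tables, so a real page would likely need its own adjustments to this parser.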
10. Submitted Data Sets
Students chose to submit data about
Organic compound properties
Organic compound reactions
Solvent properties
Types of analytical instrumentation
Analytical instrument operating conditions
Mathematical equations used in PChem
Physical constants
Unit conversion factors
15. Feedback
Very positive
“Course was an informative and enjoyable overview of the emerging field of
informatics as it relates to the sciences and Chemistry in particular.”
“What I am taking away from this class is something that can be applied to
other courses and my career. Interesting peek behind the curtains of how
the sharing of scientific knowledge and discovery are evolving.”
“Dr. Chalk was very enthusiastic about the subject of chemical informatics.
He exposed us to some very helpful chemistry resources that I plan on using
in the future.”
“Very interesting class with a lot of hands on computer use and learning
experience. The homework was relative to the course information and
helped to prepare for exams. Would retake this class and recommend to a
friend interested in data or computer science.”
16. Future Plans
Finish curating, cleaning up data
Make site publicly available
For students: provide detailed instructions on how to
find, curate, and submit their own data
For faculty: provide detailed description of the project
and Excel spreadsheet
Write up a paper about this for J. Chem. Ed.
Use site as the basis for a question bank for online
study questions
17. Conclusion
This was a fun project to run at the end of the class
Bringing together all that we had talked about in an
activity made it much more tangible for students
Students liked the idea that the Chem Data website
would be used by other students in chemistry
I can’t wait to teach this again…