Chemical Information Bulletin Vol. 62(3) Fall 2010
  • 1. Chemical Information Bulletin
A Publication of the Division of Chemical Information of the ACS
Volume 62 No. 3 (Fall) 2010
Editor: Dr. Svetla Baykoucheva, White Memorial Chemistry Library, 1526 Chemistry Building, University of Maryland, College Park, Maryland 20742 USA, sbaykouc@umd.edu
ISSN: 0364-1910
Chemical Information Bulletin, Copyright 2010 by the Division of Chemical Information of the American Chemical Society
Photo on cover by Svetla Baykoucheva

IN THIS ISSUE
Message from the Chair 2
Letter from the Editor 3
CINF Sponsors 4
The Merck Index: Interview with Maryadele O’Neil 5
CINF Division Meetings and Social Events 10
Technical Program Highlights 11
Technical Program Schedule (Short) 12
Technical Program Schedule (Full) 13
Awards and Scholarships 19
Future ACS Meetings 22
Book Reviews 23
Abstracts 24
CINF 2010 Officers 55

CINF 2010 EXECUTIVE COMMITTEE
Chair: Ms. Carmen Nitsche
Chair Elect: Dr. Gregory M. Banik
Past Chair: Ms. Svetlana Korolev
Secretary: Ms. Leah R Solla (2010-2011)
Treasurer: Ms. Meghan Lafferty (2009-2010)
Program Chair: Dr. Rajarshi Guha (2009-2010)
Membership Chair: Ms. Jan Carver (2009-2011)
Councilors: Ms. Bonnie Lawlor (2010-2012), Ms. Andrea Twiss-Brooks (2009-2011)
Alternate Councilors: Dr. Guenter Grethe (2010-2012), Mr. Charles F Huber (2009-2011)
  • 2. MESSAGE FROM THE CHAIR Dear Colleagues, The Fall 2010 ACS National Meeting is just around the corner. In this issue you will find all the information you need to organize your schedule in Boston. Early Saturday morning we begin our Division business meetings with long range planning followed by committee and executive meetings. On Sunday the CINF and COMP Divisions invite you to the Joint CINF/COMP Welcoming Reception and CINF Scholarship for Scientific Excellence Posters. ACS Publications is generously sponsoring the event, in recognition of the 50th anniversary of the Journal of Chemical Information and Modeling. The Program Committee has assembled a terrific technical program for us. We will be awarding a Best Presentation award once again at this meeting, thanks to an ACS Innovation Grant. Papers presented at the Data Intensive Drug Discovery session, organized by John Van Drie on Sunday afternoon, will be eligible, and the author of the winning paper will receive the award at the CINF Luncheon on Tuesday. We are also trying an experiment on Monday afternoon. The CINFlash session offers speakers a chance to present truly new research and results. Tuesday will be dedicated to the Herman Skolnik awardee, our esteemed colleague Professor Anton Hopfinger. The all-day session is entitled “The Marriage, or at least Dating, of Molecular Simulation and Modeling with QSAR Analysis.” The celebration will be capped off with the Evening Reception at the Seaport Hotel. Make sure you reserve your seat at the Tuesday CINF Luncheon (order your tickets when you register, or see me at the meeting). We have lined up the journalist and author Mike Capuzzo as our luncheon speaker. His first book, Close to Shore, was a non-fictional account of the first US shark attacks off the shores of New Jersey.
He will be speaking about his latest book, The Murder Room: The Heirs of Sherlock Holmes Gather to Solve the World's Most Perplexing Cold Cases, due out August 10th. In other Division news, I am pleased to report that CINF was granted a $5,000 innovation grant to work on web design of the eCIB (thanks to Bill Town for preparing the submission). This will help us as we consider the redesign of the entire CINF web presence. I can also report that we have submitted a grant proposal to support remote attendance/participation for our programming. To be more specific, we are requesting funds to allow live blogging and tweeting during our lightning sessions. Find out more about our current and future divisional activities by joining the CINF business meetings on Saturday. I see that 37 CINF Division members have joined the CINF group on the ACS Network. I encourage all of you to sign up before the meeting, so that we can use the network as our primary division communication mechanism as well as reduce our web infrastructure costs. All committee reports will be posted there in the Documents area in the next few weeks. See you in Boston! Carmen Nitsche, Chair, ACS Division of Chemical Information, Carmen.Nitsche@symyx.com
  • 3. LETTER FROM THE EDITOR Svetla Baykoucheva This issue of the Chemical Information Bulletin (CIB) is the third one we are publishing online. The first (Spring) issue was posted on the CINF web page (www.acscinf.org) before the ACS Spring National Meeting. It contained the technical program, the abstracts, awards information, book reviews, and three interviews. The second (Summer) issue, which is the equivalent of the former CINF eNews newsletter, was edited by Svetlana Korolev and contained information about the ACS Spring National Meeting. As you will notice, CIB is still in a period of transition from print to online format: the functionality of the product is not what we are used to seeing from the commercial publishers. I hope that for the next issues the process will become smoother and easier. Here in this issue you will find the usual categories: a Message from the Chair, Awards and Scholarships, Book Reviews, Technical and Social programs, Abstracts, and a list of the CINF functionaries. Rajarshi Guha, chair of the Technical Program Committee, highlights the technical program for the upcoming ACS Fall National Meeting (p. 11). Maryadele O’Neil, editor-in-chief of The Merck Index, discusses in an interview (p. 5) the history of this famous resource and what it takes to publish each new edition. The two book reviews (p. 23) were written by Bob Buntrock: (1) Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics, by Nicola De Bellis and (2) Patents for Chemicals, Pharmaceuticals, and Biotechnology, by Grubb & Thomsen. On p. 4 you will be able to see who the CINF sponsors are. As always, we highly appreciate their financial support for the Division. In the near future, the CINF Executive Committee will be looking for my replacement, and those who are interested and believe that they are qualified for this job should consider this exciting opportunity.
Being the editor of CIB for five years has allowed me to become acquainted with many interesting people. The editor’s position is a voluntary one, and I have been greatly helped by many colleagues from the Division who submitted materials and proofread the issues. The journal is now published online, and it will be much easier for the next editor to put together the issues: there will be no strict deadlines, as there will be no need to send it to the printer. I hope you will find this issue useful and enjoyable. Svetla Baykoucheva, Editor sbaykouc@umd.edu
  • 4. CINF Sponsors Fall 2010
August 2010
The American Chemical Society Division of Chemical Information (CINF) is very fortunate to receive generous financial support from our sponsors to maintain the high quality of the Division’s programming, to promote communication between members at social functions at the ACS Fall 2010 National Meeting in Boston, and to support other divisional activities during the year, including scholarships to graduate students in Chemical Information. The Division gratefully acknowledges the contributions from the following sponsors:
Platinum: ACS Publications, FIZ CHEMIE Berlin
Gold: Elsevier Reaxys®, Procter & Gamble
Silver: Bio-Rad Laboratories
Bronze: CambridgeSoft, InfoChem, RSC Publishing, Thieme Publishers
Opportunities are available to sponsor Division of Chemical Information events, speakers, and materials. Our sponsors are acknowledged on the CINF web site, in the Chemical Information Bulletin, on printed meeting materials, and at any events for which we use their contribution. Please feel free to contact me if you would like more information about supporting the CINF. The ACS CINF Division is a non-profit tax-exempt organization with taxpayer ID no. 52-6054220.
Graham Douglas, Chair, Fundraising Committee. Email: Fundraising@acscinf.org Tel: 510-407-0769

Please Join Us at These CINF Events!
The ACS Division of Chemical Information (CINF) is pleased to host the following social networking events at the Fall 2010 ACS National Meeting in Boston, Massachusetts.

CINF/COMP Welcoming Reception & CINF Scholarship for Scientific Excellence Posters & Awards
Celebrating the 50th Anniversary of the Journal of Chemical Information & Modeling
6:30-8:30 pm, Sunday, August 22 (Harbor Ballroom I, Westin Boston Waterfront)
Reception sponsored exclusively by ACS Publications
Scholarships for Scientific Excellence sponsored by FIZ CHEMIE Berlin

Harry's Party
5:30-8:00 pm, Monday, August 23 (Presidential Suite, Westin Boston Waterfront)
Sponsored exclusively by FIZ CHEMIE Berlin

CINF Tuesday Luncheon (Ticketed Event); Speaker: Mike Capuzzo, author of “The Murder Room”
12:00-1:30 pm, Tuesday, August 24 (Boston Convention and Exhibit Center 162A)
Sponsored by Bio-Rad Laboratories, CambridgeSoft and Thieme

CINF Herman Skolnik Award Reception honoring Dr. Anton Hopfinger
6:30-8:30 pm, Tuesday, August 24 (Seaport Boston Hotel, Plaza Ballroom C)
Sponsored by Elsevier Reaxys® & Procter & Gamble with InfoChem & RSC Publishing

Division of Chemical Information, American Chemical Society
  • 5. The Merck Index, an Encyclopedia of Chemicals and Natural Products: Interview with Maryadele O’Neil By Svetla Baykoucheva Maryadele O’Neil is the Senior Editor of The Merck Index and the Director of Scientific Nomenclature Services. She obtained her Master’s of Library Science from Rutgers University and her BA in Bacteriology (minor in Chemistry) from Douglass College at the same university. After working as a lab-bench researcher for several years, she became involved with The Merck Index. Now she directs all aspects of researching, writing, and publishing of this famous book and is responsible for its content and editorial style, as well as for the authorized product nomenclature and the development of official non-proprietary names for all Merck products. Ms. O’Neil has founded the Women in Chemistry Scholarship Program to encourage women to pursue PhDs in medicinal or synthetic organic chemistry. This program has provided $5,000 scholarships, plus travel stipends for winners, to present their research at American Chemical Society national meetings. She has started different initiatives to provide networking opportunities for scholarship winners with their colleagues at the ACS. SB: The Merck Index has been an icon for chemists for decades, and it has served them very well. Could you tell our readers a little bit of its history and the people who have contributed to it? MO: The first edition, known as Merck’s Index, was published in 1889 by the German chemical company, E. Merck, in order to communicate with customers in the United States. The company’s American subsidiary was established 2 years later as Merck & Company.
While the first edition was little more than a sales catalog, the second edition, published as Merck’s 1896 Index, grew in scope to include important medicines from the US Pharmacopeia and the National Formulary as well as common laboratory and manufacturing chemicals. Already, The Index was more of a reference handbook than a price list and included physical properties and medicinal uses for the compounds. Because of popular demand, a Third Edition was published in 1907. Chemists were now an important part of the readership, and additional physical properties and line formulae were added for their use. Ties between the German and American companies were severed by the impact of World War I, but Merck & Co. continued to publish The Merck Index. An ever-growing number of compounds would be added in succeeding editions to increase the utility of the information for researchers. Still today, The Index is published on a not-for-profit basis as a service to the scientific community. The material in The Index has been written over the last 120+ years by generations of Merck scientists. No editors were specifically named until Paul G. Stecher was appointed as such for the Seventh Edition in 1960. Martha Windholz, also a Merck chemist, was named Editor for the Ninth Edition, followed by Susan Budavari, from whom I took over in 1999. The scientists of Merck Research Laboratories have always been very generous with their expertise and advice, and I am most grateful to them for their continued support.
  • 6. SB: As the editor of The Merck Index, what do your responsibilities include? MO: Most importantly, I author a significant amount of the material in the monograph section and tables. I am responsible for the selection of new compounds to be included as well as monographs that are retired from each edition. It is very difficult to select the compounds that will no longer appear in print, but it is necessary to maintain the single volume size. The retired monographs are still available in the online editions, although the information is no longer updated. If you look through previous print editions of The Index, you will notice that the contents are really an encapsulation of what was happening in science during a particular period of time. I so much enjoy reading and scanning journals and news articles for the latest developments, and I have been privileged to read and write about a number of significant scientific advancements. Perhaps the most enjoyable aspect of my role as editor is the opportunity to interact directly with those who use The Index in their daily work. Their specialties span a wide range of scientific disciplines and include researchers in the lab, emergency first responders, information professionals, educators, and, of course, students. It is most gratifying to speak with these individuals to learn how The Index has contributed to their ability to do their work. Many of them have great suggestions for us that they are willing to share. We try to incorporate their ideas into our workflow to continually improve both the information we provide and the way we deliver it. I and some of my team regularly attend the ACS National Meetings and would welcome everyone to stop by at our booth in the exhibition center.
[Figure: The title page of the First Edition of The Merck Index] SB: With the many new resources available online, what is the niche that The Merck Index occupies now, and who is its audience today? What is unique about it? Why would people use it rather than go to other resources? MO: There is an overwhelming amount of information available now through the internet and some of it is very good. However, it is often difficult to find authoritative information while weeding through thousands of leads in your search engine of choice. Scientists have come to rely on the accuracy of the physical properties and chemical structures presented in The Index. It’s also viewed as a key to the literature, a first stop for an overview of a compound’s use or its significance. The Merck Index centralizes a lot of information that is critical to many different research disciplines. Approximately half of the monographs cover human and veterinary drugs, traditional and herbal medicines, and diagnostic aids. The remainder includes standard lab reagents, agricultural and commercial chemicals, plants and natural products, and compounds of environmental significance. We try to select the most important compounds across this broad sweep of science to retain the ready reference characteristic.
  • 7. In our user survey conducted last year, lab scientists surprisingly told us that they still prefer the printed handbook to keep near them at the bench. The encyclopedic organization of the data makes it easy for them to look up a particular compound and find all of the information together in a single monograph. SB: What is the publication process for the manual? How are new data added and how are these data verified? MO: All of the data included in The Index come from published sources, usually peer-reviewed journal articles that are cited within the monographs. We evaluate multiple sources before selecting the best values to report. If it is not possible to choose between varying data, both values are reported. Citations are also given so that our users may read the experimental details on their own. The online versions now include digital object identifiers (DOIs) and PubMed IDs for many of the cited articles to make it even easier to review the original sources. After a new compound has been selected for inclusion, the writer of the monograph performs a search of the scientific literature to immerse himself or herself in the topic. Chemical names, trademarks, and generic names are gathered and verified as well as registry numbers and drug codes, if applicable. The chemical structure is drawn to TMI specifications and physical properties are gleaned as described above. Finally, literature citations, such as syntheses and analytical methods, are selected based on the information they provide and their applicability to our end users’ needs. As early as 1984, The Index was available as an online database. We knew very early on that a publication cycle of 5-6 years had become too long for many of our readers to wait for new material. The online versions are updated twice per year with new and revised monographs. To accommodate this rapid cycle time, our manuscript is always publication-ready.
Each monograph is written or updated and released into the manuscript as a completed entity. New monographs are appended in the order of completion, which is why they are not in alphabetical order in the electronic editions. Database extracts are prepared by our technical staff and exported to our online partners to be processed and published in their format. Publishing the print edition is somewhat more complicated. After the retired material is removed, monographs are re-alphabetized and re-numbered. The various indices for the printed edition are automatically created by the attributes assigned to specific data elements, such as registry number, formula, and synonyms. Extracts for each section and index are prepared and sent to the typesetter to produce the pages according to our specifications. Once the page proofs are reviewed to ensure accuracy, the work of the editorial staff is essentially done. Our good colleagues in the Merck Publishing Group take over and handle the printing and production. They also are responsible for the sales, marketing, and distribution of the new books. SB: Who are the people who are involved with The Merck Index now? MO: You may be surprised that the editorial staff of The Index is quite small. There are only 4 editors who write original content, each with a specific area of expertise. For example, Dr. Peter Dobbelaar is an experienced synthetic chemist who prepares the monographs on chemical reagents and is responsible for the Organic Name Reactions (ONR) section. We are fortunate to be able to call upon the expertise of chemists from outside of Merck as well. For the upcoming 15th edition, Dr. David MacMillan has been of great assistance in reviewing the ONRs. In previous editions, we have been privileged to work with Dr. David Evans and Dr. Barry Trost. Senior Associate Editor Patricia Heckelman has an advanced degree in toxicology. 
In addition to writing, she is responsible for all of the operational activities, manages the interactions with the typesetter and co-publishers, and ensures the accuracy of each of the electronic editions. Assistant Editor Kristin Roman utilizes her Master’s Degree in Biotechnology to author new material pertaining to this field. Rounding out the team, we have excellent support from the senior editorial assistants, Catherine Kenny
  • 8. and Edwin Enraca, and the technical specialist, Linda Karaffa. We also have assistance from Merck’s nomenclature expert, Margaret Hill, who is always on-call to answer naming and structure questions. The Merck Index is part of a family of publications that include The Merck Manual of Diagnosis and Therapy, The Merck Veterinary Manual, and the Home Health and Pet Health editions. As publisher, Gary Zelko brings his years of experience in print publications to oversee the production and marketing of the entire set of handbooks. I would be remiss if I did not also mention my manager, Matthew Cahill, who also brings valuable expertise from the publishing world. SB: You have been passionate about creating opportunities for women to get involved and advance in science. What are the main issues women scientists are facing now? MO: I do believe that it is easier today for women to pursue a career in science than it was 30 or 40 years ago. While the number of women chemists at the master’s level continues to grow, the number of female PhD chemists is dropping precipitously. The choice to continue past the master’s level is a difficult one for women who continue to worry about balancing their work with a family. The availability of day care options is certainly more prevalent now, but it is still difficult for young mothers to cope with long hours in the laboratory. Experienced, successful women have a responsibility to mentor and encourage young talent. I have had the opportunity to speak with many young women as they are beginning their advanced studies. Overwhelmingly, they tell me that the opportunity to network with their female colleagues is invaluable. SB: Could you tell us what your background and professional and personal interests are? MO: My undergraduate degree is from Douglass College, which is part of Rutgers University and sister to the then all-male Rutgers College.
While I had planned to pursue a career in the medical field, my professors convinced me that I had an aptitude for chemistry and I found myself spending more and more time in the chemistry building. I still maintained my love of the biological sciences and ended up with a degree in Microbiology. My first position at Merck was as a research assistant in the laboratories, working in the immunology department. There was an opening on The Index staff after the 10th Edition was published, and I thought that it would be the ideal job for someone like me who loved to read and write science. After joining the staff, I returned to Rutgers University part-time to pursue my Master’s Degree in Information Management, with two small children in tow. The decision to come to The Index was absolutely the right one for me. I have enjoyed my many years on the staff and have continued to learn something new every day. Not surprisingly, I do have a passion for reading, and enjoy non-fiction as well as novels of many different genres. One of my favorite hobbies is working in my perennial garden, which is probably why I enjoy writing material on natural products and traditional medicines for The Index. I am an avid baseball fan and have been able to attend baseball games in the wonderful stadiums throughout the US, whenever my travel coincided with the home team’s schedule. SB: What is your relationship with the ACS? MO: I have personally been a member of ACS and the Chemical Information Division for many years and regularly attend the ACS national meetings. When not at the booth in the exhibit hall, you will more than likely find me attending one or more of the CINF sessions. I have also enjoyed being associated with the Women Chemists’ Committee and the great work they are doing to advance the careers of women in science.
  • 9. The Merck Index has had a long partnership with the ACS, particularly in support of various educational initiatives. Last year, in conjunction with National Chemistry Week, Merck donated 12,000 copies of The Index to local ACS sections to distribute to high school students and teachers across the country. I had the privilege of attending a science fair at Ballou High School in Washington DC with ACS President Tom Lane when the first books were distributed. The students and their teachers were most inspiring and I was delighted to be able to meet them and present them with copies of The Merck Index for use in their classrooms. SB: Why is the online version of The Merck Index accessible through different platforms rather than having it available like other similar resources directly from a web page? MO: The editorial staff prepares all of the material and maintains our content management system, but we do not have any search software with which to produce an online edition ourselves. Because of the richness of our indexing and the granularity of the data, The Merck Index Online is much more than a text-searchable eBook. Our co-publishers process our data and apply their specific tools to enhance the searchability of the material. The Merck Index has a diverse user base ranging from healthcare professionals, to engineers, to bench researchers. It is important to us that our end users are able to easily access the information, regardless of their discipline. The various platforms have been strategically selected based on the way the data are presented and/or paired with other resources to meet the needs of their specific target audience. SB: As science is becoming more and more interdisciplinary, how do you see the evolution of The Merck Index?
MO: In the preface to the Tenth Edition (1983), Martha Windholz described her most important challenge as being able to “effectively report major developments at the forefront of the life sciences and to reflect the complex and inextricable interdependence of chemistry, biology, and medicine.” This began the incorporation of biologics into The Index to provide research chemists with access to a broader scope of information than they previously needed. Speaking selfishly, I was quite happy with this decision, since I was purposefully hired to bring my blended perspective of biology and chemistry to the staff. There is always the temptation to stray too far from our true niche, and we must not forget that our readers have relied on The Index as a key core reference for organic chemistry. Future editors of The Merck Index must balance the need for this interdisciplinary information without abandoning the original purpose set forth in the First Edition to provide a summary of whatever chemical products are today adjudged as being useful in either medicine or technology. SB: Thank you, Maryadele, for giving this interesting interview. Our readers will have a better idea now how this “icon” of chemical information is being published.
  • 10. ACS Chemical Information Division (CINF)
Fall 2010 ACS National Meeting, Boston, MA (August 22-26)
CINF DIVISION MEETINGS AND SOCIAL EVENTS
The CINF Executive Meeting is a closed meeting; if you wish to attend it, contact the division chair. CINF members are WELCOME AND ENCOURAGED to attend any of the other committee meetings and all social functions. All committee meetings will be held in the Boston Convention and Exhibit Center.

Saturday, August 21, 2010
7:30 AM - 9:00 AM CINF: Long Range Planning & Breakfast Meeting (Room 156A)
9:00 AM - 10:00 AM CINF: Awards Committee Meeting (Room 156B)
10:00 AM - 11:00 AM CINF: Fundraising Committee Meeting (Room 156B)
11:00 AM - 12:00 PM CINF: Finance Committee Meeting (Room 156B)
9:00 AM - 10:30 AM CINF: Membership Committee Meeting (Room 155)
10:30 AM - 12:00 PM CINF: Careers Committee Meeting (Room 155)
9:00 AM - 12:00 PM Communications & Publications Committee Meeting (Room 156C)
9:00 AM - 12:00 PM CINF: Education Committee Meeting (Room 157A)
9:00 AM - 12:00 PM CINF: Program Committee Meeting (Room 156A)
12:00 PM - 1:00 PM CINF: Functionary Luncheon (Room 156A)
1:00 PM - 5:30 PM CINF: Executive Committee Meeting (Room 156A)

Sunday, August 22, 2010
12:00 PM - 2:00 PM CINF - CSA Trust Group Meeting (Boston Convention and Exhibit Center, Room 153A)
6:30 PM - 8:30 PM Joint CINF/COMP Welcoming Reception, Sponsored Exclusively by ACS Publications, and CINF Scholarship for Scientific Excellence Posters & Awards, Sponsored by FIZ CHEMIE Berlin (Harbor Ballroom I, Westin Boston Waterfront). Celebrating the 50th Anniversary of JCIM

Monday, August 23, 2010
5:30 PM - 7:30 PM Harry’s Party, Sponsored Exclusively by FIZ CHEMIE Berlin (Presidential Suite, Westin Boston Waterfront)

Tuesday, August 24, 2010
12:00 PM - 1:30 PM CINF: Luncheon (Boston Convention and Exhibit Center, Room 162A). Speaker: Mike Capuzzo, author of “The Murder Room.” Sponsored by Bio-Rad Laboratories, CambridgeSoft and Thieme
6:30 PM - 8:30 PM CINF Herman Skolnik Award Reception honoring Dr. Anton Hopfinger, Sponsored by Elsevier Reaxys® & Procter & Gamble with InfoChem & RSC Publishing (Plaza Ballroom C, Seaport Hotel)

Wednesday, August 25, 2010
12:00 PM - 5:00 PM CINF - CIC Collaborative Working Group Meeting (Boston Convention and Exhibit Center 104A)
  • 11. ACS Chemical Information Division (CINF) Fall 2010 ACS National Meeting Boston, MA (August 22-26) Technical Program Highlights By Rajarshi Guha, Program Chair The Fall ACS National Meeting is nearly upon us and we are looking forward to attending an exciting lineup of symposia and talks. As in recent meetings, we’ve got a packed program covering a diverse set of topics ranging from the Semantic Web to structure-activity relationships. Sunday starts off with a symposium on the use of the Semantic Web in chemistry, organized by Egon Willighagen and Martin Braendle. This is the first session of the three-session symposium that extends to Monday. Topics being covered range from tools & technologies for enabling the use of semantic concepts with chemical data, to actual chemical and biological applications of these concepts. In parallel, Sunday also is the day for the joint CINF-SLA symposium that will address how collections and information resources can be assessed, both quantitatively and qualitatively. On Sunday afternoon we will also have the third session of the Best Presentation Award Symposium, organized by John Van Drie. It will focus on methods and approaches to handling the problems associated with the data deluge in drug discovery settings. The winner will receive an invitation to the CINF Luncheon, a plaque, as well as $1,000 towards registration and expenses. On Monday, we continue with the Semantic Web symposium and concurrently will have the first session of the symposium celebrating 50 years of the Journal of Chemical Information. This is being organized by Prof. William Jorgensen and has a lineup of chemical information luminaries who will be describing some of the pioneering work that was published in the journal. On Monday afternoon we will have a brand new symposium, CINFlash.
This is an experimental symposium in which we’ve tried to encourage talks on recent work by allowing speakers to bypass the ACS abstract system. In addition, each talk will be strictly limited to six minutes (with the help of a loud horn). So, you should expect some fun stuff! Tuesday is a day devoted to the Herman Skolnik Symposium, honoring Prof. Anton Hopfinger. This whole-day event features talks from his past students and collaborators covering a range of topics in molecular modeling. The symposium will be followed by the Skolnik Reception. On Wednesday, we will have a symposium to discuss recent developments in the structure-activity landscape (SAL) concept. Organized by Jurgen Bajorath, Gerry Maggiora and Mic Lajiness, it will cover topics ranging from novel descriptions of landscapes to new applications of the concept in molecular modeling. Wednesday also sees a symposium on recent developments in chemical structure representation, organized and run by Richard Apodaca; it will cover new tools and applications that address traditional and novel chemical structure representations. I’m quite excited about the upcoming program, and I am grateful for the contributions from the Program Committee members and symposium organizers. Thanks to everybody, and I look forward to meeting you in Boston.
  • 12. Fall 2010 ACS National Meeting Boston, MA (August 22-26, 2010) Technical Program Schedule (Short Version) Rajarshi Guha, Chair SUNDAY MORNING Semantic Web in Chemistry E. Willighagen, Organizer; M. Braendle, Organizer; E. Willighagen, Presiding; Papers 1-5 Assessing Collections & Information Resources in Science & Technology E. Kajosalo, Organizer; E. Kajosalo, Presiding; Papers 6-10 SUNDAY AFTERNOON Data-intensive Drug Design J. Van Drie, Organizer; J. Van Drie, Presiding; Papers 11-18 Assessing Collections & Information Resources in Science & Technology E. Kajosalo, Organizer; E. Kajosalo, Presiding; Papers 19-22 SUNDAY EVENING 2010 CINF Scholarship for Scientific Excellence G. Grethe, Organizer; Papers 23-31 MONDAY MORNING Semantic Web in Chemistry E. Willighagen, Organizer; M. Braendle, Organizer; M. Braendle, Presiding; Papers 32-36 The Journal of Chemical Information and Modeling’s 50th Anniversary Symposium W. Jorgensen, Organizer; W. Jorgensen, Presiding; Papers 37-42 MONDAY AFTERNOON Where’s the Good Stuff? Consumer Health Information, and Social Networking Resources and Services A. Twiss-Brooks, Organizer; A. Twiss-Brooks, Presiding; Papers 43-46 Semantic Web in Chemistry E. Willighagen, Organizer; M. Braendle, Organizer; E. Willighagen, Presiding; Papers 47-51 MONDAY EVENING Sci-Mix R. Guha, Organizer; Papers 2, 6, 20, 28, 31, 78, 91 TUESDAY MORNING Herman Skolnik Award Symposium A. Hopfinger, Organizer; E. X. Esposito, Organizer; A. Hopfinger, Presiding; Papers 52-55 TUESDAY AFTERNOON Herman Skolnik Award Symposium A. Hopfinger, Organizer; E. X. Esposito, Organizer; E. X. Esposito, Presiding; Papers 56-59 WEDNESDAY MORNING The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research G. Maggiora, Organizer; J. Bajorath, Organizer; M. Lajiness, Organizer; J. Bajorath, Presiding; Papers 60-64 Recent Progress in Chemical Structure Representation R.
Apodaca, Organizer; R. Apodaca, Presiding; Papers 65-69 WEDNESDAY AFTERNOON The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research G. Maggiora, Organizer; J. Bajorath, Organizer; M. Lajiness, Organizer; G. Maggiora, Presiding; Papers 70-73 Recent Progress in Chemical Structure Representation R. Apodaca, Organizer; R. Apodaca, Presiding; Papers 74-77 THURSDAY MORNING General Papers R. Guha, Organizer; R. Guha, Presiding; Papers 78-83 THURSDAY AFTERNOON General Papers R. Guha, Organizer; X. Wang, Presiding; Papers 84-91
  • 13. ACS Chemical Information Division (CINF) Fall 2010 ACS National Meeting Boston, MA (August 22-26, 2010) Full Technical Program Schedule Rajarshi Guha Chair, Technical Programming OTHER SYMPOSIA OF INTEREST: Data Analysis: Statistics on Chemicals (see COMP, Thu) The Journal of Chemical Information and Modeling’s 50th Anniversary Symposium (see COMP, Mon) Computer Modeling: The Wave of the Future and its Benefits for Small Business Owners (see SCHB, Thu) Drugging the Undruggable: Small Molecule Modulators of Protein-Protein Interactions (see MEDI, Mon) SOCIAL EVENTS CINF Welcoming Reception and Scholarship for Scientific Excellence Posters, 6:30-8:30pm: Sun Harry’s Party, 5:30-8pm: Mon Luncheon, 12-1:30pm: Tue Herman Skolnik Award Reception, 6:30-8:30pm: Tue SUNDAY MORNING Section A Boston Convention & Exhibition Center, Room 156A Semantic Web in Chemistry RDF & Computation E. Willighagen, Organizer, Presiding M. Braendle, Organizer Cosponsored by COMP 8:45 Introductory Remarks 8:55 1. Semantic envelopment of cheminformatics resources with SADI. L. L. Chepelev, E. Willighagen, M. Dumontier 9:40 2. RESTful RDF web services for predictive toxicology. N. Jeliazkova 10:10 Intermission 10:25 3. Linking the resource description framework to cheminformatics and proteochemometrics. E. L. Willighagen, J. E. Wikberg 10:55 4. Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure. A. Paschke, S. Heineke 11:25 5. Use of semantic web services to access small molecule ligand database. A. P. Tamhankar, A. S. Ausekar Section B Boston Convention & Exhibition Center, Room 155 Assessing Collections & Information Resources in Science & Technology E. Kajosalo, Organizer, Presiding Cosponsored by SLA‡ 9:00 Introductory Remarks 9:05 6. Usage metrics: Tools for evaluating science monograph collections. M. M. Foss, V. Kisling, S. Haas 9:30 7.
Happily ever after or not: E-book collection usage analysis and assessment at USC Library. N. Xiao 9:55 8. From Chemical Abstracts to SciFinder: Transitioning to SciFinder and assessing customer usage. S. Makar, S. Bruss 10:20 Intermission 10:35 9. Using Web of Knowledge to identify publishing and citation patterns of campus researchers at the University of Arkansas. L. Salisbury, J. S. Smith 11:00 10. Don’t forget the qualitative: Including focus groups in the collection assessment process. S. Shepherd, T. M. Vogel Social Networking: The Next Generation Sponsored by CHED, Cosponsored by CINF and YCC Tautomers and Biology Computer Handling of Tautomers Sponsored by COMP, Cosponsored by CINF, MEDI, and PHYS
  • 14. SUNDAY AFTERNOON Section A Boston Convention & Exhibition Center, Room 156A Data-intensive Drug Design J. Van Drie, Organizer, Presiding Cosponsored by COMP 1:45 Introductory Remarks 1:50 11. Strategies for the identification and generation of informative compound sets. M. S. Lajiness 2:15 12. Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery. C. Steinbeck 2:40 13. Decision making in the face of complicated drug discovery data using the Novartis system for virtual medicinal chemistry (FOCUS). D. Chin 3:05 14. Integrating chemical and biological data: Insights from 10 years of VERDI. S. Roberts, W. P. Walters, R. McLoughlin, P. Gabriel, J. Willis, T. Kramer 3:30 Intermission 3:45 15. Collaborative database and computational models for tuberculosis drug discovery decision making. S. Ekins, J. Bradford, K. Dole, A. Spektor, K. Gregory, D. Blondeau, M. Hohman, B. A. Bunin 4:10 16. Data drive life sciences: The Pyramids meet the Tower of Babel. R. Guha 4:35 17. Design principles for diversity-oriented synthesis: Facilitating downstream discovery with upfront design. L. Marcaurelle 5:00 18. Overview: Data-intensive drug design. J. H. Van Drie Section B Boston Convention & Exhibition Center, Room 155 Assessing Collections & Information Resources in Science & Technology E. Kajosalo, Organizer, Presiding Cosponsored by SLA‡ 2:00 19. Data-driven development: How ACS Publications uses data to enhance products and services, and respond to customer needs. M. Blaney, S. Rouhi 2:25 20. Objective collections evaluation using statistics at the MIT Libraries. M. Willmott, E. Kajosalo 2:50 21. Getting the biggest bang for your buck: Methods and strategies for managing journal collections. G. Baysinger 3:15 22. Taking a collection down to its elements: Using various assessment techniques to revitalize a library. L.
Solla 3:40 Panel Discussion Scripting & Programming HPC on the Cheap Sponsored by COMP, Cosponsored by CINF Social Networking: The Next Generation Sponsored by CHED, Cosponsored by CINF and YCC Using Waters Explicitly in Drug Discovery Theory and Methods Sponsored by COMP, Cosponsored by CINF and MEDI SUNDAY EVENING 2010 CINF Scholarship for Scientific Excellence G. Grethe, Organizer Financially supported by FIZ CHEMIE Berlin 6:30 - 9:30 23. Predicting specific inhibition of cyclophilins A and B using docking, growing, and free energy perturbation calculations. S. V. Sambasivarao, O. Acevedo 24. Using aggregative web services for drug discovery. Q. Zhu, M. S. Lajiness, D. J. Wild 25. Semantifying polymer science using ontologies. E. O. Cannon, A. Nico, P. Murray-Rust 26. Toxicity reference database (ToxRefDB) to develop predictive toxicity models and prioritize compounds for future toxicity testing. H. Tang, H. Zhu, L. Zhang, A. Sedykh, A. Richard, I. Rusyn, A. Tropsha 27. OrbDB: A database of molecular orbital interactions. M. A. Kayala, C. A. Azencott, J. H. Chen, P. F. Baldi 28. Novel approach to drug discovery integrating chemogenomics and QSAR modeling: Applications to anti-Alzheimer's agents. R. Hajjo, S. Wang, B. L. Roth, A. Tropsha 29. Cheminformatics improvements by combining semantic web technologies, cheminformatical representations, and chemometrics for statistical modeling and pattern recognition. E. L. Willighagen (withdrawn) 30. Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials. M. Betz, G. Neudert, G. Klebe (withdrawn) 31. Functional binders for non-specific binding: Evaluation of virtual screening methods for the elucidation of novel transthyretin amyloid inhibitors. C. J. Simões, T. Mukherjee, R. M. Jackson, R. M. Brito
  • 15. MONDAY MORNING Section B Boston Convention & Exhibition Center, Room 155 Semantic Web in Chemistry OWL Chemical Ontologies M. Braendle, Organizer, Presiding E. Willighagen, Organizer Cosponsored by COMP 8:30 32. Using the oreChem experiments ontology: Planning and enacting chemistry. J. G. Frey, M. I. Borkum, C. Lagoze, S. J. Coles 9:15 33. CHEMINF: Community-developed ontology of chemical information and algorithms. L. L. Chepelev, J. Hastings, E. Willighagen, N. Adams, C. Steinbeck, P. Murray-Rust, M. Dumontier 9:45 Intermission 10:00 34. Chemical entity semantic specification: Knowledge representation for efficient semantic cheminformatics and facile data integration. L. L. Chepelev, M. Dumontier 10:30 35. Semantic assistant for lipidomics researchers. A. Kouznetsov, R. Witte, C. J. Baker 11:00 36. ChemicalTagger: A tool for semantic text-mining in chemistry. L. Hawizy, D. M. Jessop, P. Murray-Rust Section A Boston Convention & Exhibition Center, Room 156A The Journal of Chemical Information and Modeling's 50th Anniversary Symposium W. Jorgensen, Organizer, Presiding Cosponsored by COMP 8:45 Introductory Remarks 8:55 37. From canonical numbering to the analysis of enzyme-catalyzed reactions: 32 years of publishing in JCIM (JCICS). J. Gasteiger 9:55 39. Fifteen years in chemical informatics: Lessons from the past, ideas for the future. D. Agrafiotis 10:25 Intermission 10:40 40. Applications of wavelets in virtual screening. V. Gillet, R. Martin, E. Gardiner, S. Senger 11:10 41. Privileged substructures revisited: Target community-selective scaffolds. J. Bajorath 11:40 42. Automated retrosynthetic analysis: An old flame rekindled. P. Johnson, A. P. Cook, J. Law, M. Mirzazadeh, A.
Simon Tautomers and Biology Predictions of Tautomer Ratios Sponsored by COMP, Cosponsored by CINF, MEDI, and PHYS The Community Structure-Activity Resource (CSAR) Scoring Challenge Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI MONDAY AFTERNOON Section B Boston Convention & Exhibition Center, Room 155 Where’s the Good Stuff? Consumer Health Information, and Social Networking Resources and Services A. Twiss-Brooks, Organizer, Presiding Cosponsored by CHED 1:00 43. Dietary supplements: Free evidence-based resources for the cautious consumer. B. Erb 1:25 44. What lessons learned can we generalize from evaluation and usability of a health website designed for lower literacy consumers? M. J. Moore, R. G. Bias 1:50 45. National Library of Medicine resources for consumer health information. M. Eberle 2:15 46. Better prescription for information: Dietary supplements online. G. Y. Hendler Section A Boston Convention & Exhibition Center, Room 156A Semantic Web in Chemistry RDF/OWL Applications E. Willighagen, Organizer, Presiding M. Braendle, Organizer Cosponsored by COMP 1:15 47. Overview of the linking open drug data task. E. Prudhommeaux, E. Willighagen, S. Stephens 2:00 48. Control, monitoring, analysis and dissemination of laboratory physical chemistry experiments using semantic web and broker technologies. J. G. Frey, S. Wilson
  • 16. 2:30 Intermission 2:45 49. Semantic analysis of chemical patents. D. M. Jessop, L. Hawizy, P. Murray-Rust, R. C. Glen 3:15 50. Data mining and querying of integrated chemical and biological information using Chem2Bio2RDF. D. J. Wild, B. Chen, Y. Ding, X. Dong, H. Wang, D. Jiao, Q. Zhu, M. Sankaranarayanan 3:45 51. Mining and visualizing chemical compound-specific chemical-gene/disease/pathway/literature relationships. Q. Zhu, P. Purohit, J. Youl Choi, S. Bae, J. Qiu, Y. Ding, D. Wild 4:15 Intermission 4:20 CINF Open Meeting 4:30 Open Meeting. Committees on Publications and Chemical Abstracts Service Section B Boston Convention & Exhibition Center, Room 155 CINFlash Can You Present Faster Than a Femtosecond Laser? R. Guha, Organizer, Presiding 2:45 Panel Discussion The Community Structure-Activity Resource (CSAR) Scoring Challenge Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI The Journal of Chemical Information and Modeling's 50th Anniversary Symposium Sponsored by COMP, Cosponsored by CINF Using Waters Explicitly in Drug Discovery Characterization and Applications Sponsored by COMP, Cosponsored by CINF and MEDI MONDAY EVENING Sci-Mix R. Guha, Organizer 8:00 - 10:00 2, 6, 20, 28, 31. See previous listings. 78, 91. See subsequent listings. TUESDAY MORNING Section A Boston Convention & Exhibition Center, Room 156A Herman Skolnik Award Symposium The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis: Exploring Chemometric Methods A. Hopfinger, Organizer, Presiding E. X. Esposito, Organizer 8:15 Introductory Remarks 8:30 52. What makes polyphenols good antioxidants? Alton Brown, you should take notes... E. X. Esposito 9:15 53. Engineering and 3D protein-ligand interaction scaling of 2D fingerprints. J. Bajorath 10:00 Intermission 10:15 54. In silico binary QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage. Y. Tseng 11:00 55.
Telling the good from the bad and the ugly: The challenge of evaluating pharmacophore model performance. R. D. Clark Tautomers and Biology Tautomers and Macromolecule-ligand Complexes Sponsored by COMP, Cosponsored by CINF, MEDI, and PHYS The Community Structure-Activity Resource (CSAR) Scoring Challenge Sponsored by COMP, Cosponsored by BIOL, CINF, and MEDI TUESDAY AFTERNOON Section A Boston Convention & Exhibition Center, Room 156A Herman Skolnik Award Symposium The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis: Exploring Chemical Systems E. X. Esposito, Organizer, Presiding A. Hopfinger, Organizer 2:00 56. Creative application of ligand-based methods to solve structure-based problems: Using QSAR approaches to learn from protein crystal structures. C. M. Breneman, S. Das, M. Sundling, M. Krein, S. Cramer, K. P. Bennett, C. Bergeron, J. Zaretzki
  • 17. 2:45 57. Computer-aided drug discovery. W. L. Jorgensen 3:30 Intermission 3:45 58. Structure-based discovery and QSAR methods: A marriage of convenience. J. S. Duca 4:30 59. Extending the QSAR Paradigm using molecular modeling and simulation. A. J. Hopfinger 5:15 Presentation of Award Using Waters Explicitly in Drug Discovery Hybrid Explicit/Implicit Methods Sponsored by COMP, Cosponsored by CINF and MEDI WEDNESDAY MORNING Section A Boston Convention & Exhibition Center, Room 156A The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research J. Bajorath, Organizer, Presiding G. Maggiora, M. Lajiness, Organizers Cosponsored by COMP and MEDI 8:50 Introductory Remarks 9:00 60. Overview of activity landscapes and activity cliffs: Prospects and problems. G. M. Maggiora 9:30 61. Exploring and exploiting the potential of structure-activity cliffs. G. M. Maggiora, M. S. Lajiness 10:00 62. What makes a good structure activity landscape? Network metrics and structure representations as a way of exploring activity landscapes. R. Guha 10:30 Intermission 10:45 63. Consensus model of activity landscapes and consensus activity cliffs. J. L. Medina-Franco, K. Martinez-Mayorga, F. Lopez-Vallejo 11:15 64. R-Cliffs: Activity cliffs within a single analog series. D. Agrafiotis Section B Boston Convention & Exhibition Center, Room 155 Recent Progress in Chemical Structure Representation Applications and Tools R. Apodaca, Organizer, Presiding 9:00 Introductory Remarks 9:05 65. Chemical structure representation in the DuPont Chemical Information Management Solutions database: Challenges posed by complex materials in a diversified science company. M. A. Andrews, E. S. Wilks 9:35 66. From deposition to application: Technologies for storing and exploiting crystal structure data. C. R. Groom, J. Cole, S. Bowden, T. Olsson 10:05 67.
Recent IUPAC recommendations for chemical structure representation: An overview. J. Brecher 10:35 Intermission 10:50 68. Orbital development kit. E. L. Willighagen 11:20 69. Line notations as unique identifiers. K. Boda WEDNESDAY AFTERNOON Section A Boston Convention & Exhibition Center, Room 156A The Emerging Concepts of Activity Landscapes and Activity Cliffs and their Role in Drug Research G. Maggiora, Organizer, Presiding J. Bajorath, M. Lajiness, Organizers Cosponsored by COMP and MEDI 2:00 70. Analysis of activity landscapes, activity cliffs, and selectivity cliffs. J. Bajorath 2:30 71. Using Activity Cliff Information in structure-based design approaches. B. Seebeck, M. Wagener, M. Rarey 3:00 72. Exploring activity cliffs using large scale semantic analysis of PubChem. D. J. Wild, B. Chen, Q. Zhu 3:30 73. Quantifying the usefulness of a model of a structure-activity relationship: The SALI Curve Integral. J. H. Van Drie, R. Guha 4:00 Concluding Remarks Section B Boston Convention & Exhibition Center, Room 155 Recent Progress in Chemical Structure Representation File Formats and Line Notations R. Apodaca, Organizer, Presiding 2:00 74. Status of the InChI and InChIKey algorithms. S. Heller 2:30 75. Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and cheminformatics. K. T. Taylor, W. L. Chen, B. D. Christie, J. L. Durant, D. L. Grier, B. A. Leland, J. G. Nourse 3:00 Intermission 3:15 76. Representation of Markush structures: From molecules toward patents. S. Csepregi, N. Máté, R. Wágner, T. Csizmazia, S. Dóránt, E. Bíró, T. Dudgeon, A. Baharev, F. Csizmadia
  • 18. 3:45 77. CSRML: A new markup language definition for chemical substructure representation. C. H. Schwab, B. Bienfait, J. Gasteiger, T. Kleinoeder, J. Marucszyk, O. Sacher, A. Tarkhov, L. Terfloth, C. Yang THURSDAY MORNING Section A Boston Convention & Exhibition Center, Room 156A General Papers Characterization and Prediction R. Guha, Organizer, Presiding 8:45 78. Prediction of solvent physical properties using the hierarchical clustering method. T. M. Martin, D. M. Young 9:10 79. Scaffold diversity analysis using scaffold retrieval curves and an entropy-based measure. J. L. Medina-Franco, K. Martinez-Mayorga, A. Bender, T. Scior 9:35 80. Nonsubjective clustering scheme for multiconformer databases. A. B. Yongye, A. Bender, K. Martinez-Mayorga 10:00 Intermission 10:10 81. Finding drug discovery “rules of thumb” with bump hunting. T. Hashimoto, M. Segall 10:35 82. Machine learning in discovery research: Polypharmacology predictions as a use case. N. Wale, K. McConnell, E. M. Gifford 11:00 83. Interpretable correlation descriptors for quantitative structure-activity relationships. J. D. Hirst Computer Modeling: The Wave of the Future and its Benefits for Small Business Owners Sponsored by SCHB, Cosponsored by CINF, COMP, and PROF Targeting Gram-Negative Pathogens Sponsored by COMP, Cosponsored by CINF and MEDI THURSDAY AFTERNOON Section A Boston Convention & Exhibition Center, Room 156A General Papers Extraction and Integration X. Wang, Presiding R. Guha, Organizer 1:30 84. Chemistry in your hand: Using mobile devices to access public chemistry compound data. A. J. Williams, V. Tkachenko 1:55 85. Feature analysis of ToxCast™ compounds. P. Volarath, S. Little, C. Yang, M. Martin, D. Reif, A. Richard 2:20 86. Extracting information from the IUPAC Green Book. J. G. Frey, M. I. Borkum 2:45 87. Biologics and biosimilars: One and the same? R. Schenck 3:10 Intermission 3:20 88.
Intelligent mining of drug information resources. R. Jain, A. Tamhankar, A. Ausekar, Y. Dixit 3:45 89. Cheminformatics semantic grid for neglected diseases. P. J. Kowalczyk 4:10 90. Extraction and integration of chemical information from documents. H. O. Villar, J. Betancort, M. R. Hansen 4:35 91. SAR and the role of active-site waters in blood coagulating serine proteases: A thermodynamic analysis of ligand-protein binding. N. K. Salam, W. Sherman, R. Abel Data Analysis: Statistics on Chemicals Sponsored by COMP, Cosponsored by CINF Targeting Gram-Negative Pathogens Sponsored by COMP, Cosponsored by CINF and MEDI
  • 19. AWARDS AND SCHOLARSHIPS 2010 Herman Skolnik Award to Tony J. Hopfinger Anton (Tony) J. Hopfinger, Distinguished Research Professor of Pharmacy, University of New Mexico, Professor Emeritus of Medicinal Chemistry and Pharmacognosy, University of Illinois, and co-Founder and Chief Science Officer of The Chem21 Group, Inc., is the recipient of the 2010 Herman Skolnik Award presented by the ACS Division of Chemical Information (CINF). The award recognizes outstanding contributions to and achievements in the theory and practice of chemical information science and related disciplines. The prize consists of a $3,000 honorarium and a plaque. At the ACS Fall National Meeting there will be a Skolnik Award Symposium entitled The Marriage, or at Least Dating, of Molecular Simulation and Modeling with QSAR Analysis: Exploring Chemometric Methods, organized by Tony Hopfinger and Emilio Esposito. It will be held on Tuesday, August 24th, and abstracts of the papers to be presented are listed under “Abstracts” in this issue. Tony Hopfinger is recognized as a pioneer and major contributor in the fields of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) techniques employing three- and higher-dimensional levels of information derived from modeling and simulation. Tony has addressed chemical information and modeling problems in the pharmaceutical, polymer and materials sciences, in both industry and academia, and he is generally acknowledged as having fathered the development of QSPR modeling in polymer and materials science, including coining the acronym QSPR. The breadth of his interests and the applicability of the techniques he has developed are reflected in the topics covered in some of his recent papers, including drug discovery, ADME-Tox property prediction, nanotoxicity, cheminformatic descriptors and molecular similarity analysis.
Tony has made many contributions to the field of cheminformatics through his publications, teaching, mentoring, advising and organizing. He has authored or co-authored more than 270 peer-reviewed (and highly cited) papers and delivered almost 360 invited lectures. He has served on many journal editorial boards and has been an associate editor of the Journal of Chemical Information and Modeling (previously Journal of Chemical Information and Computer Sciences) for the past 16 years. He has been a member of government and industrial advisory boards, and he chaired a Gordon Research Conference on Quantitative Structure-Activity Relationships in Biology. He has coordinated and taught at short courses in North and South America and Europe; more than 50 computational scientists earned their Ph.D. degrees under Tony’s mentoring; and he has also provided advanced training to more than 70 postdoctoral students. Tony Hopfinger received a B.S. in Math and Physics from the University of Wisconsin in 1966, and a Ph.D. in Biophysical Chemistry from Case Western Reserve University in 1969. He started his career in 1969 as an NIH Postdoctoral Fellow, Department of Biological Chemistry, Harvard Medical School, and from there moved to Case Western Reserve University in 1970 as Assistant Professor of Macromolecular Science. He held increasingly senior positions at Case Western, eventually becoming Professor of Macromolecular Science in 1978 and Director, Research Computing Laboratory, in 1979. In 1981 he moved from academia to industry, joining G.D. Searle (now part of Pfizer) as Director, Department of Drug Design, and later Director, Department of Medicinal Chemistry. Tony maintained links with academia, holding several adjunct and visiting professorships, and in his spare time founded, or co-founded, a number of software and pharmaceutical companies, including Intersoft, ChemLab, Receptor Laboratories and DNACodes.
He returned to academia in 1985 and was Professor of Bioengineering, Chemistry and Medicinal Chemistry, University of Illinois at Chicago until 2005. Since then he has divided his time as Distinguished Research Professor of Pharmacy, University of New Mexico; Chief Science Officer of The Chem21 Group, Inc.; and Professor Emeritus of Medicinal Chemistry and Pharmacognosy, University of Illinois. Tony Hopfinger is highly respected by his colleagues worldwide, and this Award is a well-deserved recognition of the outstanding career of an unstinting and generous pioneer and practitioner of cheminformatics. Phil McHale, Chair, CINF Awards Committee pmchale@cambridgesoft.com
  • 20. CINF Scholarship for Scientific Excellence Sponsored by Accelrys® The scholarship program of the Division of Chemical Information (CINF) of the American Chemical Society (ACS), funded by Accelrys, is designed to reward graduate and postdoctoral students in chemical information and related sciences for scientific excellence and to foster their involvement in CINF. Up to two scholarships valued at $1,000 each will be presented at the 241st ACS National Meeting in Anaheim, CA, March 27-31, 2011. Applicants must be enrolled at a certified college or university, and they will present a poster during the Welcoming Reception of the division on Sunday evening at the National Meeting. Additionally, they will have the option to show their poster at the Sci-Mix session on Monday night. Abstracts for the poster must be submitted electronically through PACS, the new abstract submission system of ACS. To apply, please inform the Chair of the selection committee, Guenter Grethe at ggrethe@comcast.net, that you are applying for a scholarship. Submit your abstract at http://abstracts.acs.org using your ACS ID. If you do not have an ACS ID, follow the registration instructions and submit your abstract for “CINF Scholarship for Scientific Excellence”. PACS will open for abstract submissions on August 3, 2010, and close on October 18, 2010. Additionally, please send a 2,000-word abstract describing the work to be presented in electronic form to the Chair of the selection committee by January 31, 2011. Any questions related to applying for one of the scholarships should be directed to the same e-mail address. Winners will be chosen based on contents, presentation and relevance of the poster, and they will be announced during the reception. The contents shall reflect upon the student’s work and describe research in the field of cheminformatics and related sciences.
Winning posters will be marked “Winner of Accelrys-CINF Scholarship for Scientific Excellence” at the poster session. Guenter Grethe ggrethe@comcast.net
  • 21. Chemical Structure Association Trust Applications Invited for CSA Trust Jacques-Émile Dubois Grants for 2011 The Chemical Structure Association (CSA) Trust is an internationally recognized organization established to promote the critical importance of chemical information to advances in chemical research. In support of its charter, the Trust has created a unique Grant Program, renamed in honor of Professor Jacques-Émile Dubois who made significant contributions to the field of cheminformatics. The Trust is currently inviting the submission of grant applications for 2011. Purpose of the Grants: The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds. A Grant will be awarded annually up to a maximum of four thousand U.S. dollars ($4,000). Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support. Who is Eligible? Applicant(s), age 35 or younger, who have demonstrated excellence in their chemical information related research and who are developing careers that have the potential to have a positive impact on the utility of chemical information relevant to chemical structures, reactions and compounds, are invited to submit applications. While the primary focus of the Grant Program is the career development of young researchers, additional bursaries may be made available at the discretion of the Trust.
All requests must follow the application procedures noted below and will be weighed against the same criteria. What Activities Are Eligible? Grants may be awarded to acquire the experience and education necessary to support research activities; e.g. for travel to collaborate with research groups, to attend a conference relevant to one’s area of research, to gain access to special computational facilities, or to acquire unique research techniques in support of one’s research. Application Requirements: Applications must include the following documentation: 1. A letter that details the work upon which the Grant application is to be evaluated as well as details on research recently completed by the applicant; 2. The amount of Grant funds being requested and the details regarding the purpose for which the Grant will be used (e.g. cost of equipment, travel expenses if the request is for financial support of meeting attendance, etc.). The relevance of the above-stated purpose to the Trust’s objectives and the clarity of this statement are essential in the evaluation of the application; 3. A brief biographical sketch, including a statement of academic qualifications; 4. Two reference letters in support of the application. Additional materials may be supplied at the discretion of the applicant only if relevant to the application and if such materials provide information not already included in items 1-4. Three copies of the complete application document must be supplied for distribution to the Grants Committee. Deadline for Applications: Applications must be received no later than March 14, 2011. Successful applicants will be notified no later than May 2, 2011.
  • 22. Address for Submission of Applications: Two copies of the application documentation should be forwarded to: Bonnie Lawlor, CSA Trust Grant Committee Chair, 276 Upper Gulph Road, Radnor, PA 19087, USA. If you wish to enter your application by e-mail, please contact Bonnie Lawlor at blawlor@nfais.org prior to submission so that she can contact you if the e-mail does not arrive.
2011 Lucille M. Wert Scholarship
Call for Applications
Deadline: February 1, 2011
Designed to help persons with an interest in the fields of Chemistry and Information to pursue graduate study in Library, Information, or Computer Science, the Scholarship consists of a $1,500 honorarium. This scholarship is given yearly by the Division of Chemical Information of the American Chemical Society. The applicant must have a bachelor’s degree with a major in Chemistry or related disciplines (related disciplines are, for example, Biochemistry or Chemical Informatics). The applicant must have been accepted into (or be currently enrolled in) a graduate Library, Information, or Computer Science program in an accredited institution. Work experience in Library, Information or Computer Science is preferred. The deadline to apply for the 2011 Lucille M. Wert Scholarship is February 1, 2011. Details on the application procedures can be found at http://www.acscinf.org; once there, click on “Awards” and then on “Lucille M. Wert Student Scholarship”. Applications (e-mail preferred) can be sent to: margaret.matthews@thomsonreuters.com
Contact: Marge Matthews, CINF Awards Committee, 633 Dayton Rd., Bryn Mawr, PA 19010-3801. Phone: 215-823-3922
SCHEDULE OF FUTURE ACS NATIONAL MEETINGS
240th Fall 2010, August 22-26, Boston, Massachusetts
241st Spring 2011, March 27-31, Anaheim, California
242nd Fall 2011, August 28-September 1, Denver, Colorado
243rd Spring 2012, March 25-29, San Diego, California
244th Fall 2012, September 9-13, Philadelphia, Pennsylvania
  • 23. Book Reviews
By Robert E. Buntrock, buntrock16@myfairpoint.net
De Bellis, Nicola. Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics; Scarecrow Press: Lanham, MD, 2009, $55.00 (Paperback). 394 pp. ISBN: 978-0-8108-6713-0.
This rather massive monograph is the outgrowth of a research project on a related topic and is an English translation, encouraged by Eugene Garfield, of the Italian original. The history and philosophy behind citation indexing and other bibliometric measures are documented in chapters one and three. The empirical basis, the literary antecedents, and comparisons with concept indexing and other full text retrieval are described in chapter two, including some discussion of Salton’s work. The work of the giants in these portions of the information industry (Bernal, Merton, Price, Garfield, and Small) is documented in detail. The mathematics of bibliometrics are described in chapter four, including skewness, Lotka’s Law, Bradford’s Law, Zipf’s Law, and the work of Mandelbrot. A chapter titled “Maps and Paradigms” discusses the involvement of bibliographic citation with the history and sociology of science using co-citation analysis and other methods. Another chapter, titled “Impact Factor and the Evaluation of Scientists: Bibliographic Citation at the Service of Science Policy and Management,” probably has the most relevance to most scientists. The various metrics are discussed, including the Hirsch (h) index. Chapter seven, with the intriguing title, “On the Shoulders of Dwarfs: Citation as a Rhetorical Device …,” describes reasons for citation, professed and actual. Chapter eight evolves the discussion into cybermetrics, including the involvement of citation or linking in the performance of search engines. Errors and omissions do occur. Reference 5 for the introduction is missing.
Recall and relevance are only discussed briefly in the context of “improvement” of results from concept indexing (manual) and retrieval by means of Salton’s geometric machine indexing and other full text indexing methods. No mention is made in chapter seven (or apparently elsewhere) of the relevance of citation retrieval, since it should be commonly known among searchers that authors don’t always cite other references for the same reasons that the searcher is interested in. Due to the multitude of topics and concepts that can appear in a single article, many of us searchers can cite instances where a citation was made for a non-relevant concept. Curiously, the discussion of citation searching in patents is the last section of chapter six and has no discussion of the validity of the bibliometric value of citations in patents. The work of Narin is described and referenced, but that of critics of the method, including Edlyn Simmons, Stu Kaback, and Nancy Lambert, is missing. The existence of other uses of citation indexing and searching, e.g. in the CA file on versions of STN, is not mentioned. In the Conclusions, the author provides an either/or summation of the evaluation controversy. Either you believe that citations are “Mertonian” or you don’t. If the former, researchers, organizations, and journals can be evaluated. If you don’t, none of the evaluations can be made and the Citation Index itself may not be of value. This reviewer instead takes an intermediate attitude. Bibliometric evaluations can be a valuable supplement in a larger, more personal evaluation scheme. As for searching, use of citation indexes is a valuable supplement to other methods of searching (index, full text, etc.) and all methods should be used, none exclusively. This book will be useful to those interested in or doing research in the fields of information science or history of science.
Chapter six should be made available to the management of academic and other organizations that use citation analysis for personnel evaluation.
Grubb, Philip W., Thomsen, Peter R. Patents for Chemicals, Pharmaceuticals, and Biotechnology; Oxford University Press: New York, 2010, $190.00 (Hardcover). 592 pp. ISBN: 978-0-19-957523-7.
This exhaustive monograph covers the whole of the field of patents on an international scale. History, patent law and procedure, patentability, patenting in practice (including drafting), and commercial exploitation are covered in depth in 25 chapters. However, it is directed at patent agents and other practitioners in patent law, including portfolio management. Significantly, the chapter in previous editions on patents and information (chapter 20 in the 4th edition) has been eliminated. For a treatment of the field of patent information, including a good primer on patents in general, the interested reader is referred to, among others, “Information Sources in Patents”* by Stephen Adams.
*Adams, S. R. Information Sources in Patents, 2nd ed.; K. G. Saur: Munich, 2006.
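
As an aside on the metrics mentioned in the De Bellis review: the Hirsch (h) index is the largest number h such that an author has h papers cited at least h times each. The short Python sketch below computes it from a list of citation counts. The function name and the sample counts are illustrative assumptions and are not taken from the book.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here, so stop
    return h


# Hypothetical citation counts for one author's papers.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))
```

A single highly cited paper barely moves the result, which is one reason the reviewer's point about using such numbers only as a supplement to a broader evaluation applies here as well.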
  • 24. ACS Chemical Information Division (CINF) Fall 2010 ACS National Meeting, Boston, MA (August 22-26)
ABSTRACTS
CINF 1 Semantic envelopment of cheminformatics resources with SADI
Leonid L Chepelev(1), leonid.chepelev@gmail.com, 1125 Colonel By Drive, Ottawa Ontario K1S 5B6, Canada; Egon Willighagen(2); Michel Dumontier(1). (1) Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa Ontario K1S 5B6, Canada (2) Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden
The distribution of computational resources as web services and their execution as workflows has enabled facile computation and data integration for bio- and cheminformatics. The Semantic Automated Discovery and Integration (SADI) framework addresses many shortcomings of similar frameworks, such as SSWAP and BioMoby, while allowing for more efficient semantic envelopment of computational chemistry services, resource discovery, and automated workflow organization. In this work, we apply the CHEMINF ontology and Chemical Entity Semantic Specification and demonstrate the usability of the SADI framework in solving common cheminformatics problems starting from RDF-based chemical entity representations. Our eventual goal is to convert all of the functions and functionalities of the Chemistry Development Kit (CDK) into distinct SADI services. This would enable the formulation of all cheminformatics problems currently addressed by CDK as SPARQL queries, returning meaningful RDF output which can then be easily integrated with existing RDF-based knowledgebases or used for further processing.
CINF 2 RESTful RDF web services for predictive toxicology
Nina Jeliazkova(1), jeliazkova.nina@gmail.com, 4 A.Kanchev str., Sofia 1000, Bulgaria.
(1) Ideaconsult Ltd., Sofia 1000, Bulgaria
The Open Source Predictive Toxicology Framework (http://www.opentox.org), developed by partners of the EC FP7 OpenTox project, aims at providing unified access to toxicity data and predictive models, as well as validation procedures. This is achieved by i) an information model, based on a common OWL-DL ontology (http://www.opentox.org/api/1.1/opentox.owl); ii) flexibility by linking with related ontologies; iii) availability of data and algorithms via a standardized REST web services interface, where every compound, data set or predictive method has a unique web address, used to retrieve its RDF representation or initiate the calculations. The OpenTox framework allows building user-friendly applications for toxicological experts or model developers, or direct access by an application programming interface for development, integration and validation of new algorithms. The work presented describes the experience of building RESTful web services, based on RDF representation of resources, to incorporate diverse IT solutions into a distributed and interoperable system.
CINF 3 Linking the resource description framework to cheminformatics and proteochemometrics
Egon L. Willighagen(1), egon.willighagen@farmbio.uu.se, Box 591, Uppsala Uppland SE-75124; Jarl E.S. Wikberg(1). (1) Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
Background: Semantic web technologies are finding their way into the life sciences. Ontologies and semantic markup have already been used for more than a decade in molecular sciences, but have not found widespread use yet. The semantic web technology Resource Description Framework (RDF) and related methods prove to be sufficiently versatile to change that situation. Results: The work presented here focuses on linking RDF approaches to existing molecular chemometrics fields, including cheminformatics, QSAR modeling and proteochemometrics.
Applications are presented that link RDF technologies to methods from statistics and cheminformatics, including data aggregation, visualization, chemical identification, and property prediction.
  • 25. They demonstrate how this can be done using various existing RDF standards and cheminformatics libraries. For example, we show how IC50 and Ki values are modeled for a number of biological targets using data from the ChEMBL database. Conclusions: We have shown that existing RDF standards can suitably be integrated into existing molecular chemometrics methods. Platforms that unite these technologies, like Bioclipse, make this even simpler and more transparent. Being able to create and share workflows that integrate data aggregation and analysis (visual and statistical) is beneficial to interoperability and reproducibility. The current work shows that RDF approaches are sufficiently powerful to support molecular chemometrics workflows.
CINF 4 Chemical e-Science Information Cloud (ChemCloud): A semantic web based eScience infrastructure
Stephan Heineke(1), heineke@fiz-chemie.de, Franklinstr. 11, Berlin Berlin, Germany; Adrian Paschke(2). (1) FIZ CHEMIE, Berlin 10587, Germany (2) Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany
Our Chemical e-Science Information Cloud (ChemCloud), a Semantic Web based eScience infrastructure, integrates and automates a multitude of databases, tools and services in the domain of chemistry, pharmacy and bio-chemistry available at the Fachinformationszentrum Chemie (FIZ CHEMIE), at the Freie Universitaet Berlin (FUB), and on the public Web.
Based on the approach of the W3C Linked Open Data initiative and the W3C Semantic Web technologies for ontologies and rules, it semantically links and integrates knowledge from our W3C HCLS knowledge base hosted at the FUB; our multi-domain knowledge base DBpedia (Deutschland) implemented at FUB, which is extracted from Wikipedia (De) and provides a public semantic resource for chemistry; and our well-established databases at FIZ Chemie, such as ChemInform for organic reaction data; InfoTherm, the leading source for thermophysical data; Chemisches Zentralblatt, the complete chemistry knowledge from 1830 to 1969; and ChemgaPedia, the largest and most frequented e-learning platform for chemistry and related sciences in the German language.
CINF 5 Use of semantic web services to access small molecule ligand database
Anay P Tamhankar(1), anay@evolvus.com, 88 Shukrawar Peth, Pune Maharashtra 411002, India; Aniket S Ausekar(1). (1) Software Solutions Group, Evolvus, Pune Maharashtra 411002, India
Resource Description Framework (RDF) and a set of associated technologies like OWL, SPARQL, etc., which form the W3C's semantic web technology stack, are renewing interest in semantic chemistry. Semantic Web Services not only specify syntactic interoperability but also specify and enforce the semantic constraints of messages being transmitted and objects being accessed. The Liceptor database is a small molecule ligand database consisting of approximately 4 million compounds. The database schema consists of fields like molecular properties (2D-structure, molecular weight, molecular formula, etc.), molecular descriptors (H-donors, H-acceptors, logP, logD, number of rotatable bonds, etc.) and pharmacological properties (bio-assays, receptors, enzymes, parameters, animal models, therapeutic indications, etc.).
Pharmaceutical and Bio-Technology companies use this database to mine chemical space for internal research, to prioritize QSAR and pharmacophore studies, for synthetic chemistry endeavors and for advancing hit-to-lead patterns. The database records are available in multiple formats (relational database, XML, RDfile, etc.) as well as online through an interactive web application (HTML format). The soon to be released version of the database includes access using semantic web services. The ontology is expressed in OWL and RDF defines the overall framework.
  • 26. Typical consumers of the data using this access mechanism are expected to be third-party tool vendors and data aggregators. Use of semantic web services allows evolution of the schema over time without explicitly communicating the change or requiring all data consumers to be changed.
CINF 6 Usage metrics: Tools for evaluating science monograph collections
Michelle M Foss(1), micfoss@uflib.ufl.edu, PO Box 117011, Gainesville FL 32611, United States; Vernon Kisling(1); Stephanie Haas(1). (1) Department of Marston Science Library, University of Florida, Gainesville FL 32611, United States
As academic libraries are increasingly supported by a matrix of database functions, the use of data mining and visualization techniques offers significant potential for future collection development based on quantifiable data. While data collection techniques are not standardized and results may be skewed because of granularity problems or faulty algorithms, useful baseline data can be extracted and broad trends identified. The purpose of the study is to provide an initial assessment of data associated with the science monograph collection at the Marston Science Library (MSL), University of Florida. The sciences fall within the major Library of Congress Classification schedules of Q, S, and T, excluding TN, TR, TT, and R. The overall strategy of this project is to analyze audience-based circulation patterns, e-book usage, purchases, and interlibrary loan statistics from the academic year July 1, 2008 to June 30, 2009. Such analyses provide an evidence-based framework for future collection decisions.
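
The scope rule stated in the CINF 6 abstract (schedules Q, S, and T, leaving out TN, TR, TT, and R) can be sketched as a small filter. This is only an illustration under stated assumptions: the function names, the regular expression used to pull the class letters from a call number, and the sample call numbers are all hypothetical and are not the authors' method.

```python
import re
from collections import Counter

# Scope taken from the abstract: Q, S, T are in; TN, TR, TT are out.
# R is also excluded in the abstract; it already lies outside Q/S/T.
INCLUDED_SCHEDULES = ("Q", "S", "T")
EXCLUDED_SUBCLASSES = ("TN", "TR", "TT")


def lc_class(call_number):
    """Return the leading class letters of an LC call number, e.g. 'QD'."""
    match = re.match(r"\s*([A-Za-z]{1,3})", call_number)
    return match.group(1).upper() if match else ""


def in_scope(call_number):
    """True if the call number falls inside the science schedules above."""
    cls = lc_class(call_number)
    if not cls or cls[0] not in INCLUDED_SCHEDULES:
        return False
    return cls[:2] not in EXCLUDED_SUBCLASSES


# Hypothetical checkout records, tallied by LC class for in-scope items only.
checkouts = ["QD461 .B3", "QD251 .M3", "TN145 .C6", "TP155 .S5", "RS403 .P3"]
by_class = Counter(lc_class(c) for c in checkouts if in_scope(c))
print(by_class.most_common())
```

The same tally could be run separately on circulation, e-book, purchase, and interlibrary loan records to compare the patterns the abstract describes.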
CINF 7 Happily ever after or not: E-book collection usage analysis and assessment at USC Library
Norah Xiao(1), nxiao@usc.edu, 910 Bloom Walk, Los Angeles CA 90089-0481, United States. (1) University of Southern California, United States
With more and more e-book collections being launched by publishers, the USC Science and Engineering Library began acquiring e-book collections in late 2008, and one of the first and biggest collections acquired was Springer e-books. Now, after two years, are users satisfied with this e-book collection? Are they accessing and using it?
Like any other e-collection, how well have we, librarians and staff, been coping with this collection in collection development (e.g. e-book packages from other publishers), access services (e.g. interlibrary loan, off-campus access, e-books technical issues), outreach (e.g. e-book market strategies), and information literacy? This presentation will overview our assessment of this e-book collection after 2 years. What have we learned from the usage data? And by analyzing the data, how did and can we improve our services to users? It is hoped that our experience can present a proactive implementation plan for others considering comprehensive digital migration of their content, with the goal of not only better coping with the current economic environment, but of spurring development, innovation, and efficiency in the long run.
  • 27. CINF 8 From Chemical Abstracts to SciFinder: Transitioning to SciFinder and assessing customer usage
Susan Makar(1), susan.makar@nist.gov, 100 Bureau Drive, MS 2500, Gaithersburg MD 20899-2500, United States; Stacy Bruss(1), stacy.bruss@nist.gov, 100 Bureau Drive, MS 2500, Gaithersburg MD 20899-2500, United States. (1) National Institute of Standards and Technology, United States
The Research Library of the National Institute of Standards and Technology (NIST) monitors SciFinder usage to ensure customers have ready access to the database and to determine who uses it. Usage statistics played a critical role in determining whether to increase the number of seats and which heavy users should help pay for those additional seats. While most NIST researchers were very excited to acquire access to this product, many, who were well acquainted with using the print version of Chemical Abstracts, needed to learn the best techniques for searching and browsing the chemistry literature using SciFinder. Transitioning from the printed Chemical Abstracts to SciFinder posed significant challenges to one research project. This presentation will describe how the NIST Research Library used SciFinder usage statistics to make collection development decisions and how library staff worked with NIST researchers to successfully transition from the printed Chemical Abstracts to SciFinder.
CINF 9 Using Web of Knowledge to identify publishing and citation patterns of campus researchers at the University of Arkansas
Lutishoor Salisbury(1), lsalisbu@uark.edu, 365 N. McIlroy Ave., Fayetteville AR 72701-4002, United States; Jeremy S.
Smith(1). (1) University of Arkansas, United States
This presentation will provide information on a project undertaken at the University of Arkansas in Fayetteville to study publications by the campus researchers with an emphasis on the STEM (agricultural sciences, physical science, biological sciences, engineering and mathematics, etc.) disciplines at the macro-level for a three-year period. The overall objective of the study was (1) to provide an overview of the productivity of faculty and researchers in the various departments which could be used in allocating resources for collection development and (2) to provide evidence-based data of periodical use to assist with collection decisions and to identify collection strengths at the university level. We used the Web of Knowledge database (Science Citation Index, Social Science Citation Index and Arts and Humanities Citation Index) to identify the periodicals in which our researchers published and those that they cite in their publications, and to conduct several analyses, including determining the extent to which our researchers are publishing in and citing periodicals from the Elsevier, Wiley and IEEE journal packages. A methodology for extracting citations from Web of Knowledge into an Excel spreadsheet will also be presented. The strengths and weaknesses of the Web of Knowledge for this study will also be highlighted.
CINF 10 Don't forget the qualitative: Including focus groups in the collection assessment process
Teri M. Vogel(1), tmvogel@ucsd.edu, 9500 Gilman Dr #0175E, La Jolla CA 92093, United States; Susan Shepherd(1). (1) University of California San Diego, United States
To complement our ongoing quantitative collection evaluations based on cost and usage data, the UC San Diego Science & Engineering Library conducted a series of focus groups with graduate students and faculty in our core departments.
Our objective was to learn more about how they use the collection for research and teaching, so that we could make more informed decisions about collection management, as well as how best to deploy our staff resources for increased promotion, outreach and instruction. Participants were asked about the resources they use, how they use them, and what gaps they perceived. We also probed their familiarity with the top licensed resources in their fields. In this presentation we will discuss our focus group methods, results and the next steps we have taken in this assessment, including a follow-up survey to the same departments to obtain more quantitative information about usage of the collection.
  • 28. CINF 11 Strategies for the identification and generation of informative compound sets
Michael S Lajiness(1), LajinessMS@Lilly.com, 1 Corporate Center, DC:1930, Indianapolis, IN 46285, United States. (1) Computer Aided Drug Discovery, Eli Lilly & Company, Indianapolis, IN 46285, United States
Mounting pressures in drug discovery research dictate more efficient methods of picking the winners: molecules that actually have a chance to be the drugs of the future. Clearly, these methods need to navigate a highly multidimensional landscape. It is also clear that hard filters should never be used and that a more continuous treatment or prioritization has clear advantages. Further, structural diversity needs to be considered in order for the best structural ideas to be found most efficiently. In addition, history and external sources of information also must be examined. This presentation will describe some of the methods, techniques, and strategies that have been employed by the author over the past 25 years working in cheminformatics that attempt to identify compounds that are likely to provide the most useful information so that one might discover solid leads more rapidly.
CINF 12 Public-domain data resources at the European Bioinformatics Institute and their use in drug discovery
Christoph Steinbeck(1), steinbeck@ebi.ac.uk, Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, United Kingdom. (1) European Bioinformatics Institute, EMBL Outstation - Hinxton, Hinxton Cambridge CB10 1SD, United Kingdom
Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open chemistry databases into the public domain calls for flexible, open toolkits to process them.
These databases and tools will, for the first time, create opportunities for academia and third-world countries to perform state-of-the-art open drug discovery and translational research, endeavors that have so far been a domain of the pharmaceutical industry. This talk will describe a couple of relevant data resources at the European Bioinformatics Institute and will also outline our research on and development of toolkits such as the Chemistry Development Kit and CDK-Taverna to support the exploitation of these data sources.
CINF 13 Decision making in the face of complicated drug discovery data using the Novartis system for virtual medicinal chemistry (FOCUS)
Donovan Chin(1), donovan_chin@novartis.com, 250 Mass Ave, Cambridge MA 02139, United States. (1) Global Discovery Chemistry, Novartis Institutes for BioMedical Research, Cambridge MA 02139, United States
This talk will describe some of the broad concepts that led to the development of the Novartis software system for data analysis & virtual medicinal chemistry (FOCUS). The system, which is routinely used globally, is designed to present the scientist with an accessible interface that permits iterative hypothesis testing of many possible chemical candidates while accounting for undesirable ADMET properties. Some of the key principles are to present the data in a way that reflects stored knowledge and facilitates the decision about what compound to make next. We will highlight some of these concepts in applications spanning the range from target identification to drug optimization.
CINF 14 Integrating chemical and biological data: Insights from 10 years of VERDI
Susan Roberts(1), susan_roberts@vrtx.com, 130 Waverly St, Cambridge MA 02139, United States; W.
Patrick Walters(1); Ryan McLoughlin(1); Philppe Gabriel(1); Jonathan Willis(1); Trevor Kramer(1). (1) Vertex Pharmaceuticals, Cambridge MA 02139, United States
VERDI is a software system, originally developed in 2000 at Vertex Pharmaceuticals, for integrating chemical and biological data and delivering this information to drug discovery teams. In addition to traditional table views, VERDI incorporated a number of modules designed to enable scientists to understand relationships between chemical structure and biological data. Over the last 10 years, VERDI has been the primary data access tool for hundreds of scientists at multiple sites around the world. A retrospective evaluation of VERDI has provided us with a number of 'lessons-learned', which come from a multitude of revisions, improvements and new feature additions. Some of these lessons, which are being used as the basis for development of the next generation of data analysis and visualization tools at Vertex, will be presented and discussed in detail.
  • 29. CINF 15 Collaborative database and computational models for tuberculosis drug discovery decision making
Sean Ekins(1)(2)(3)(4), sekins@collaborativedrug.com, 601 Runnymede Ave, Jenkintown PA 19046, United States; Justin Bradford(1); Krishna Dole(1); Anna Spektor(1); Kellan Gregory(1); David Blondeau(1); Moses Hohman(2); Barry A Bunin(1). (1) Collaborative Drug Discovery, Burlingame CA 94403, United States (2) Collaborations in Chemistry, Burlingame CA 94403, United States (3) Department of Pharmaceutical Sciences, University of Maryland, Burlingame CA 94403, United States (4) Department of Pharmacology, Robert Wood Johnson Medical School, University of Medicine & Dentistry of New Jersey, Burlingame CA 94403, United States
Drug discovery is being reshaped by large-scale collaborations that connect individual researchers using collaborative computational approaches and crowdsourcing. Future drug discovery decisions will ultimately still be made based on massive multidimensional datasets. As an example, the search for molecules with activity against Mycobacterium tuberculosis (Mtb) is employing many approaches in collaborating national and international laboratories. We have developed a database (CDD TB) to capture public and private Mtb data while enabling data mining and collaborations with other researchers. We have also used the public data along with several computational approaches, including Bayesian classification models for 220,463 molecules, and tested them with external molecules, enabling the discrimination of active or inactive substructures from other datasets in CDD TB. The combination of the database, dataset analysis, and computational models provides new insights into molecular properties and features that are determinants of whole cell activity, allowing prioritization and decision making around molecules.
CINF 16 Data drive life sciences: The Pyramids meet the Tower of Babel
Rajarshi Guha(1), guhar@mail.nih.gov, 9800 Medical Center Drive, Rockville MD 20852, United States. (1) Department of Informatics, NIH Chemical Genomics Center, Rockville MD 20852, United States
A characteristic feature of modern life science research is the fact that it has become data intensive. As a result we are faced with datasets of massive size and wide variety in terms of the type of data. Examples range from massive datasets from next generation sequencing to more complex datasets of chemical structure and activity from high-throughput small molecule screens. In this talk I will discuss some aspects of how one can handle datasets of such size and variability. I will consider examples ranging from computational science and distributed services that allow us to easily and cheaply handle massive datasets to integration approaches that attempt to merge data from multiple sources to obtain a systems level view of the biological effects of small molecules. In all cases, the focus will be data generated from and for small molecule studies.
CINF 17 Design principles for diversity-oriented synthesis: Facilitating downstream discovery with upfront design
Lisa Marcaurelle(1), lisa@broadinstitute.org, 7 Cambridge Center, Cambridge MA 02139, United States. (1) Chemical Biology Platform, Broad Institute, Cambridge MA 02139, United States
To expand the diversity of our screening collection to access a broad range of biological targets, we aspire to produce libraries of small molecules that combine the structural complexity of natural products and the efficiency of high-throughput processes. Moreover, we aim to synthesize the complete matrix of stereoisomers for all library members. We reason that this unique collection will enable the rapid development of stereo-structure/activity relationships (SSAR) upon biological testing, providing valuable information for the prioritization and optimization of hit compounds.
Although our library products may be distinct compared to traditional compound collections, we are faced with fundamental questions relevant to library design: How do you prioritize scaffolds for synthesis? How do you select products with desirable physicochemical properties? In designing DOS libraries we employ a number of cheminformatic methods to tackle such issues and select compounds for synthesis/screening. An overview of our design criteria and decision-making process will be presented.
  • 30. Chemical Information Bulletin Vol. 62(3) Fall 2010 30
CINF 18 Overview: Data-intensive drug design
John H Van Drie(1), johnvandrie@mindspring.com, 34 Stinson Rd, Andover MA 01810, United States. (1) R&D, Van Drie Research, Andover MA 01810, United States
How do we best make med chem decisions in the face of a lot of data? This is an issue that confronts us at many stages of the drug discovery process: screening, hit-to-lead, early lead optimization, and late-stage lead optimization. In this session, speakers representing each of these stages will describe how they have successfully tackled these issues, emphasizing general principles over specific computational tools. Our brains can conveniently handle only about 7 things at a time, and most traditional med chem decision-making processes reflect that. Already when the number of molecules being considered is in the range of dozens, things get tricky; when that number is in the thousands to hundreds of thousands, one must re-orient one's perspective.

CINF 19 Data-driven development: How ACS Publications uses data to enhance products and services, and respond to customer needs
Sara Rouhi(1), s_rouhi@acs.org, 1155 16th St., NW, Washington DC 20036, United States; Melissa Blaney(1). (1) ACS Publications, United States
As the scholarly publishing landscape continues to rapidly transform in unprecedented ways, publishers and libraries have had to quickly pivot to accommodate the changing preferences that users have for accessing, collecting, and consuming digital information. ACS Publications has used a data-driven approach to handle these changing customer and end-user needs. Everything from our ACS Mobile iPhone application to our transition from print to online Web products has been shaped by this approach. This presentation will address the role of data in developing new products, enhancing our web presence, and responding to user behavior on the ACS Web Editions Platform.
CINF 20 Objective collections evaluation using statistics at the MIT Libraries
Mathew Willmott(1), willmott@mit.edu, Building 14S-134, 77 Massachusetts Ave, Cambridge MA 02139-4307, United States; Erja Kajosalo(1). (1) Engineering & Science Libraries, Massachusetts Institute of Technology, United States
Recent budget pressures have forced many libraries to reevaluate their collections and substantially cut back on their subscription spending. The task of evaluating a large collection of subscription-based materials, however, is a difficult one. Journals from different subject areas are used differently, and journals from different publishers have their usage measured differently. Evaluating each individual journal subscription separately would be a monumental task bordering on infeasibility. This paper will discuss the approach taken by the MIT Engineering and Science Libraries in the spring of 2009 and 2010 to evaluate their journal collections, specifically for Springer, Elsevier, and Wiley-Blackwell, the three journal publishers with which these libraries hold the most subscriptions. Discussion will include the gathering and analysis of usage data, publication data, and citation data, as well as the process by which these data were combined to create an objective ranking for each journal. These objective rankings were not final decisions; librarians with subject expertise then evaluated the lower-ranked journals to determine if they were appropriate choices for cancellation, often taking into consideration many additional factors. However, these objective evaluations helped librarians to more efficiently use their time by indicating which journals may be strong candidates for cancellation, and they helped department liaisons to defend final cancellation choices to a very data-driven faculty. The end result was a more efficient cancellation process as well as a more comprehensive understanding of the library's journal collections.
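As an illustration of how usage, publication, and citation data could be combined into a single objective ranking, here is a minimal sketch. The abstract does not state the method actually used at MIT, so the field names, the min-max normalization, and the weights below are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    # All field names are hypothetical; they stand in for the three
    # data types the abstract mentions.
    title: str
    downloads: int        # usage data
    local_papers: int     # publication data (papers by local authors)
    local_citations: int  # citation data (citations by local authors)


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_journals(journals, weights=(0.5, 0.25, 0.25)):
    """Return (journal, score) pairs sorted from highest to lowest score.

    The weights are an assumption; a real evaluation would tune them
    and then hand the lower-ranked titles to subject librarians.
    """
    usage = min_max([j.downloads for j in journals])
    pubs = min_max([j.local_papers for j in journals])
    cites = min_max([j.local_citations for j in journals])
    scores = [
        weights[0] * u + weights[1] * p + weights[2] * c
        for u, p, c in zip(usage, pubs, cites)
    ]
    return sorted(zip(journals, scores), key=lambda js: js[1], reverse=True)
```

Normalizing each measure before weighting keeps a large raw count (such as downloads) from dominating the score; the journals at the bottom of the list are only candidates for review, not automatic cancellations.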
CINF 21 Getting the biggest bang for your buck: Methods and strategies for managing journal collections
Grace Baysinger(1), graceb@stanford.edu, 364 Lomita Drive, Stanford 94305-5081, United States. (1) Stanford University, United States
Chemistry journals have the highest average cost per title of all subject areas. Library collection budgets have not kept pace with price increases and funds to acquire new titles are scarce. Signing big deals for journals has limited flexibility in adapting to changes. These factors have made acquiring journals to support programmatic needs more of a challenge than ever before. This presentation will cover methods, strategies, and tools that can be used to help assess how resources are allocated when
  • 31. Chemical Information Bulletin Vol. 62(3) Fall 2010 31
developing and managing journal collections.

CINF 22 Taking a collection down to its elements: Using various assessment techniques to revitalize a library
Leah Solla(1), leah.solla@cornell.edu, 283 Clark Hall, Ithaca NY 14953-2501, United States. (1) Cornell University, 283 Clark Hall, Ithaca NY 14953-2501, United States
What are the elements of a research literature collection in the physical sciences? How are they being used and what roles are they playing in research and teaching and learning? Who is using them: students, faculty, related disciplines? These are the questions that drove the extensive analyses conducted on the print and electronic literature collections in the Physical Sciences Library at Cornell University in preparation for transitioning the service model from a print-based facility to electronic collections and services. General trends indicated the usage of the collection had been well over 90% electronic for years and the acquisition of books and journals in print had been reduced to minimal levels under budget pressures. But there were significant gaps in the electronic holdings and there remained a small but very active core of the print collection; both warranted further study to enable us to provide the best possible access to these crucial materials in the new service model. The library management system was mined for a variety of data points and complemented with external data sources and user input to build the transition map for the physical sciences literature collections.
CINF 23 Predicting specific inhibition of cyclophilins A and B using docking, growing, and free energy perturbation calculations
Somisetti V Sambasivarao(1), somissv@auburn.edu, 179 Chemistry Building, Auburn AL 36849, United States; Orlando Acevedo(1). (1) Department of Chemistry and Biochemistry, Auburn University, Auburn AL 36849, United States
Cyclophilins (Cyp) belong to the enzyme class of peptidyl-prolyl isomerases which catalyze the cis-trans conversion of prolyl bonds in peptides and proteins. Twenty human Cyp isoenzymes have been reported and many are excellent targets for the inhibition of hepatitis C virus replication and multiple inflammatory diseases and cancers. Given the complete conservation of all active site residues between many of the enzymes, i.e., CypA, CypB, CypC and CypD, a better understanding of how to specifically inhibit individual targets could potentially reduce reported side effects in current treatments. Docking and growing programs have been used to construct protein-ligand complexes for a variety of reported selective inhibitors, including acylurea and aryl 1-indanylketone derivatives. Free-energy perturbation/Monte Carlo (FEP/MC) calculations have been utilized to quantitatively reproduce the free energies of binding for the inhibitors in multiple Cyp active sites in order to elucidate the origin of the specificity for the compounds.

CINF 24 Using aggregative web services for drug discovery
Qian Zhu(1), qianzhu@indiana.edu, 901 E 10th St., Bloomington IN 47408, United States; Michael S. Lajiness(1); David J. Wild(1). (1) School of Informatics and Computing, Indiana University, Bloomington IN 47408, United States
Recent years have seen a huge increase in the amount of publicly available information pertinent to drug discovery, including online databases of compound and bioassay information; scholarly publications linking compounds with genes, targets and diseases; and predictive models that can suggest new links between compounds, genes, targets and diseases. However, there is a distinct lack of data mining tools available to harness this information, and in particular to look for information across multiple sources. At Indiana University we are developing an aggregative web service framework to solve this kind of problem. It offers a new approach to data mining that crosses information source types to look at the "big picture" and to identify corroborating or conflicting information from models, assays, databases and publications.
  • 32. Chemical Information Bulletin Vol. 62(3) Fall 2010 32
CINF 25 Semantifying polymer science using ontologies
Edward O. Cannon(1), eoc21@cam.ac.uk, Lensfield Road, Cambridge Cambridgeshire CB2 1EW, United Kingdom; Nico Adams(1); Peter Murray-Rust(1). (1) Department of Chemistry, Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge Cambridgeshire CB2 1EW, United Kingdom
Ontologies are graph-based, formal representations of information in a domain. Currently, there is considerable interest in ontologies for biology and medicine, though little effort has been concentrated in the field of chemistry, let alone polymer science. We have developed a number of ontologies for polymer science: properties, measurement techniques and measurement conditions, using the Web Ontology Language. These ontologies will help facilitate the standardization of data exchange formats in polymer science by providing a common domain of knowledge. The properties ontology contains over 150 properties and has been integrated with the measurement techniques and conditions ontology, to give information on how a property is measured and under what conditions. The ontologies will be of use to polymer scientists wishing to reach a consensus in this area of knowledge. The ontologies also have the advantage that they can be integrated into software applications to leverage this knowledge.
CINF 26 Toxicity reference database (ToxRefDB) to develop predictive toxicity models and prioritize compounds for future toxicity testing
Hao Tang(1)(2), tangh@email.unc.edu, 120 Mason Farm Road Genetic Medicine, Suite 3010, Campus Box 7260, Chapel Hill NC 27599, United States; Hao Zhu(1); Liying Zhang(1); Alexander Sedykh(1); Ann Richard(3); Ivan Rusyn(4); Alexander Tropsha(1). (1) Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, United States (2) Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, United States (3) National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Chapel Hill NC 27711, United States (4) Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, United States
EPA's ToxCast program aims to use in vitro assays to predict chemical hazards and prioritize chemicals for toxicity testing. We employed the predictive QSAR workflow to develop computational toxicity models for ToxCast compounds with historical animal testing results available from ToxRefDB. To ensure model stability and robustness, multiple classifiers and 5-fold external cross-validation were applied. Results show that for three of the 78 toxicity endpoints, including one chronic and two reproductive endpoints, the Correct Classification Rate for external validation datasets was above 0.6 for all types of QSAR models. Our studies suggest that it is feasible to develop QSAR models for some endpoints, which could be further augmented by in vitro assay measures. The validated toxicity models were used for virtual screening of 50,000 chemicals compiled for the REACH program. The compounds predicted as toxic could be regarded as candidates for future toxicity testing.
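The validation scheme mentioned above can be sketched as follows. This is not the authors' workflow: it assumes that the Correct Classification Rate is the mean of sensitivity and specificity (a common definition, not stated in the abstract), and the fold assignment shown is a simple illustrative choice.

```python
def correct_classification_rate(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = toxic).

    Assumed definition: CCR = 0.5 * (TP / N_pos + TN / N_neg).
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute CCR")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


def external_folds(n_items, n_folds=5):
    """Yield (training_indices, external_indices) for each fold.

    Every item is held out exactly once; a model would be built on the
    training indices only and scored on the external indices.
    """
    folds = [list(range(i, n_items, n_folds)) for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        training = [i for i in range(n_items) if i not in held]
        yield training, held_out
```

Because this definition weights the toxic and non-toxic classes equally, a model that labels every compound as non-toxic scores 0.5 regardless of class balance, which gives the 0.6 threshold quoted above a meaningful baseline.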
Abstract does not reflect EPA policy.

CINF 27 OrbDB: A database of molecular orbital interactions
Matthew A. Kayala(1), mkayala@ics.uci.edu, 6473 Adobe Circle, Irvine CA 92617, United States; Chloe A. Azencott(1); Jonathan H. Chen(1); Pierre F. Baldi(1). (1) Department of Computer Science, University of California - Irvine, Irvine CA 92627, United States
The ability to anticipate the course of a reaction is essential to the practice of chemistry. This aptitude relies on the understanding of elementary mechanistic steps, which can be described as the interaction of filled and unfilled molecular orbitals. Here, we create a database of mechanistic steps from previous work on a rule-based expert system (ReactionExplorer). We derive 21,000 priority-ordered favorable elementary steps for 7800 distinct reactants or intermediates. All other filled to unfilled molecular orbital interactions yield 106 million unfavorable elementary steps. To predict the course of reactions, one must recover the relative priority of these elementary steps. Initial cross-validated results for a neural network on several stratified samples indicate we are able to retrieve this ordering with a precision of 98.9%. The quality of our database makes it an invaluable resource for the prediction of elementary reactions, and therefore of full chemical processes.
  • 33. Chemical Information Bulletin Vol. 62(3) Fall 2010 33
CINF 28 Novel approach to drug discovery integrating chemogenomics and QSAR modeling: Applications to anti-Alzheimer's agents
Rima Hajjo(1), hajjo@email.unc.edu, 2069 Genetic Medicine Building, 120 Mason Farm Road, Chapel Hill NC 27599, United States; Simon Wang(1); Bryan L. Roth(2); Alexander Tropsha(1). (1) Department of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, United States (2) Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, United States
Chemogenomics is an emerging interdisciplinary field relating receptorome-wide biological screening to the functional or clinical effects of chemicals. We have developed a novel chemogenomics approach combining QSAR modeling, virtual screening (VS), and gene expression profiling for drug discovery. Gene signatures for Alzheimer's disease (AD) were used to query the Connectivity Map (cmap, http://www.broad.mit.edu/cmap/) to identify potential anti-AD agents. Concurrently, QSAR models were developed for the serotonin, dopamine, muscarinic and sigma receptor families implicated in AD. The models were used for VS of the World Drug Index database to identify putative ligands. Twelve common hits from the QSAR/VS and cmap studies were subjected to parallel binding assays against a panel of GPCRs. All compounds were found to bind to at least one receptor, with binding affinities between 1.7 and 9000 nM. Thus, our approach afforded novel, experimentally confirmed GPCR ligands that may be considered putative treatments for AD.

CINF 29 Cheminformatics improvements by combining semantic web technologies, cheminformatical representations, and chemometrics for statistical modeling and pattern recognition
Egon L. Willighagen(1), egon.willighagen@farmbio.uu.se, Box 591, Uppsala Uppland, Sweden. (1) Department of Pharmaceutical Biosciences, Uppsala University, Uppsala Uppland SE-75124, Sweden
My research focuses on the methods needed for large-scale molecular property prediction, using semantic web, cheminformatics, and chemometrics methods. Starting from a Dictionary on Organic Chemistry website, research began on methods to accurately disseminate molecular knowledge, resulting in participation in Open Source cheminformatics projects, including Jmol, JChemPaint, and the Chemical Markup Language project, and an oral presentation at the "2000 Chemistry & Internet" conference. In that year, the applicant, together with the Jmol and JChemPaint project leaders, founded the Chemistry Development Kit (CDK), which is now a highly cited Open Source cheminformatics toolkit. Between 2001 and 2006 the applicant continued research in the area of data analysis with a PhD thesis on the "Representation of Molecules and Molecular Systems in Data Analysis and Modeling" with Prof. dr L.M.C. Buydens at the Analytical Chemistry Department at the Radboud University Nijmegen. The thesis studies the interaction of representation and statistics and shows how tightly these need to match. Topics of the thesis include: a critical analysis of the use of proton and carbon NMR in QSAR; the use of Open Source, Open Data, and Open Standards in interoperability in cheminformatics; the clustering of crystal structures using a novel similarity measure; and the use of new supervised self-organizing maps in pattern recognition in crystallography. Part of the research was performed in the group of dr P. Murray-Rust at Cambridge University. Later research focused on the use of semantic technologies to reduce error in the aggregation and exchange of molecular data. Recent work applies the developed technologies to cheminformatics in general and to QSAR and metabolite identification in particular, with dr C. Steinbeck at Cologne University in Germany, and with dr R. van Ham at Wageningen University within the Netherlands Metabolomics Center. The applicant recently joined the development team of the award-winning cheminformatics platform Bioclipse in Uppsala with Prof. J. Wikberg in Sweden, to continue his research in improving interoperability and reproducibility in cheminformatics and pharmaceutical bioinformatics, and proteochemometrics in particular. This implies continued CDK development, development of semantic methods in computational chemistry, and making these technologies accessible to the non-programming chemist by supporting the development of cheminformatics in bench-chemist-oriented platforms such as Bioclipse and Taverna.
  • 34. Chemical Information Bulletin Vol. 62(3) Fall 2010 34
CINF 30 Prediction of consistent water networks in uncomplexed protein binding sites based on knowledge-based potentials
Michael Betz(1), michael.betz@staff.uni-marburg.de, Marbacher Weg 6, Marburg 35032, Germany; Gerd Neudert(1); Gerhard Klebe(1). (1) Pharmaceutical Chemistry, Philipps-University Marburg, Marburg 35032, Germany
Within the active site of a protein, water fulfills a variety of different roles. Solvation of hydrophilic parts stabilizes a distinct protein conformation, whereas desolvation upon ligand binding may lead to a gain of entropy. In an overwhelming number of cases, water molecules mediate interactions between the protein and the bound ligand. Therefore, a reliable prediction of water molecules participating in ligand binding is essential for docking and scoring, and is necessary to develop strategies in ligand design. We require some reasonable estimates about the free energy contributions of water to binding. Useful parameters for such estimations are the total number of displaceable water molecules and the probabilities for their displacement upon ligand binding. These parameters depend on specific interactions with the protein and other water molecules, and thus on the positions of individual water molecules. The high flexibility of water networks makes it difficult to observe distinct water molecules at well-defined positions in structure determinations. Thus, experimentally observed positions of water molecules have to be assessed critically, bearing in mind that they represent an average picture of a highly dynamic equilibrium ensemble. Moreover, there are many structures with inconsistent and incomplete water networks. To address these deficiencies we developed a tool that predicts possible configurations of complete water networks in binding pockets in a consistent way.
It is based on the well-established knowledge-based potentials implemented in DrugScore, which also allow for a reasonable differentiation between "conserved" and "displaceable" water molecules. The potentials used were derived specifically for water positions as observed in small molecule crystal structures in the CSD. To account for the flexibility and high intercorrelation we apply a clique-based approach, resulting in water networks maximizing the total DrugScore. To incorporate as much known information as possible about a given target, we also allow constraints defined by experimentally observed water positions to be included. Our tool provides a useful starting point whenever a possible configuration of water molecules needs to be estimated in an uncomplexed protein, and suggests their spatial positions and their classification with respect to some kind of affinity prediction. In first tests we were able to get classifications and positional predictions which are in good agreement with crystallographically observed water molecules with remarkably small deviations.

CINF 31 Functional binders for non-specific binding: Evaluation of virtual screening methods for the elucidation of novel transthyretin amyloid inhibitors
Carlos J.V. Simões(1)(2), csimoes@qui.uc.pt, Coimbra, Coimbra 3004-535, Portugal; Trishna Mukherjee(2); Richard M. Jackson(2); Rui M.M. Brito(1). (1) Department of Chemistry, Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra 3004-535, Portugal (2) Institute of Molecular and Cellular Biology, University of Leeds, Coimbra 3004-535, Portugal
Inhibition of fibril formation by stabilization of the native form of transthyretin (TTR) is a viable approach for the treatment of Familial Amyloid Polyneuropathy that has been gaining momentum in the field of amyloid research.
Herein, we present a benchmark of five virtual screening strategies to identify novel TTR stabilizers: (1) 2D similarity searches with chemical hashed fingerprints, pharmacophore fingerprints and UNITY fingerprints, (2) 3D searches based on shape, chemical and electrostatic similarity, (3) LigMatch, a ligand-based method employing multiple templates, (4) 3D pharmacophore searches, and (5) docking to consensus X-ray crystal structures. By combining the best-performing VS protocols, a small subset of molecules was selected from a tailored library of 2.3 million compounds and identified as representative of multiple series of potential leads. According to our predictions, the retrieved molecules present better solubility, halogen fraction and binding affinity for both TTR pockets than the stabilizers discovered to date.
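The first strategy in the benchmark, 2D similarity searching with hashed fingerprints, can be illustrated with a minimal sketch. The abstract does not say which similarity coefficient was used; the Tanimoto coefficient shown here is a common choice and is an assumption, as are the toy fingerprints (sets of on-bit positions) and the threshold value.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / union


def similarity_search(query_fp, library, threshold=0.7):
    """Return (name, score) pairs at or above the threshold, best first.

    `library` maps a compound name to its fingerprint; both the names
    and the threshold are hypothetical.
    """
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In practice the fingerprints would come from a cheminformatics toolkit and a known TTR stabilizer would serve as the query; the ranked hits would then be passed on to the 3D and docking stages described above.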
  • 35. Chemical Information Bulletin Vol. 62(3) Fall 2010 35
CINF 32 Using the oreChem experiments ontology: Planning and enacting chemistry
Jeremy G Frey(1), j.g.frey@soton.ac.uk, School of Chemistry, University of Southampton, Southampton Hants SO17 1BJ, United Kingdom; Mark I Borkum(1); Carl Lagoze(2); Simon J Coles(1). (1) School of Chemistry, University of Southampton, Southampton Hants SO17 1BJ, United Kingdom (2) Department of Information Science, Cornell University, Ithaca NY 14850, United States
This paper presents the oreChem Experiments Ontology, an extensible model that describes the formulation and enactment of scientific methods (referred to as “plans”), designed to enable new models of research and facilitate the dissemination of scientific data on the Semantic Web. Currently, a high level of domain-specific knowledge is required to identify and resolve the implicit links that exist between digital artefacts, constituting a significant barrier to entry for third parties that wish to discover and reuse published data. The oreChem ontology radically simplifies and clarifies the problem of representing an experiment to facilitate the discovery and reuse of the data in the correct context. We describe the main parts of the ontology and detail the enhancements made to the Southampton eCrystals repository to enable the publication of oreChem metadata.
CINF 33 CHEMINF: Community-developed ontology of chemical information and algorithms
Leonid L Chepelev(1), leonid.chepelev@gmail.com, 1125 Colonel By Drive, Ottawa Ontario K1S 5B6, Canada; Janna Hastings(2); Egon Willighagen(3); Nico Adams(2); Christoph Steinbeck(2); Michel Dumontier(1); Peter Murray-Rust(4). (1) Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa Ontario, Canada (2) Chemoinformatics and Metabolism Team, European Bioinformatics Institute, Cambridge, United Kingdom (3) Department of Pharmaceutical Sciences, Uppsala University, Uppsala, Sweden (4) Department of Chemistry, Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
In order to truly convert RDF-encoded chemical information into knowledge and break out of domain- and vendor-specific data silos, reliable chemical ontologies are necessary. To date, no standard ontology that addresses all chemical information representation and service integration needs has emerged from previously proposed ontologies, ironically threatening yet another “Tower of Babel” event in cheminformatics. To avoid resultant substantial ontology mapping costs, we hereby propose CHEMINF, a community-developed modular and unified ontology for chemical graphs, qualities, descriptors, algorithms, implementations, and data representations/formalisms. Further, CHEMINF is aligned with ontologies developed within the OBO Foundry effort, such as the Information Artifact Ontology. We present the application of CHEMINF to efficiently integrate two RDF-based chemical knowledgebases with different representation structures and aims, but common classes and properties from CHEMINF. Finally, we discuss the steps taken to ensure applicability of this ontology in the semantic envelopment of computational chemistry resources, algorithms, and their output.
  • 36. Chemical Information Bulletin Vol. 62(3) Fall 2010 36
CINF 34 Chemical entity semantic specification: Knowledge representation for efficient semantic cheminformatics and facile data integration
Leonid L Chepelev(1), leonid.chepelev@gmail.com, 1125 Colonel By Drive, Ottawa Ontario K1S 5B6, Canada; Michel Dumontier(1). (1) Department of Biology, School of Computer Science, and Institute of Biochemistry, Carleton University, Ottawa Ontario K1S 5B6, Canada
Though the nature of RDF implies the ability to interoperate and integrate diverse knowledgebases, designing adequate and efficient RDF-based representations of knowledge concerning chemical entities is non-trivial. We hereby describe the Chemical Entity Semantic Specification (CHESS), which captures chemical descriptors, molecular connectivity, functional composition, and geometric structure of chemical entities and their components. CHESS also handles multiple data sources and multiple conformers for molecules, as well as reactions and interactions. We demonstrate the generation of a chemical knowledgebase from disparate data sources, which we then use to analyze the implications of design choices taken in CHESS for the efficiency of solutions to some classical cheminformatics problems, including molecular similarity searching and subgraph detection. We do this through automated conversion of SMILES-encoded query fragments into SPARQL queries and DL-Safe rules. Finally, we discuss approaches to identification of potential reaction participants and class members in chemical entity knowledgebases represented with CHESS.

CINF 35 Semantic assistant for lipidomics researchers
Alexandre Kouznetsov(1), alexk@unb.ca, P.O. Box 5050, Tucker Park Road, Saint John New Brunswick E2L 4L5, Canada; Rene Witte(2); Christopher J.O. Baker(1). (1) Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John New Brunswick E2L 4L5, Canada (2) Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
Lipid nomenclature has yet to become a robust research tool for lipidomics or lipid research in general. This is in part because no rigorous structure-based definitions for membership of specific lipid classes have existed. Recent work on the OWL-DL Lipid Ontology with defined axioms for class membership has provided new opportunities to revisit the lipid nomenclature issue [1], [2]. Also necessary is a framework for sharing these axioms with scientists during scientific discourse and the drafting of publications. To achieve this we introduce here a new paradigm for lipidomics researchers in which a client-side application tags raw text about lipids with information, such as canonical name or relevant functional groups, that is derived from the ontology and delivered using web services. Our approach includes the following core components: (i) Semantic Assistant Framework [6]; (ii) Lipid ontology [4]; (iii) Ontological NLP methodology; (iv) Ontology Axiom-extractor for the GATE framework. The Semantic Assistant Framework is a service-oriented architecture used to enhance existing end-user clients, such as Open Office Writer, with online lipidomics text analysis capabilities provided as a set of web services. The Ontological NLP methodology links lipid named entities occurring in a document opened on the client side with existing ontologies on the server side. The Ontology Axiom-extractor annotates each named entity with canonical name, class name and related class axioms, providing annotation for documents on the client side.
The proposed system is scalable and extensible, allowing researchers to easily customize the information to be delivered as annotations depending on the availability of chemical ontologies with defined axioms linked to canonical names for chemical entities.
[1] Baker CJO, Low HS, Kanagasabai R, and Wenk MR (2010), Lipid Ontologies, 3rd Interdisciplinary Ontology Conference, Tokyo, Japan, February 27-28, 2010
[2] Low HS, Baker CJO, Garcia A, and Wenk M (2009), OWL-DL Ontology for Classification of Lipids, International Conference on Biomedical Ontology, Buffalo, New York, July 24-26
[3] Witte R, Gitzinger T (2008), A General Architecture for Connecting NLP Frameworks and Desktop Clients Using Web Services, 13th International Conference on Applications of Natural Language to Information Systems
[4] Lipid Ontology available at http://bioportal.bioontology.org/ontologies/39503
  • 37. Chemical Information Bulletin Vol. 62(3) Fall 2010 37 CINF 36 ChemicalTagger: A tool for semantic text-mining in chemistry Lezan Hawizy(1) ,lh359@cam.ac.uk;Dave M Jessop(1) ; Peter Murray-Rust(1) .(1) The Unilever Centre for MolecularScience Informatics, Department of Chemistry, University of Cambridge,Cambridge CB2 1EW, United Kingdom The primary method for scientific communication is in the form of published scientific articles and theses and the use of natural language combined with domain-specific terminology. As such,they contain unstructured data. Given the unquestionable usefulness ofdata extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt make their contributions well suited to high-throughput Natural Language Processing (NLP) approaches.Using chemical synthesis procedures as an exemplar, we present ChemicalTagger. ChemicalTagger is a tool that combines chemical entity recognisers such as OSCAR with tokenisers, part-of-speech taggers and shallow parsing tools to produce a formal structure of reactions. This extracted data can then be expressed in RDF. This allows for the generation of highly informative visualisations,such as visual document summaries, structured querying and further enrichment can be provided by linking with domain specific ontologies. CINF 37 From canonical numbering to the analysis of enzyme- catalyzed reactions: 32 years of publishing in JCIM (JCICS) Johann Gasteiger(1) ,Gasteiger@molecular- networks.com,Naegelsbachstr.25, Erlangen D-81477, Germany ; Johann Gasteiger(2) .(1) Computer-Chemie- Centrum, University of Erlangen-Nuremberg,Erlangen D-91052, Germany (2) MolecularNetworksGmbH, Erlangen D-91052, Germany In 1972 we embarked on the development of a program for computer-assisted synthesis design which eventually led to the present system THERESA. 
Along the way many fundamental problems had to be solved, such as the unique representation of chemical structures, published in 1977. This work laid the foundation for building the Beilstein database. Methods had to be developed for the computer representation of chemical reactions, which formed the basis for constructing the ChemInform reaction database. Recent work has concentrated on the analysis of biochemical reactions, the prediction of metabolism and the risk assessment of chemicals. CINF 38 Fifteen years of JCICS George W Milne(1), bill@phm.com, 5 Clarke Court, Williamsburg VA 23188, United States. (1) NCI, NIH (Retd), Williamsburg VA 23188, United States During the period 1989-2004, when I was Editor of the Journal of Chemical Information and Computer Sciences (JCICS), the predecessor of the Journal of Chemical Information and Modeling (JCIM), many papers appeared addressing contemporary problems in computational chemistry. Some of these problems were completely settled and significant progress was made with others. A third group, in spite of numerous publications, defied attempts at resolution and remains to this day a challenge to computational chemists. As JCIM, aka JCICS, aka J. Chem. Doc., embarks upon its second 50 years, the progress recorded during the 1990s and the advances in computer hardware and software are reviewed. With a longer perspective, the impact of computers on chemistry is considered. CINF 39 Fifteen years in chemical informatics: Lessons from the past, ideas for the future Dimitris Agrafiotis(1), dagrafio@its.jnj.com, Welsh & McKean Roads, Spring House Pennsylvania 19477, United States. (1) Pharmaceutical Research & Development, Johnson & Johnson, Spring House Pennsylvania 19477, United States A unique aspect of chemical informatics is that it has been heavily influenced and shaped by the needs of the pharmaceutical industry. As this industry undergoes a profound transformation, so will the field itself.
In this talk, we reflect on the experiences of the past and explore the possibilities we see for the future. These possibilities lie at the convergence of chemistry, biology, and information technology, and will require thinking and working across scientific and organizational boundaries in a way that has never previously been possible.
  • 38. CINF 40 Applications of wavelets in virtual screening Val Gillet(1), v.gillet@sheffield.ac.uk, Regent Court, Sheffield, United Kingdom; Richard Martin(1); Eleanor Gardiner(1); Stefan Senger(2). (1) Department of Information Studies, University of Sheffield, Sheffield S1 4DP, United Kingdom (2) Computational and Structural Chemistry, GlaxoSmithKline, Stevenage, Hertfordshire SG1 2NY, United Kingdom The interactions which a small molecule can make with a receptor can be modelled using three-dimensional molecular fields, such as GRID fields; however, the cumbersome nature of these fields makes their storage and comparison computationally expensive. Wavelets are a family of multiresolution signal analysis functions which have become widely used in data compression. We have applied the non-standard wavelet transform to generate low-resolution approximations (wavelet thumbnails) of finely sampled GRID fields, without loss of information. We demonstrate various applications of wavelet thumbnails, including the development of an alignment method to enable the comparison of the wavelet representations of GRID fields in arbitrary orientation. CINF 41 Privileged substructures revisited: Target community-selective scaffolds Jürgen Bajorath(1), bajorath@bit.uni-bonn.de, Dahlmannstr. 2, Bonn NRW, Germany. (1) Department of Life Science Informatics, University of Bonn, Germany Molecular scaffolds that preferentially bind to a given target family, so-called “privileged” substructures, have long been of high interest in drug discovery. Many privileged substructures have been proposed, in particular for G protein-coupled receptors and protein kinases. However, the existence of truly privileged structural motifs has remained controversial. Frequency-based analysis has shown that many scaffolds thought to be target class-specific also occur in compounds active against other types of targets.
In order to explore scaffold selectivity on a large scale, we have carried out a systematic survey of publicly available compound data and defined target communities on the basis of ligand-target networks. The analysis was based on compound potency data and target pair potency-derived selectivity. More than 200 hierarchical scaffolds were identified, each represented by at least five compounds, which exclusively bound to targets within one of ca. 20 target communities. By contrast, currently available compound data is too sparsely distributed to assign target-specific scaffolds. Most scaffolds that exclusively bind to a single target within a community are only represented by one or two compounds in public domain databases. However, characteristic selectivity patterns are found to evolve around community-selective scaffolds that can be explored to guide the design of target-selective compounds. CINF 42 Automated retrosynthetic analysis: An old flame rekindled Peter Johnson(1), p.johnson@leeds.ac.uk, Woodhouse Lane, Leeds LS2 9JT, United Kingdom; Anthony P Cook(1); James Law(2); Mahdi Mirzazadeh(2); Aniko Simon(2). (1) School of Chemistry, University of Leeds, Leeds LS2 9JT, United Kingdom (2) Simbiosys Inc, Toronto Ontario M9W 6V1, Canada The last century saw truly innovative research aimed at the creation of systems for computer aided organic synthesis design (CAOSD). However, such systems have not achieved significant user acceptance, perhaps because they required manual creation of reaction knowledge bases, a time-consuming task which requires considerable synthetic chemistry expertise. More recent systems like ARChem [1] circumvent this problem by automated abstraction of transformation rules from very large databases of specific examples of reactions.
ARChem is still a work in progress, and specific problems which are being addressed include: a) identification of precise structural characteristics of each reaction, often requiring knowledge of reaction mechanism; b) treatment of interfering functional groups; c) minimising the combinatorial explosion inherent in automated multistep retrosynthesis; d) treatment of the results of extensive recent research into enantioselective and stereoselective reactions. [1] Law et al., J. Chem. Inf. Model., 2009, 49 (3), pp 593–602
  • 39. CINF 43 Dietary supplements: Free evidence-based resources for the cautious consumer Brian Erb(1), rberb@unmc.edu, 986705 Nebraska Medical Center, Omaha Nebraska 68198-6705, United States. (1) McGoogan Library of Medicine, University of Nebraska Medical Center, Omaha NE 68198-6705, United States Vitamin, mineral and dietary supplements are a 70 billion dollar industry. With marginal FDA regulation, it can be difficult to evaluate the health claims of a given product. How can the skeptical consumer distinguish a promising nutritional supplement from a substance that lacks the evidence to back its nutritional claims? This short presentation will highlight some evidence-based Internet sources that will help the consumer navigate the dietary supplement minefield. These sources will not only help the consumer separate bogus claims from research-supported evidence, but also help the consumer make informed nutritional decisions regarding which supplements might be a relevant and useful part of their healthy diet and lifestyle. The resources to be explored have been collected in a UNMC libguide at http://unmc.libguides.com/supplements for ease of navigation and dissemination. CINF 44 What lessons learned can we generalize from evaluation and usability of a health website designed for lower literacy consumers? Mary J Moore(1), mmoore@med.miami.edu, Calder Medical Library, PO Box 016950, Miami FL 33101, United States; Randolph G. Bias(2). (1) Department of Health Informatics, University of Miami Miller School of Medicine, Miami FL 33136, United States (2) Department of Information, University of Texas at Austin, Austin Texas 78712, United States Objectives: Researchers conducted multifaceted usability testing and evaluation of a website designed for use by those with lower computer literacy and lower health literacy.
Methods included heuristic evaluation by a usability engineer, remote usability testing and face-to-face testing. Results: Standard usability testing methods required modification, including interpreters, increased flexibility for time on task, presence of a trusted intermediary, and accommodation for family members who accompanied participants. Participants suggested website redesign, including simplified language, engaging and relevant graphics, culturally relevant examples, and clear navigation. Conclusions: User-centered design was especially important for this audience. Some lessons learned from this experience are echoed in usability and evaluation of commercial sites designed for similar audiences, and may be generalizable. CINF 45 National Library of Medicine resources for consumer health information Michelle Eberle(1), michelle.eberle@umassmed.edu, University of Massachusetts Medical School, 222 Maple Avenue, Shrewsbury MA 01545, United States. (1) National Network of Libraries of Medicine - New England, Shrewsbury MA 01545, United States Come learn about free, high quality web resources for consumer health information from the National Library of Medicine. We will cover MedlinePlus, a resource for health information for the public. The presenter will take you on a guided tour of http://medlineplus.gov and other specialized web resources for consumer health information, including the Drug Information Portal, DailyMed and the Dietary Supplements Labels Database. The program will wrap up with a brief introduction to ClinicalTrials.gov. You will leave this program equipped with expertise to find, critically appraise, and use online health information more effectively. CINF 46 Better prescription for information: Dietary supplements online Gail Y. Hendler(1), gail.hendler@tufts.edu, 145 Harrison Avenue, Boston MA 02111, United States.
(1) Hirsh Health Sciences Library, Tufts University, Boston MA 02111, United States Dietary supplements are becoming staples in the health regimens of a growing number of consumers worldwide. According to the most recent National Health and Nutrition Examination Survey, 52 percent of adults in the United States reported taking a nutraceutical in the past month. Consumers turn to these products believing they are safe and effective because they are “all natural.” Supplementing knowledge about the benefits and the potential risks associated with nutraceutical use requires information resources that are authoritative, accurate and readable to a large and general audience. This presentation will provide recommendations for locating high-quality, freely available online resources that today's
  • 40. consumers need to support decision-making. Featured resources will include books, databases and websites that discuss the pros and cons and provide the evidence for better use of dietary supplements, herbs and functional foods. CINF 47 Overview of the linking open drug data task Eric Prudhommeaux(1), eric+ACS@w3.org, 32-525, 32 Vassar St, Cambridge MA 02140, United States; Egon Willighagen(2); Susie Stephens(3). (1) W3C/MIT, Cambridge MA 02140, United States (2) Uppsala University, Uppsala, Sweden (3) Johnson and Johnson, United States There is much interesting information about drugs that is available on the Web. Data sources range from medicinal chemistry results, to the impacts of drugs on gene expression, through to the results of drugs in clinical trials. Linking Open Drug Data (LODD) is a task within the W3C's Health Care Life Sciences Interest Group. LODD has surveyed publicly available data sets about drugs, created Linked Data representations of the data sets and interlinked them, and identified interesting scientific and business questions that can be answered once the data sets are connected. The task also actively explores best practices for exposing data in a Linked Data representation. The figure below shows part of the data sets that have been published and interlinked by the task so far. The LODD data sets are represented in dark gray, while light gray represents other Linked Data from the life sciences, and white indicates data sets from different domains. Collectively, the LODD data sets consist of over 8 million RDF triples, which are interlinked by more than 370,000 RDF links. This presentation will introduce the LODD task and show examples of recent.
CINF 48 Control, monitoring, analysis and dissemination of laboratory physical chemistry experiments using semantic web and broker technologies Jeremy G Frey(1), j.g.frey@soton.ac.uk, School of Chemistry, University of Southampton, Southampton Hants SO17 1BJ, United Kingdom; Stephen Wilson(1). (1) School of Chemistry, University of Southampton, Southampton, Hants SO17 1BJ, United Kingdom A suite of software was developed to control and monitor experimental and environmental data, and was used for probing the air/water interface using Second Harmonic Generation. A centralised message broker enabled a common communication protocol between all objects in the system: experimental apparatus, data loggers, storage solutions and displays. The data and context are captured and represented in ways compatible with the Semantic Web. Experimental plans and their enactment are described using the oreChem experiments ontology; this provides the means to capture the metadata associated with the experimental process and the resulting data. Environmental data was stored in the Open Geospatial Consortium Sensor Observation Service (SOS). The SOS is part of the Sensor Web Enablement architecture; this describes a number of interoperable interfaces and metadata encodings for integrating sensor webs into the cloud. A mashup web interface was produced to link all these sources of information from a single point. CINF 49 Semantic analysis of chemical patents Dave M Jessop(1), dmj30@cam.ac.uk; Lezan Hawizy(1); Robert C Glen(1); Peter Murray-Rust(1). (1) The Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, United Kingdom Chemical patents are a rich source of technical and scientific information. They include meta-data, such as bibliographic information, as well as scientific data relating to reactions and synthesis experiments.
However, they are lengthy, largely unstructured and rich in technical terminology, such that analysing them takes a significant amount of human effort. This would make
  • 41. them an ideal candidate for 'semantification'. As a demonstration, an RDF triplestore of chemical patents is created. The patents, provided by the European Patent Office, are in an XML format. Document segmentation is used initially to extract the relevant information, mainly bibliographic information and experimental paragraphs. The experimental paragraphs are then processed using Natural Language Processing tools to extract the various components of the chemical reaction; roles, such as reactant, product or solvent, are then assigned. This extracted information is then converted into RDF and stored in a triplestore, where it can be queried and visualised, and where basic inferences can be made. The ultimate goal of this semantic representation is to make data available and re-usable by the scientific community. CINF 50 Data mining and querying of integrated chemical and biological information using Chem2Bio2RDF David J Wild(1); Bin Chen(1); Ying Ding(2); Xiao Dong(1); Huijun Wang(1); Dazhi Jiao(1); Qian Zhu(1); Madhuvanti Sankaranarayanan(1). (1) School of Informatics and Computing, Indiana University, Bloomington IN 47408, United States (2) School of Library and Information Science, Indiana University, Bloomington IN 47408, United States We have recently developed a freely-available resource called Chem2Bio2RDF (http://chem2bio2rdf.org) that consists of chemical, biological and chemogenomic datasets in a consistent RDF framework, along with SPARQL querying tools that have been extended to allow chemical structure and similarity searching. Chem2Bio2RDF allows integrated querying that crosses chemical and biological information including compounds, publications, drugs, genes, diseases, pathways and side-effects. It has been used for a variety of applications including investigation of compound polypharmacology, linking drug side-effects to pathways, and identifying potential multi-target pathway inhibitors.
In the work reported here, we describe a new set of tools and methods that we have developed for querying and data mining in Chem2Bio2RDF, including: Linked Path Generation (a method for automatically identifying paths between datasets and generating SPARQL queries from these paths); an ontology for integrated chemical and biological information; a Cytoscape plugin that allows dynamic querying and network visualization of query results; and a facet-based browser for browsing results. CINF 51 Mining and visualizing chemical compound-specific chemical-gene/disease/pathway/literature relationships Qian Zhu(1), qianzhu@indiana.edu, 901 E 10th St., Bloomington IN 47408, United States; Prajakta Purohit(2); Jong Youl Choi(2); Seung-Hee Bae(2); Judy Qiu(2); Ying Ding(3); David Wild(1). (1) School of Informatics and Computing, Indiana University, Bloomington IN 47408, United States (2) Department of Computer Science, Indiana University, Bloomington IN 47408, United States (3) School of Library & Information Science, Indiana University, Bloomington IN 47408, United States In common with most scientific disciplines, there has in the last few years been a huge increase in the amount of publicly-available and proprietary information pertinent to drug discovery, owing to a variety of factors including improvements in experimental technologies. The big challenge for us is how to use all of this information together in an intelligent, integrative fashion. We are developing an application to mine relationships between chemicals and genes, diseases, pathways and literature, and to visualize them. It aims to help answer the question “what else should I know about this compound?” from a medicinal chemistry perspective, based on the full picture of chemicals.
For the mining part, we have already developed an aggregating web service, named WENDI, which calls multiple individual, or atomic, web services covering a diversity of compound-related data sources, predictive models and self-developed algorithms, and aggregates the results from these services in XML. For visualization, we take two approaches. First, we created an RDF reasoner that converts the XML from WENDI to RDF, finds inferred relationships based on the RDF, ranks evidence with a focus on chemical-disease relationships, and displays all evidence using the SWP faceted browser, which is based on Longwell (http://simile.mit.edu/wiki/Longwell); it combines the flexibility of the RDF data model with a faceted browser to enable users to browse complex RDF triples in a user-friendly and meaningful manner. Second, we place all relationships from WENDI into a chemical space consisting of 60M PubChem compounds, then cluster and highlight particular chemical compounds with specific attributes, such as gene/disease/pathway/literature, using PubChemBrowse, a customized visualization tool for cheminformatics research that provides a novel 3D data point browser, displays complex properties of massive data on commodity clients and supports fast interaction with an external property database via a semantic web interface.
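The XML-to-RDF conversion step described in CINF 51 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the abstract does not describe the WENDI output schema, so the XML layout, the `example.org` namespace and the function names `xml_to_triples` and `to_ntriples` are all hypothetical. The sketch only shows the general idea of turning aggregated XML records into subject-predicate-object triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the real WENDI schema is not given in the abstract.
XML = """
<results compound="CID0000">
  <association type="gene" target="GENE_X"/>
  <association type="disease" target="DISEASE_Y"/>
</results>
"""

BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def xml_to_triples(xml_text):
    """Turn each <association> element into one (subject, predicate, object) triple."""
    root = ET.fromstring(xml_text.strip())
    subject = BASE + "compound/" + root.attrib["compound"]
    triples = []
    for assoc in root.findall("association"):
        kind = assoc.attrib["type"]
        predicate = BASE + "relatedTo/" + kind
        obj = BASE + kind + "/" + assoc.attrib["target"]
        triples.append((subject, predicate, obj))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = xml_to_triples(XML)
print(to_ntriples(triples))
```

A real pipeline would more likely use an RDF library and a published vocabulary; plain tuples are used here only to keep the sketch self-contained.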
  • 42. CINF 52 What makes polyphenols good antioxidants? Alton Brown, you should take notes... Emilio Xavier Esposito(1), emilio.esposito@gmail.com, 1780 S Wilson Drive, Lake Forest Illinois 60045, United States. (1) The Chem21 Group, Inc, Lake Forest Illinois 60045, United States The dominant physical features of antioxidants are phenols; polyphenols, according to Alton Brown. The proposed antioxidant-tyrosinase mechanism, based on a series of experimentally determined mushroom tyrosinase structures, provides insight into the molecular interactions that drive the reaction. While the enzyme structures illustrate the important molecular interactions for tyrosinase inhibition, they do not always facilitate the understanding of what makes a good inhibitor or of the mechanism of the reaction. Using an antioxidant (tyrosinase inhibitor) dataset of 626 compounds (from the linear discriminant analysis research of Martín et al., Euro J Med Chem 42, p1370-1381, 2007), we constructed binary QSAR models to indicate the important antioxidant molecular features. Exploring models constructed from molecular descriptors based on fingerprints (MACCS keys), traditional molecular descriptors (2D and 2½D), VolSurf-like molecular descriptors (3D) and molecular dynamics (4D-Fingerprints), the relationship between polyphenols' biologically relevant molecular features, as determined by each set of descriptors, and their antioxidant abilities will be discussed. CINF 53 Engineering and 3D protein-ligand interaction scaling of 2D fingerprints Jürgen Bajorath(1), bajorath@bit.uni-bonn.de, Dahlmannstraße 2, Bonn NRW 53113, Germany. (1) Department of Life Science Informatics, University of Bonn, Bonn 53113, Germany Different concepts are introduced to further refine and advance molecular descriptors for SAR analysis. Fingerprints have long been among preferred descriptors for similarity searching and SAR studies.
Standard fingerprints typically have a constant bit string format and are used as individual database search tools. However, by applying “engineering” techniques such as “bit silencing”, fingerprint reduction, and “recombination”, standard fingerprints can be tuned in a compound class-directed manner and converted into size-reduced versions with higher search performance. It is also possible to combine preferred bit segments from fingerprints of distinct design and generate “hybrids” that exceed the search performance of their parental fingerprints. Furthermore, effective 2D fingerprint representations can be generated from strongly interacting parts of ligands in complex crystal structures. These “interacting fragment” fingerprints focus search calculations on pharmacophore elements without the need to encode interactions directly. Moreover, 3D protein-ligand interaction information can implicitly be taken into account in 2D similarity searching through fingerprint scaling techniques that emphasize characteristic bit patterns. CINF 54 In silico binary QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage Y. Jane Tseng(1), yjtseng@csie.ntu.edu.tw, No.1 Sec.4, Roosevelt Road, Taipei Taiwan 106, Taiwan Republic of China. (1) Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 106, Taiwan Republic of China Blockage of the human ether-a-go-go related gene (hERG) potassium ion channel is a major factor related to cardiotoxicity. Hence, drugs binding to this channel have become an important biological endpoint in side effects screening. We have collected all available biologically active hERG compounds from the hERG literature for a total of 250 structurally diverse compounds. This data set was used to construct a set of two-state hERG QSAR models.
The descriptor pool used to construct the models consisted of 4D-fingerprints generated from the thermodynamic distribution of conformer states available to a molecule, 204 traditional 2D descriptors and 76 3D VolSurf-like descriptors computed using the Molecular Operating Environment (MOE) software. One model is a continuous partial least squares (PLS) QSAR hERG binding model. Another related model is an optimized binary QSAR model that classifies compounds as active or inactive. This binary model achieves 91% accuracy over a large range of molecular diversity spanning the training set. An external test set containing 816 compounds was constructed from the condensed PubChem bioassay database and successfully used to validate the binary model. The binary QSAR model permits a structural interpretation of possible sources for hERG activity. In particular, the presence of a polar negative group at a distance of 6 to 8 Å from a hydrogen bond donor in a compound is predicted to be a quite structure-specific pharmacophore that increases hERG blockage. Since a data set of high
  • 43. chemical diversity was used to construct the binary model, it is applicable for performing general virtual hERG screening. CINF 55 Telling the good from the bad and the ugly: The challenge of evaluating pharmacophore model performance Robert D. Clark(1), bclark@bcmetrics.com, 42505 10th Street West, Lancaster California 93534, United States. (1) Simulations Plus, Inc., Lancaster California 93534, United States Pharmacophore models are useful when they provide qualitative insight into the interactions between ligands and their target macromolecules, and therefore are more akin in many ways to molecular simulations than to quantitative structure activity relationships (QSARs) based on the partition of activity across a set of molecular descriptors. When the performance of a pharmacophore model is assessed quantitatively, it is usually in terms of its ability to recover known ligands or, less often, in terms of how well it distinguishes ligands from non-ligands. This status as a classification technique also sets it apart from more numerical QSAR methods, in part because of fundamental differences in what being "good" means. Carefully defining what "good" classification is, however, can make creative combination with other techniques a productive way to capture the value of their intrinsic complementarity.
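The two ways of scoring a pharmacophore model mentioned in CINF 55 (recovery of known ligands, and discrimination of ligands from non-ligands) correspond to standard classification measures. The following is a minimal sketch under stated assumptions: the function name `screening_metrics`, the toy library of 100 compound identifiers and the choice of recall, specificity and enrichment factor are illustrative and are not taken from the abstract.

```python
def screening_metrics(hits, ligands, library_size):
    """Summarize how well a set of model hits recovers known ligands.

    hits         -- identifiers the model retrieved from the library
    ligands      -- identifiers of the known ligands in the library
    library_size -- total number of compounds screened
    """
    hits, ligands = set(hits), set(ligands)
    tp = len(hits & ligands)            # ligands recovered
    fp = len(hits - ligands)            # non-ligands retrieved
    fn = len(ligands - hits)            # ligands missed
    tn = library_size - tp - fp - fn    # non-ligands correctly rejected
    return {
        # Fraction of known ligands recovered.
        "recall": tp / len(ligands) if ligands else 0.0,
        # Fraction of non-ligands correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Hit rate among retrieved compounds relative to the library base rate.
        "enrichment": (tp * library_size) / (len(hits) * len(ligands))
        if hits and ligands else 0.0,
    }


# Hypothetical screen: 100 compounds, identifiers 0-9 are the known ligands.
metrics = screening_metrics(
    hits=[0, 1, 2, 3, 4, 5, 50, 51, 52, 53],
    ligands=range(10),
    library_size=100,
)
print(metrics)
```

Recall alone rewards a model that retrieves everything, which is one reason the definition of "good" needs care; reporting specificity or enrichment alongside it makes the trade-off visible.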
CINF 56 Creative application of ligand-based methods to solve structure-based problems: Using QSAR approaches to learn from protein crystal structures Curt M Breneman(1), brenec@rpi.edu, 110 8th St, Troy NY 12180, United States; Sourav Das(1); Mike Krein(1); Steven Cramer(2); Kristin P Bennett(3); Charles Bergeron(3); Jed Zaretzki(1); Matt Sundling(1). (1) Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy NY 12180, United States (2) Department of Chemical and Biological Engineering, Rensselaer Polytechnic Institute, Troy NY 12180, United States (3) Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy NY 12180, United States In practice, there is no inherent disconnect between the descriptor-based cheminformatics methods commonly used for predicting small molecule properties and those that can be used to understand and predict protein behaviors. Examples of such connections include the development of predictive models of protein/stationary phase binding in HIC and ion-exchange chromatography, protein/ligand binding mode characterization through PROLICSS analysis of crystal structures, and the use of PESD binding site signatures for pose scoring and predicting off-target drug interactions. In all of these cases, models were created using descriptors based on protein electronic and structural features and modern machine learning methods that include model validation tools and domain of applicability assessment metrics.
  • 44. CINF 57 Computer-aided drug discovery William L Jorgensen(1), william.jorgensen@yale.edu, 225 Prospect Street, New Haven CT 06520-8107, United States. (1) Department of Chemistry, Yale University, New Haven CT 06520-8107, United States Drug development is being pursued through computer-aided structure-based design. For de novo lead generation, the BOMB program builds combinatorial libraries in a protein binding site using a selected core and substituents, and QikProp is applied to filter all designed molecules to ensure that they have drug-like properties. Monte Carlo/free-energy perturbation simulations are then executed to refine the predictions for the best scoring leads, including ca. 1000 explicit water molecules and extensive sampling for the protein and ligand. FEP calculations for optimization of substituents on an aromatic ring and for choice of heterocycles are now common. Alternatively, docking with Glide is performed with large databases of purchasable compounds to provide leads, which are then optimized via the FEP-guided route. Successful application has been achieved for HIV reverse transcriptase, FGFR1 kinase, and macrophage migration inhibitory factor (MIF); micromolar leads have been rapidly advanced to extraordinarily potent inhibitors. CINF 58 Structure-based discovery and QSAR methods: A marriage of convenience Jose S Duca(1). (1) Novartis, Cambridge MA 02139, United States The art of building predictive models of the relationships between structural descriptors and molecular properties has been historically important to drug design. In recent years there has been an extraordinary amount of experimental data available from processes designed to accelerate drug discovery in pharma, from high throughput screening and automation applied to library design and synthesis to chemogenomics and microarray analysis.
QSAR methods are among the many tools used to predict affinity-related, physicochemical, pharmacokinetic and toxicological properties by analyzing and extracting information from molecular databases and HTS campaigns. This presentation will cover case studies in which QSAR and Structure-Based Drug Design (SBDD) have worked in concert during the discovery process of pre-clinical candidates. The importance of incorporating time-dependent sampling to improve the quality of the nD-QSAR models (n=3,4) will also be discussed and compared to simplified low dimensional QSAR models. For those cases where structural information is not readily available, an extension of these methodologies will be discussed in relation to ligand-based approaches. CINF 59 Extending the QSAR Paradigm using molecular modeling and simulation Anton J Hopfinger(1)(2), hopfingr@unm.edu, 2502 Marble NE, Albuquerque NM 87131-0001, United States. (1) College of Pharmacy, MSC 09 5360, University of New Mexico, Albuquerque NM 87131-0001, United States (2) Computational Chemistry, The Chem21 Group, Inc., Albuquerque NM 87131-0001, United States QSAR analysis and molecular modeling/simulation methods are often complementary, and when combined in a study yield results greater than the sum of their parts. Modeling and simulation offer the ability to design custom, information-rich trial descriptors for a QSAR analysis. In turn, QSAR analysis is able to discern which of the custom descriptors most fully relate to the behavior of an endpoint of interest. One useful set of custom QSAR descriptors from modeling and simulation for describing ligand-receptor interactions are the grid cell occupancy descriptors, GCODs, of 4D-QSAR analysis. These descriptors characterize the relative spatial occupancy of all the atoms of a molecule over the set of conformations available to the molecule when in a particular environment.
GCODs permit the construction of a 4D-QSAR equation for virtual screening, as well as a spatial pharmacophore of the 4D-QSAR equation for exploring mechanistic insight. Applications that can particularly benefit from combining QSAR analysis and modeling/simulation tools are those in which a model chemical system is needed to determine the sought-after property. One such application is the transport of molecules through biological compartments, an integral part of many ADMET properties. The reliable estimation of eye irritation is greatly enhanced by simulating the transport of test solutes through membrane bilayers, and using extracted properties from the simulation trajectories as custom descriptors to build eye irritation QSAR models. These key descriptors of the QSAR models, in turn, also permit the investigator to probe and postulate detailed molecular mechanisms of action.
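The idea behind grid cell occupancy descriptors in CINF 59 can be sketched in a few lines. This is a deliberately simplified illustration and not the 4D-QSAR procedure itself: it assumes the conformations are already aligned in a common frame, treats all atoms alike, and weights every conformation equally. The function name `grid_cell_occupancy` and the coordinates are hypothetical.

```python
from collections import Counter
from math import floor


def grid_cell_occupancy(conformations, cell_size=1.0):
    """Return {cell index: average number of atoms per conformation in that cell}.

    conformations -- list of conformations, each a list of (x, y, z) atom positions
    cell_size     -- edge length of the cubic grid cells, in the same units
    """
    counts = Counter()
    for atoms in conformations:
        for x, y, z in atoms:
            cell = (floor(x / cell_size), floor(y / cell_size), floor(z / cell_size))
            counts[cell] += 1
    n = len(conformations)
    return {cell: count / n for cell, count in counts.items()}


# Hypothetical two-atom molecule sampled in three pre-aligned conformations (Å).
conformations = [
    [(0.2, 0.1, 0.0), (1.4, 0.3, 0.1)],
    [(0.3, 0.2, 0.1), (1.1, 1.2, 0.0)],
    [(0.1, 0.4, 0.2), (1.6, 0.2, 0.3)],
]
occupancy = grid_cell_occupancy(conformations, cell_size=1.0)
print(occupancy)
```

Each occupancy value can then serve as one trial descriptor column in a QSAR table, with one row per molecule.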
  • 45. CINF 60 Overview of activity landscapes and activity cliffs: Prospects and problems
Gerald M Maggiora(1)(2), gerry.maggiora@gmail.com, 1703 E. Mabel St., Tucson AZ 85721, United States. (1) Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson AZ 85721, United States (2) BIO5 Institute, University of Arizona, Tucson AZ 85721, United States
Substantial growth in the size and diversity of compound collections and the capability to subject them to an increasing variety of different high-throughput assays manifests the need for a more systematic and global view of structure-activity relationships. The concepts of chemical space and molecular similarity, which are now well known to the drug-research community, provide a suitable framework for developing such a view. Augmenting a chemical space with activity data from various assays generates a set of activity landscapes, one for each assay. The topography of these landscapes contains important information on the structure-activity relationships of compounds that inhabit the chemical space. Activity cliffs, which arise when similar compounds possess widely different activities, are a particularly informative feature of activity landscapes with respect to SAR. The talk will present an overview of activity landscapes and cliffs and will describe some of the prospects and problems associated with these important concepts.

CINF 61 Exploring and exploiting the potential of structure-activity cliffs
Michael S Lajiness(1), lajinessms@lilly.com, 1 Corporate Center, DC:1930, Indianapolis Indiana 46285, United States; Gerald M Maggiora(2). (1) Scientific Informatics, Eli Lilly & Co, Indianapolis Indiana, United States (2) Department of Pharmacology & Toxicology, University of Arizona College of Pharmacy, Tucson Arizona 85721-0207, United States
It's well known that small structural changes sometimes result in large changes in activity.
There have been some recent efforts to identify such changes, but little has been done to define which structural changes are most informative or even real. Also, the missing value problem often obscures relevant patterns, if in fact they exist. This presentation will describe several ideas and applications for exploring and exploiting structure-activity cliffs. In addition, various visualizations and approaches to communicate the information contained in these "cliffs" will be shared. Examples will be drawn from PubChem.

CINF 62 What makes a good structure activity landscape? Network metrics and structure representations as a way of exploring activity landscapes
Rajarshi Guha(1), guhar@mail.nih.gov, 9800 Medical Center Drive, Rockville MD 20852, United States. (1) Department of Informatics, NIH Chemical Genomics Center, Rockville MD 20852, United States
The representation of SAR data in the form of landscapes and the identification of activity cliffs in such landscapes is well known. A number of approaches have been described for identifying activity cliffs, including several network-based methods such as the SALI approach (JCIM, 2008, 48, 646-658). While a network representation of an SAR landscape moves away from the intuitive idea of rolling hills and steep gorges, it allows us to apply a variety of quantitative analyses. In this talk I will first examine some of the properties of SALI networks using various measures of network structure and attempt to correlate these features with features of the SAR data. While most examples are from relatively small datasets, I will highlight some examples from larger datasets from high-throughput screens. While such data can be noisy and contain artifacts, I will examine whether the underlying network structure can shed light on specific molecules that may be worth following up.
The second focus of the talk will be the effect of structure representations on the smoothness of the landscape and how one can derive ideas from the SALI characterization to suggest good or bad landscapes.

CINF 63 Consensus model of activity landscapes and consensus activity cliffs
Jose L Medina-Franco(1), jmedina@tpims.org, 11350 SW Village Parkway, Port St Lucie FL 34987, United States; Karina Martinez-Mayorga(1); Fabian Lopez-Vallejo(1). (1) Torrey Pines Institute for Molecular Studies, Port St Lucie FL 34987, United States
Characterization of activity landscapes is a valuable tool in lead optimization, virtual screening and computational modeling of active compounds. As such, understanding the activity landscape and early detection of activity cliffs
  • 46. [Maggiora, G. M. J. Chem. Inf. Model. 2006, 46, 1535] can be crucial to the success of computational models. Similarly, characterizing the activity landscape will be critical in future ligand-based virtual screening campaigns. However, the chemical space and activity landscape are influenced by the particular representation used, and certain representations may lead to apparent activity cliffs. A strategy to address this problem is to consider multiple molecular representations in order to derive a consensus model for the activity landscape and in particular identify consensus activity cliffs [Medina-Franco, J. L. et al. J. Chem. Inf. Model. 2009, 49, 477]. The current approach can be extended to identify consensus selectivity cliffs.

CINF 64 R-Cliffs: Activity cliffs within a single analog series
Dimitris Agrafiotis(1), dagrafio@its.jnj.com, Welsh & McKean Roads, Spring House Pennsylvania 19477, United States. (1) Pharmaceutical Research & Development, Johnson & Johnson, Spring House Pennsylvania 19477, United States
The concept of activity cliffs has gained popularity as a means to identify and understand discontinuous SAR, i.e., regions of SAR where minor changes in structure have unpredictably large effects on biological activity. To the best of our knowledge, activity cliffs have been invariably evaluated using global measures of molecular similarity that do not take into account the presence of finer substructure among a series of related analogs. In this talk, we look at activity cliffs within a congeneric series, by decomposing them into R-groups and analyzing how activity is affected by changes in a single variation site. The analysis is greatly enhanced by R-group-aware visualization tools such as the SAR maps, which have been enhanced to specifically highlight such discontinuities.
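The single-site analysis described in the CINF 64 abstract can be sketched as a pairwise scan over an R-group table. This is only an illustration under stated assumptions: the R-group decomposition is taken as given, every analog has the same number of variation sites, and the function name, the 2.0 log-unit threshold, and the compound data are all hypothetical.

```python
from itertools import combinations


def single_site_cliffs(analogs, min_delta=2.0):
    """Find analog pairs that differ at exactly one R-group site and whose
    activities differ by at least min_delta (e.g. in pIC50 units).

    analogs: {name: (r_groups, activity)}, where r_groups is a tuple holding
    one substituent label per variation site on a shared core.
    """
    cliffs = []
    for (a, (ra, act_a)), (b, (rb, act_b)) in combinations(analogs.items(), 2):
        changed = [i for i, (x, y) in enumerate(zip(ra, rb)) if x != y]
        if len(changed) == 1 and abs(act_a - act_b) >= min_delta:
            cliffs.append((a, b, changed[0], abs(act_a - act_b)))
    return cliffs


# Made-up series: site 0 is R1, site 1 is R2.
series = {
    "cpd1": (("H", "OMe"), 5.1),
    "cpd2": (("Cl", "OMe"), 7.6),
    "cpd3": (("Cl", "H"), 7.4),
    "cpd4": (("H", "H"), 5.0),
}
cliffs = single_site_cliffs(series)
```

In this toy series both reported pairs change only R1 (H to Cl), which is the kind of pattern a site-aware view makes visible and a global similarity measure can blur.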
CINF 65 Chemical structure representation in the DuPont Chemical Information Management Solutions database: Challenges posed by complex materials in a diversified science company
Mark A Andrews(1), Mark.A.Andrews@usa.dupont.com, Experimental Station, E320/107, PO Box 80320, Wilmington DE 19803-0320, United States; Edward S. Wilks(1). (1) CR&D, Information & Computing Technologies, DuPont, Wilmington DE 19803, United States
This talk will describe the novel ways we have developed to represent precisely the structures of the diverse chemical materials of interest to DuPont. These range from simple organics and inorganics to polymers, mixtures, formulations, multi-layer films, composites, and even devices and incompletely defined substances. Part of the solution involves evaluating trade-offs, which may be situation dependent, between details captured in the structure vs. details captured at the sample history level, e.g., ratios of components, polymer molecular weights and microstructures, and the existence of “fairy dust” components. An important aspect of the solution involves ensuring robust structure standardization and duplicate checking for complex and ill-defined substances. We believe that our needs and solutions have challenged and inspired a number of chemical software vendors to provide significant upgrades to the functionalities of their drawing packages and database cartridges.

CINF 66 From deposition to application: Technologies for storing and exploiting crystal structure data
Colin R Groom(1), edir@ccdc.cam.ac.uk, 12 Union Road, Cambridge Cambridgeshire CB2 1EZ, United Kingdom; Jason Cole(1); Simon Bowden(1); Tjelvar Olsson(1). (1) Cambridge Crystallographic Data Centre, United Kingdom
In December 2009, the Cambridge Crystallographic Data Centre (CCDC) archived the 500,000th small-molecule crystal structure to the Cambridge Structural Database (CSD).
The passing of this milestone highlights the rate of growth of the CSD in recent years and the continuing challenges this represents in terms of information storage and exchange. This talk will describe the development of a number of tools for the processing, validation, and storage of crystal structure data. Recent developments that will help this growing body of structural knowledge to be exploited in a range of applications, and the provision of additional services that can assist the scientific community, will also be illustrated.
  • 47. CINF 67 Recent IUPAC recommendations for chemical structure representation: An overview
Jonathan Brecher(1), jsb@cambridgesoft.com, 100 CambridgePark Drive, Cambridge MA 02140, United States. (1) CambridgeSoft Corporation, Cambridge MA 02140, United States
Accurate and unambiguous depiction of chemical information is a key step in communicating that information. Such depiction is equally important whether the intended audience is a human chemist (as in a journal article or patent) or a computer (as in a chemical registration system). Recent IUPAC publications provide chemists a practical guide for producing chemical structure diagrams that accurately convey the author's intended meaning. A summary of those recommendations will be presented. As part of that summary, common pitfalls in producing chemical structure diagrams will be discussed. Solutions to those pitfalls will also be described, with an emphasis on solutions that are simple, straightforward, and accessible to the majority of practicing chemists.

CINF 68 Orbital development kit
Egon L. Willighagen(1), egon.willighagen@farmbio.uu.se, Box 591, Uppsala Uppland SE-75124, Sweden. (1) Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
Understanding properties of molecular structures requires a computer representation, and quantum mechanical and chemical graph representations have been used abundantly. Both have found their own areas of application in chemistry, and their fields are best described as theoretical chemistry and cheminformatics, respectively. The Orbital Development Kit (ODK) positions itself in between these two representations, though closest to chemical graph theory, and addresses shortcomings of the latter.
In particular, it replaces the coloring of the nodes and edges in the chemical graph with explicit atom hybridization and bond order, making the representation more precise in how it represents geometrical features of the molecule. The ODK does so by replacing the atom as a single node in the chemical graph with a central atomic core surrounded by valence orbitals, possibly hybridized. Using this approach, the definition of an atom type is reformulated as a core element with a particular and well-defined set of identifiable orbitals with an implied, though relative, geometrical orientation. Bonding is now the connection of two orbitals, and a lone pair becomes a single orbital, and is therefore directional too. This approach means that the classical double bond in ethene is now represented by one sigma bond between two sp2 orbitals of the two carbons, and one bond between their two pz orbitals. This ODK representation also leaves room for representations beyond the chemical graph, such as that proposed by Dietz in 1995: more than two orbitals can be combined into a set to represent delocalization. The presentation will cover the ODK data model, serialization and deserialization into a Resource Description Framework-based file format, and a bridge to the Chemistry Development Kit, for visualization and molecular property calculation.

CINF 69 Line notations as unique identifiers
Krisztina Boda(1), krisztina@eyesopen.com, 9 Bisbee Court, Suite D, Santa Fe New Mexico 87508, United States. (1) OpenEye Scientific Software, Santa Fe New Mexico 87508, United States
A wide variety of structure representation formats have been devised to encode molecular information in order to register, store and manipulate molecules in silico. One class of these formats, called line notations, is designed to express molecules as compact, unambiguous strings that can be used as unique identifiers for compound registration, eliminating the computationally more expensive graph matching.
The presentation will provide an overview of popular line notations, such as canonical SMILES, isomeric SMILES, and InChI, discussing their merits and shortcomings with regard to using them as robust lossless unique identifiers. We will present results of testing a variety of line notations on a diverse set of 10M compounds generated by combining organic and inorganic vendor databases. We will also examine the information loss of various molecular normalization procedures with regard to line notation generation.
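The registration idea in the CINF 69 abstract, a string lookup in place of graph matching, can be sketched as follows. The canonicalizer here is a stand-in lookup table invented for illustration; a real system would call a cheminformatics toolkit at that point, and the function and variable names are hypothetical.

```python
def register(registry, structure, canonicalize):
    """Register a structure under its canonical line notation.

    Returns (key, is_new). Duplicate detection is a dictionary lookup on the
    string, so no graph matching is needed at registration time.
    """
    key = canonicalize(structure)
    is_new = key not in registry
    registry.setdefault(key, []).append(structure)
    return key, is_new


# Stand-in canonicalizer for illustration only: it maps three valid SMILES
# spellings of ethanol to one agreed form, and ethane to itself.
TOY_CANONICAL = {"CCO": "CCO", "OCC": "CCO", "C(O)C": "CCO", "CC": "CC"}


def toy_canonicalize(smiles):
    return TOY_CANONICAL[smiles]


registry = {}
results = [register(registry, s, toy_canonicalize) for s in ["CCO", "OCC", "CC", "C(O)C"]]
```

The scheme is only as good as the canonicalizer: if two spellings of one compound map to different strings, or two different compounds map to the same string, the identifier is no longer unique or lossless, which is the failure mode the abstract sets out to measure.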
  • 48. CINF 70 Analysis of activity landscapes, activity cliffs, and selectivity cliffs
Jürgen Bajorath(1), bajorath@bit.uni-bonn.de, Dahlmannstr. 2, Bonn NRW 53113, Germany. (1) Department of Life Science Informatics, University of Bonn, Germany
The concept of activity landscapes (ALs) is of fundamental importance for the exploration of structure-activity relationships (SARs). ALs are best rationalized as biological activity hypersurfaces in chemical space. When reduced to three dimensions, ALs display characteristic topologies that determine the SAR behavior of compound sets. Prominent features of ALs are activity cliffs that are formed by structurally similar compounds having large potency differences, giving rise to SAR discontinuity. ALs and activity cliffs can be analyzed in different ways, including similarity-potency diagrams, approximate three-dimensional landscape representations, or molecular networks integrating compound similarity and potency information. Annotated similarity-based compound networks that incorporate results of numerical SAR analysis functions, termed Network-like Similarity Graphs (NSGs), are designed to explore relationships between global and local SAR features in compound data sets of any source. For collections of analogs, substitution patterns that introduce activity cliffs are identified in Combinatorial Analog Graphs (CAGs), which also make it possible to study additive and non-additive effects of compound modifications. Activity cliffs identified in CAGs can frequently be rationalized on the basis of complex crystal structures. When studying multi-target SARs using the NSG framework, the concept of activity cliffs can be extended to selectivity cliffs, i.e., similar compounds having significant differences in target selectivity.

CINF 71 Using Activity Cliff Information in structure-based design approaches
Birte Seebeck(1), seebeck@zbh.uni-hamburg.de, Bundesstr.
43, Hamburg D-20146, Germany; Matthias Rarey(1), rarey@zbh.uni-hamburg.de, Bundesstr. 43, Hamburg D-20146, Germany; Markus Wagener(2). (1) Center for Bioinformatics (ZBH), University of Hamburg, Hamburg D-20146, Germany (2) Molecular Design and Informatics, MSD, Oss, The Netherlands
Activity cliffs are often the pitfall of QSAR modeling techniques, but at the same time they exhibit key features of a SAR. Based on the principles of the structure-activity landscape index (SALI) [1], here we present an approach to use the valuable information of activity cliffs in a structure-based design scenario, analyzing key interactions between protein-ligand complexes in activity cliff events. We visualize those interaction “hot spots” directly in the active site of target proteins. In addition, we use the activity cliff information to derive target-specific scoring models and pharmacophoric hypotheses, which are validated in enrichment experiments on independent external test sets. The results show an improved enrichment in comparison to the standard score for various protein targets.
1. Guha R. and Van Drie J.H., J. Chem. Inf. Model., 2008, 48, 646-658.

CINF 72 Exploring activity cliffs using large scale semantic analysis of PubChem
David J Wild(1), djwild@indiana.edu, 901 E. 10th St, Bloomington IN 47408, United States; Bin Chen(1); Qian Zhu(1). (1) School of Informatics and Computing, Indiana University, Bloomington IN 47408, United States
Identification of activity cliffs, defined as the ratio of the difference in activity of two compounds to their “distance” of separation in a given chemical space [1], has been established as important in the creation of robust quantitative structure-activity relationship models. Previously, a method, SALI, for identifying and visualizing these activity cliffs was developed at Indiana University, and applied successfully to several established QSAR datasets [2]. In the work reported here, we have extended this work in two ways.
First, we have used structure and activity data from the public PubChem BioAssay dataset to evaluate the method on a much larger scale, and second, we have integrated it with a project called Chem2Bio2RDF to look not just for activity cliffs based on reported assay values, but also on computationally established relationships between compounds and genes and diseases. We thus propose an extended application of SALI which can be used in a systems chemical biology and chemogenomic context.
[1] J. Chem. Inf. Model., 2006, 46 (4), p 1535
[2] J. Chem. Inf. Model., 2008, 48 (3), pp 646–658
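The ratio in the CINF 72 abstract is commonly written as SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)), where A is activity and sim is a similarity between 0 and 1. The sketch below is a minimal illustration of that formula, assuming Tanimoto similarity on fingerprints stored as sets of on-bit indices. The fingerprints and activities are made up, and the handling of identical fingerprints is an assumption of this sketch.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali(act_i, act_j, sim):
    """SALI = |A_i - A_j| / (1 - sim).

    For identical fingerprints (sim == 1) the ratio is undefined; this sketch
    returns infinity if the activities differ and 0.0 otherwise.
    """
    if sim >= 1.0:
        return float("inf") if act_i != act_j else 0.0
    return abs(act_i - act_j) / (1.0 - sim)


# Made-up compounds: (fingerprint bits, activity in log units).
compounds = {
    "A": ({1, 2, 3, 4}, 8.0),
    "B": ({1, 2, 3, 5}, 5.0),
    "C": ({6, 7, 8, 9}, 5.5),
}

# Rank all pairs by SALI, steepest cliff first.
ranked = sorted(
    (
        (sali(act_i, act_j, tanimoto(fp_i, fp_j)), i, j)
        for (i, (fp_i, act_i)), (j, (fp_j, act_j)) in combinations(compounds.items(), 2)
    ),
    reverse=True,
)
```

Pairs whose SALI exceeds a chosen cutoff can be drawn as edges of a graph, which is the network view of activity cliffs mentioned in the abstracts above.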
  • 49. CINF 73 Quantifying the usefulness of a model of a structure-activity relationship: The SALI Curve Integral
John H Van Drie(1), john@vandrieresearch.com, 34 Stinson Rd, Andover MA 01810, United States; Rajarshi Guha(2). (1) R&D, Van Drie Research LLC, Andover MA 01810, United States (2) Chemical Genomics Center, NIH, Bethesda MD 20892, United States
In 2008, in two papers, Guha and Van Drie introduced the notion of structure-activity landscape index (SALI) curves as a way to assess a model and a modeling protocol, applied to structure-activity relationships. The starting point is to study a structure-activity relationship pairwise, based on the notion of "activity cliffs": pairs of molecules that are structurally similar but have large differences in activity. The basic idea behind the “SALI Curve” is to tally how many of these pairwise orderings a model is able to predict. Empirically, testing these SALI curves against a variety of models, ranging over structure-based and non-structure-based models, the utility of a model seems to correspond to characteristics of these curves. In particular, the integral of these curves, denoted SCI and ranging from -1.0 to 1.0, approaches a value of 1.0 for two literature models, which are both known to be prospectively useful.

CINF 74 Status of the InChI and InChIKey algorithms
Stephen Heller(1), steve@hellers.com, 100 Bureau Drive, Bldg. 221/A111, Gaithersburg MD 20899-8320, United States. (1) CBRD, MS - 8320, NIST, Gaithersburg MD 20899-8320, United States
The Open Source chemical structure representation standard, the IUPAC InChI/InChIKey project, has evolved considerably in the past two years. The project is now being supported and widely used by virtually all major publishers of chemical journals, databases, and structure drawing and related software.
This usage of the InChI/InChIKey in their products enables them to link information between their products and other (fee-free and fee-based) chemical information available on the world wide web via the Internet. These organizations now provide a stable and financially viable structure for the project. This is enabling the world-wide chemistry community to expand its use of the InChI, knowing that this freely available Open Source algorithm will be widely accepted and used as a mainstream standard. The mission of the InChI Trust is quite simple and limited; its sole purpose is to create and support, administratively and financially, a scientifically robust and comprehensive InChI algorithm and related standards and protocols. This presentation will describe the current technical state of the InChI and InChIKey algorithms.

CINF 75 Self-contained sequence representation (SCSR): Bridging the gap between bioinformatics and cheminformatics
Keith T Taylor(1), keith.taylor@symyx.com, 2440 Camino Ramon, San Ramon CA 94583, United States; William L Chen(1); Brad D Christie(1); Joe L Durant(1); David L Grier(1); Burt A Leland(1); Jim G Nourse(1). (1) Symyx Technologies Inc, San Ramon CA 94583, United States
In this paper we will discuss the benefits and disadvantages of the current approaches for storing biological sequence information. We have developed a hybrid representation that uses the compactness of the sequence, together with the detail of chemical connectivity information for modified regions. It represents standard residues with substructure. All instances of the same residue are represented by a single template. This hybrid approach is compact and scalable. We have developed a converter that takes a UniProt format file, extracts the sequence information, and derives the modifications, producing an SCSR record.
The SCSR is encoded as a molfile and registered into a Symyx Direct database. Duplicate checking, exact matching (with and without the modifications), molecular weight calculation and substructure searching are all available with these structures. We are using this representation for peptides and oligonucleotides, and we are now extending it to oligosaccharides. Non-natural residues can be included in an SCSR.
  • 50. CINF 76 Representation of Markush structures: From molecules toward patents
Szabolcs Csepregi(1), scsepregi@chemaxon.com, Máramaros köz 3/a, Budapest - 1037, Hungary; Nóra Máté(1); Róbert Wágner(1); Tamás Csizmazia(1); Szilárd Dóránt(1); Erika Bíró(1); Tim Dudgeon(1); Ali Baharev(1); Ferenc Csizmadia(1). (1) ChemAxon Ltd., Budapest 1037, Hungary
Cheminformatics systems usually focus primarily on handling specific molecules and reactions. However, Markush structures are also indispensable in various areas, like combinatorial library design or chemical patent applications for the description of compound classes. The presentation will discuss how an existing molecule drawing tool (Marvin) and chemical database engine (JChem Base/Cartridge) are extended to handle generic features (R-group definitions, atom and bond lists, link nodes and larger repeating units, position and homology variation). Markush structures can be drawn and visualized in the Marvin sketcher and viewer, registered in JChem databases, and their library space is searchable without the enumeration of library members. Different enumeration methods allow the analysis of Markush structures and their enumerated libraries. These methods include full, partial and random enumerations as well as calculation of the library size. Furthermore, unique visualization techniques will be demonstrated on real-life examples that illustrate the relationship between Markush structures and the chemical structures contained in their libraries (involving substructures and enumerated structures). Special attention will be given to file formats and how they were extended to hold generic features.

CINF 77 CSRML: A new markup language definition for chemical substructure representation
Christof H.
Schwab(1), schwab@molecular-networks.com, Henkestrasse 91, Erlangen Bavaria 91052, Germany; Bruno Bienfait(1); Johann Gasteiger(1); Thomas Kleinoeder(1); Joerg Marucszyk(1); Oliver Sacher(1); Aleksey Tarkhov(1); Lothar Terfloth(1); Chihae Yang(2). (1) Molecular Networks GmbH, Erlangen, Bavaria 91052, Germany (2) Altamira LLC, Columbus Ohio 43235, United States
Although chemical subgraphs or substructures are quite popular and have been used for a long time in chemoinformatics, the existing and well-established standards still have some limitations. In general, these standards are suited even for complex substructure queries; however, they show some insufficiencies, e.g., for the inclusion of physicochemical properties or annotation of meta information. In addition, the existing standards are not fully interconvertible and specify no validation techniques to check the semantic correctness of a query definition. This paper proposes an approach for the representation of chemical subgraphs that aims to overcome the limitations of existing standards. The approach presents a well-structured, XML-based standard specification, the Chemical Subgraph Representation Markup Language (CSRML), that supports a flexible annotation mechanism of meta information and properties at each level of a substructure as well as user-defined extensions. Furthermore, the specification foresees a mandatory inclusion and use of test cases. In addition, it can be used as an exchange format.

CINF 78 Prediction of solvent physical properties using the hierarchical clustering method
Todd M Martin(1), martin.todd@epa.gov, 26 W Martin Luther King Dr., MS 419, Cincinnati OH 45268, United States; Douglas M Young(1). (1) National Risk Management Research Laboratory, Environmental Protection Agency, Cincinnati OH 45268, United States
Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets.
This methodology has now been applied to estimate solvent physical properties, including surface tension and the normal boiling point. The hierarchical clustering method divides a chemical dataset into a series of clusters containing similar compounds (in terms of their 2D molecular descriptors). Multilinear regression models are fit to each cluster. The toxicity or property is estimated using the prediction value from several different cluster models. The physical properties are estimated using 2D molecular structure only (i.e., without the use of critical constants). The hierarchical clustering methodology was able to achieve excellent predictions for the external prediction sets. A freely available software tool to estimate toxicity and physical properties has been developed. The software tool is based on the open source Chemistry Development Kit (written in Java).
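The workflow in the CINF 78 abstract (cluster, fit a regression per cluster, combine several cluster models) can be illustrated with a toy version. This is not the EPA implementation: it uses a single made-up descriptor in place of many 2D descriptors, a simple nearest-centroid merge rule, and an unweighted average. The rule that a cluster model is used only when the query falls inside the cluster's descriptor range is an assumption of this sketch.

```python
from statistics import mean


def fit_line(points):
    """Least-squares fit of y = a + b*x to (x, y) pairs; returns (a, b)."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def cluster_history(points):
    """Agglomerative clustering on the descriptor value.

    Repeatedly merges the two neighboring clusters with the closest centroids
    and returns every merged cluster formed along the way.
    """
    clusters = [[p] for p in sorted(points)]
    history = []
    while len(clusters) > 1:
        centroids = [mean(x for x, _ in c) for c in clusters]
        gaps = [centroids[i + 1] - centroids[i] for i in range(len(centroids) - 1)]
        i = gaps.index(min(gaps))
        merged = clusters[i] + clusters[i + 1]
        clusters[i:i + 2] = [merged]
        history.append(merged)
    return history


def predict(points, x, min_size=3):
    """Average the predictions of all cluster models whose descriptor range covers x."""
    estimates = []
    for cluster in cluster_history(points):
        xs = [px for px, _ in cluster]
        if len(cluster) >= min_size and min(xs) <= x <= max(xs):
            a, b = fit_line(cluster)
            estimates.append(a + b * x)
    return mean(estimates) if estimates else None


# Made-up training data: (descriptor, property) pairs lying on y = 2x + 1.
training = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0), (5.0, 11.0), (6.0, 13.0)]
estimate = predict(training, 2.5)
```

Returning `None` for a query outside every cluster's range is a crude stand-in for an applicability-domain check, so the sketch declines to extrapolate.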
  • 51. CINF 79 Scaffold diversity analysis using scaffold retrieval curves and an entropy-based measure
Jose L Medina-Franco(1), jmedina@tpims.org, 11350 SW Village Parkway, Port St Lucie FL 34987, United States; Karina Martinez-Mayorga(1); Andreas Bender(2); Thomas Scior(3). (1) Torrey Pines Institute for Molecular Studies, Port St. Lucie FL 34987, United States (2) Leiden University, Leiden 2333, The Netherlands (3) Benemerita Universidad Autonoma de Puebla, Puebla 72570, Mexico
Scaffold diversity analysis of compound collections has several applications in medicinal chemistry and drug discovery. Applications include, but are not limited to, library design, compound acquisition and assessment of structure-activity relationships. The scaffold diversity is commonly measured based on frequency counts. Scaffold retrieval curves are also employed. Further information can be obtained by considering the specific distribution of the molecules in those scaffolds. To this end, we present an entropy-based information metric to assess the scaffold diversity of compound databases [Medina-Franco, J. L. et al. QSAR Comb. Sci. 2009, 28, 1551]. The entropy-based information metric takes into account the frequency distribution of the different scaffolds and is a complementary measure of scaffold diversity, enabling a more comprehensive analysis.

CINF 80 Nonsubjective clustering scheme for multiconformer databases
Austin B.
Yongye(1), ayongye@tpims.org, 11350 SW Village Parkway, Port St Lucie FL 34987, United States; Andreas Bender(2); Karina Martinez-Mayorga(1). (1) Torrey Pines Institute for Molecular Studies, Port St Lucie FL 34987, United States (2) Medicinal Chemistry Division and Pharma-IT Platform, Leiden/Amsterdam Center for Drug Research, Leiden University, Leiden 2333, The Netherlands
Representing the 3D structures of ligands in virtual screenings via multi-conformer ensembles can be computationally intensive, especially for compounds with a large number of rotatable bonds. While clustering and RMSD filtering methods are employed in existing conformer generators, the novelty of this work is the inclusion of a non-subjective clustering scheme. This algorithm simultaneously optimizes the number and the average spread of the clusters. Using this method, 10 times fewer conformers per compound were obtained on average, and they performed as well as those from OMEGA. Furthermore, we propose thresholds for root-mean-square filtering depending on the number of rotors in a compound: 0.8, 1.0 and 1.4 for structures with low (1-4), medium (5-9) and high (10-15) numbers of rotatable bonds, respectively. The protocol employed is general and can be applied to reduce the number of conformers in multi-conformer compound collections and alleviate the complexity of downstream data processing in virtual screening experiments.

CINF 81 Finding drug discovery "rules of thumb" with bump hunting
Tatsunori Hashimoto(1), thashim@fas.harvard.edu, 233 Adams House Mail Center, Cambridge MA 02138, United States; Matthew Segall(2). (1) Department of Statistics, Harvard University, Cambridge MA 02138, United States (2) Optibrium, Cambridge CB25 9TL, United Kingdom
Rules-of-thumb for evaluating potential drug molecules, such as Lipinski's Rule of Five, are commonly used because they are easy to understand and translate into practice.
These rules have traditionally been constructed by observation or by following simple statistical analysis. However, application of these techniques to QSAR models or early screening data often ignores the underlying statistical structure. Conversely, when machine learning algorithms are used to classify 'drug-like' molecules, they often result in black-box classifiers that cannot be modified to suit a particular target drug profile. We propose a novel hybrid approach to constructing rules-of-thumb from existing data to match a given target product profile for any therapeutic objective. These rules are easily interpretable and can be rapidly modified to reflect expert opinions before application.

CINF 82 Machine learning in discovery research: Polypharmacology predictions as a use case
Nikil Wale(1), nikhilwale@yahoo.com, 200/3055 Groton Laboratories, Eastern Point Road, Groton CT 06340, United States; Kevin McConnell(1); Eric M Gifford(1). (1) Computational Sciences Center of Emphasis, Pfizer Inc, Groton CT 06340, United States
In this talk I will lay out the increasing role of machine learning technology in discovery research at Pfizer. Specifically, I will talk about how algorithms and methods inspired by (Machine) Learning Theory are
  • 52. playing an increasing role in in-silico predictive technologies in pharmaceutical research. These methods will be put in the context of other popular methods based on classical statistics, and their overlaps and contrasts will be discussed. I will use polypharmacology predictions as an important use case to demonstrate the power of large-scale machine learning methods for such applications. In particular, prospective validation of these methods will be emphasized and discussed.

CINF 83 Interpretable correlation descriptors for quantitative structure-activity relationships
Jonathan D. Hirst(1), jonathan.hirst@nottingham.ac.uk, University Park, Nottingham Nottinghamshire NG7 2RD, United Kingdom. (1) School of Chemistry, University of Nottingham, Nottingham Nottinghamshire NG7 2RD, United Kingdom
Highly predictive Topological Maximum Cross Correlation (TMACC) descriptors for the derivation of quantitative structure-activity relationships (QSARs) are presented, based on the widely used autocorrelation method. They require neither the calculation of three-dimensional conformations, nor an alignment of structures. Open source software for generating the TMACC descriptors is freely available from our website: http://comp.chem.nottingham.ac.uk/download/TMACC. We illustrate the interpretability of the TMACC descriptors through the analysis of the QSARs of inhibitors of angiotensin converting enzyme (ACE) and dihydrofolate reductase. In the case of the ACE inhibitors, the TMACC interpretation shows features specific to C-domain inhibition, which have not been explicitly identified in previous QSAR studies.
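The CINF 83 abstract says TMACC builds on the autocorrelation method. The sketch below shows only that underlying idea, a classic topological autocorrelation in which products of an atomic property are summed over atom pairs a fixed number of bonds apart. It is not the TMACC definition, which the abstract does not spell out. The molecule graph and property values are made up.

```python
from collections import deque


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def autocorrelation(adjacency, prop, max_distance):
    """Return [AC(1), ..., AC(max_distance)], where AC(d) is the sum of
    prop[i] * prop[j] over atom pairs i < j that are d bonds apart."""
    ac = [0.0] * (max_distance + 1)
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i < j and 1 <= d <= max_distance:
                ac[d] += prop[i] * prop[j]
    return ac[1:]


# Heavy-atom graph of a four-atom chain (C0-C1-C2-O3) with made-up atomic
# property values, e.g. partial charges.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
prop = {0: 0.1, 1: 0.2, 2: 0.5, 3: -1.0}
descriptor = autocorrelation(adjacency, prop, max_distance=3)
```

Because only bond counts and per-atom values enter the calculation, a descriptor of this kind needs neither 3D conformations nor structural alignment, which matches the property the abstract highlights.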
CINF 84 Chemistry in your hand: Using mobile devices to access public chemistry compound data
Antony J Williams(1), antony.williams@chemspider.com, 904 Tamaras Circle, Wake Forest NC 27587, United States; Valery Tkachenko(1). (1) ChemSpider, Royal Society of Chemistry, Wake Forest North Carolina 27587, United States
Mobile devices that can browse the internet to access chemistry-related data come in many forms: phones, music players and, increasingly, “tablets” and “pads”. With the permanently online connectivity of these mobile devices, the browser now being the default environment for much of our computer-based interactions, and the increasing availability of rich datasets online, these offerings mesh together to give chemists the ability to query and search for chemistry in ways that were the stuff of science fiction only a few years ago. Using the ChemSpider platform as a foundation, and with the intention of continuing to enable the community to access chemistry, we have delivered mobile chemistry applications to search across over 20 million compounds sourced from over 300 data sources to retrieve data including properties, spectra and links to patents and publications. This presentation will discuss Mobile ChemSpider and the challenges of delivering such a tool.

CINF 85 Feature analysis of ToxCast™ compounds
Patra Volarath(1), Volarath.Patra@epa.gov, 109 TW Alexander Drive, Mail Code: D343-03, Research Triangle Park NC 27711, United States; Stephen Little(1); Chihae Yang(2); Matt Martin(1); David Reif(1); Ann Richard(1). (1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park NC 27711, United States (2) Center for Food Safety and Nutrition, U.S. Food and Drug Administration, Bethesda MD 20740, United States
ToxCast™ was initiated by the US Environmental Protection Agency (EPA) to prioritize environmental chemicals for toxicity testing.
Phase I generated data for 309 unique chemicals, mostly pesticide actives, that span diverse chemical feature/property space, as determined by quantum mechanical, feature-/QSAR-based, and ADME-based descriptors. Results in over 450 high-throughput screening assays were generated for the chemicals. Deriving associations across such a structurally diverse and information-rich dataset is challenging. Approaches to determine relationships between the bioassay data and chemistry-/biology-informed structural features, and methods to meaningfully represent this knowledge, are being developed. We initially focus on the Phase I data set. Successful approaches will be applied to the much larger chemical libraries in the ToxCast Phase II and Tox21 projects (the latter to screen approximately 10,000 chemicals). They will be used to develop data mining methods to inform toxicity testing and risk assessment modelling. This abstract does not reflect EPA or FDA policy.
  • 53.
CINF 86 Extracting information from the IUPAC Green Book
Jeremy G Frey(1), j.g.frey@soton.ac.uk, School of Chemistry, University of Southampton, Southampton Hants SO17 1BJ, United Kingdom; Mark I Borkum(1). (1) School of Chemistry, University of Southampton, Southampton Hants SO17 1BJ, United Kingdom
The IUPAC manual of Symbols and Terminology for Physicochemical Quantities and Units (the Green Book) was first published in 1969. One of the fundamental principles of the IUPAC Green Book is the reuse of existing symbols and terminology, in order to enable the accurate exchange of information and data. Accordingly, there is a need for the IUPAC Green Book to be repurposed as a machine-processable resource. This paper reports an experiment where we define a syntax for the subject index of the IUPAC Green Book in the Parsing Expression Grammar (PEG) formalism. We repurpose the resulting Abstract Syntax Tree (AST) as the primary data source for a Ruby on Rails application and Simple Knowledge Organization System (SKOS) concept scheme. We demonstrate a metric that gives prominence to the most significant terms and pages in the subject index, and reflect upon the usefulness and relevance of the information obtained.

CINF 87 Biologics and biosimilars: One and the same?
Roger Schenck(1), rschenck@cas.org, 2540 Olentangy River Road, Columbus OH 43202, United States. (1) Chemical Abstracts Service, Columbus OH 43202, United States
Biopharmaceuticals (or biologics) and generic follow-on biosimilars currently account for more than 10% of the revenue in the pharmaceutical market. As patent protection for first generation biotherapeutics begins to expire, follow-on biosimilars have begun to appear. This presentation will provide insights on how the CAS databases handle biologics and biosimilars, how these substances are treated differently in patents, and how biosimilars are viewed by different patenting authorities.
What the CAS databases reveal about trends in biopharmaceutical research and development will be discussed along with specific examples.

CINF 88 Intelligent mining of drug information resources
Rashmi Jain(1), ydixit@evolvus.com, 88, Shukrawar Peth, Pune - 411002, India; Anay Tamhankar(1); Aniket Ausekar(1); Yuthika Dixit(1). (1) Evolvus Group, Pune 411002, India
A fundamental aspect of any research is to understand and keep track of progress made by peer groups in terms of scientific discoveries. Research conferences form a definitive source of this information. Annually, thousands of papers are presented at such conferences for any given disease vertical from therapeutic, biological, pharmacological and clinical perspectives. At first glance, the problem of finding relevant conference proceedings and then organizing the information into a format that is easily analyzed, stored and efficiently retrieved seems difficult and chaotic, as there are no patterns by which a process can be defined; furthermore, conference presentations are highly fragmented and non-standardized. A hybrid approach, in which machine-learning-based text-extraction software is coupled with assisted expert annotation by human editors, comes to the rescue. An in-house machine learning software system is used in the first stage, wherein the conference proceedings are classified based on keywords, segmented and converted into a standardized format. The software then uses a proprietary, heuristic-based learning algorithm to extract relevant data from the segments. Since it is well known that no automated approach can be 100% accurate, in this step the software is assisted by a team of expert human editors who analyze the extracted and segmented data and perform necessary corrections, if any.
In the third step, the software then pushes each segment to a team of expert human editors who analyze the segment, extract information relevant to the area of research, and store the information in our internal databases.

CINF 89 Cheminformatics semantic grid for neglected diseases
Paul J Kowalczyk(1), paul.kowalczyk@scynexis.com, 3501-C Tricenter Blvd, Durham NC 27713, United States. (1) Department of Computational Chemistry, SCYNEXIS, Durham NC 27713, United States
We present a summary of our progress towards establishing a cheminformatics semantic grid for
  • 54. neglected diseases. Our efforts are based on using public data and open-source programs to generate both descriptive and predictive models, which are themselves made publicly available. There are three modes of model access: as web services, via web portals, and as downloads. Models are saved in Predictive Model Markup Language (PMML) format. Information stored for each model includes the training set, test set, descriptors and model tuning parameters. This information is provided so that researchers may determine a model's domain, and its applicability to their data. Examples will be presented for two data sets retrieved from PubChem: enzyme inhibition of dihydroorotate dehydrogenase (AID:1175), and a cytochrome panel assay with activity outcomes (AID:1851).

CINF 90 Extraction and integration of chemical information from documents
Hugo O Villar(1), hugo@altoris.com, 7660-H Fay Ave #347, La Jolla California, United States; Juan Betancort(1); Mark R Hansen(1). (1) Altoris, Inc., La Jolla California 92037, United States
Effective chemical research requires that all sources of information be incorporated in decision making. Here we introduce a tool that saves time when building chemical databases from web information or the chemical literature, including patent information. We discuss some of the challenges faced in automating the identification and extraction of chemicals named in patents, and their conversion into chemical databases that can be mined effectively. The integration of external sources of data can be valuable for research informatics. To that end we have integrated the conversion of IUPAC names with chemical optical character recognition. We show examples where such integration can provide useful competitive information.
CINF 91 SAR and the role of active-site waters in blood coagulating serine proteases: A thermodynamic analysis of ligand-protein binding
Noeris K Salam(1), noeris.salam@schrodinger.com, 8910 University Center Lane, Suite 270, San Diego CA 92122, United States; Woody Sherman(2); Robert Abel(2). (1) Schrodinger, Inc., San Diego CA 92122, United States (2) Schrodinger, Inc., New York New York 10036, United States
The prevention of blood coagulation is important in treating thromboembolic disorders. Several serine proteases involved in the coagulation cascade are classified as pharmaceutically relevant and are the focus of structure-based drug design campaigns. Here, we investigate the serine proteases thrombin and factors VIIa, Xa, and XIa, using a computational method called WaterMap that describes the thermodynamic properties of the water solvating the active site. We show that the displacement of key waters from specific subpockets (e.g. S1, S2, S3 and S4) of the active site by the ligand is a dominant term governing potency, providing insights into SAR cliffs observed in several compound series. Furthermore, we describe how WaterMap scoring can be supplemented with terms from an MM-GBSA calculation to improve the overall predictive capabilities.
  • 55. 2010 CINF OFFICERS AND FUNCTIONARIES

Chair: Ms. Carmen Nitsche, Symyx Technologies, Inc., 254 Rockhill Drive, San Antonio, TX 78209; 510-589-3555 (mobile phone); 210-820-3459 (office and fax); Carmen.Nitsche@symyx.com
Chair Elect: Dr. Gregory Banik, Bio-Rad Laboratories, Inc., Informatics Division, 2 Penn Center Plaza, Suite 800, 1500 John F Kennedy Blvd, Philadelphia, PA 19102-1721; 267.322.6952 (voice); 267.322.6953 (fax); Gregory_Banik@bio-rad.com
Past Chair/Nominating Chair: Ms. Svetlana Korolev, University of Wisconsin, Milwaukee, 2311 E. Hartford Avenue, Milwaukee, WI 53211; 414.229.5045 (voice); 414.229.6791 (fax); skorolev@uwm.edu
Secretary: Ms. Leah Solla, Cornell University, Physical Sciences Library, 283 Clark Hall, Ithaca, NY 14853-2501; 607.255.1361 (voice); 607.255.5288 (fax); lrm1@cornell.edu
Treasurer: Ms. Meghan Lafferty, Science & Engineering Library, University of Minnesota, 108 Walter Library, 117 Pleasant St SE, Minneapolis MN 55455; 612.624.9399 (voice); 612.625.5583 (fax); mlaffert@umn.edu
Councilor: Ms. Bonnie Lawlor, National Federation of Advanced Information Services (NFAIS), 276 Upper Gulph Road, Radnor, PA 19087-2400; 215.893.1561 (voice); 215.893.1564 (fax); blawlor@nfais.org
Councilor: Ms. Andrea B. Twiss-Brooks, 4824 S Dorchester Avenue, Apt 2, Chicago, IL 60615-2034; 773.702.8777 (voice); 773.702.3317 (fax); atbrooks@uchicago.edu
Alternate Councilor: Dr. Guenter Grethe, 352 Channing Way, Alameda, CA 94502-7409; (510) 865-5152 (voice and fax); ggrethe@comcast.net
Alternate Councilor: Mr. Charles F. Huber, University of California, Santa Barbara, Davidson Library, Santa Barbara, CA 93106; 805-893-2762 (voice); 805-893-8620 (fax); huber@library.ucsb.edu
Archivist/Historian: Ms. Bonnie Lawlor, National Federation of Advanced Information Services (NFAIS), 276 Upper Gulph Road, Radnor, PA 19087-2400; 215.893.1561 (voice); 215.893.1564 (fax); blawlor@nfais.org
Audit Chair: Ms. Jody Kempf, Science & Engineering Library, University of Minnesota, 108 Walter Library, 117 Pleasant St SE, Minneapolis MN 55455; 612.624.9399 (voice); 612.625.5583 (fax); j-kemp@umn.edu
  • 56. Awards Committee Chair: Dr. Phil McHale, CambridgeSoft Corporation, 375 Hedge Rd, Menlo Park, CA 94025-1713; 650.235.6169 (voice); 650.362.2104 (fax); pmchale@cambridgesoft.com
Careers Committee Chair: Ms. Patricia Meindl, University of Toronto, A. D. Allen Chemistry Library, 80 St George Street, Rm 480, Toronto, ON M5S 3H6; 416.978.3587 (voice); 416.946.8059 (fax); pmeindl@chem.utoronto.ca
Communications and Publications Committee Chair: Dr. William Town, Kilmorie Consulting, 24A Elsinore Road, London SE23 2SL, United Kingdom; +44 20 8699 9764 (voice); bill.town@kilmorie.com
Constitution, Bylaws & Procedures: Ms. Susanne Redalje, University of Washington, Chemistry Library, BOX 351700, Seattle, WA 98195; 206.543.2070 (voice); curie@u.washington.edu
Education Committee Chair: Mr. Charles F. Huber, University of California, Santa Barbara, Davidson Library, Santa Barbara, CA 93106; 805.893.2762 (voice); 805.893.8620 (fax); huber@library.ucsb.edu
Finance Committee Chair: See Treasurer
Fund Raising Committee Chair: Mr. Graham Douglas, Scientific Information Consulting, 1804 Chula Vista Drive, Belmont, CA 94002; 510.407.0769 (voice); Graham_C_Douglas@hotmail.com
Membership Committee Chair: Ms. Jan Carver, University of Kentucky, Chemistry Physics Library, 150 Chem Phys Bldg, Lexington, KY 40506-0001; 859.257.4074 (voice); 859.323.4988 (fax); jbcarv1@email.uky.edu
Program Committee Chair: Dr. Rajarshi Guha, NIH Chemical Genomics Center, 9800 Medical Center Drive, Rockville, MD 20852; 814.404.5449 (voice); 812.856.3825 (fax); rajarshi.guha@gmail.com
Chemical Information Bulletin Editor: Dr. Svetla Baykoucheva, White Memorial Chemistry Library, University of Maryland, College Park, MD 20742; 301.405.9080 (voice); 301.314.5910 (fax); sbaykouc@umd.edu
Tellers Chair: Ms. Susan K. Cardinal, University of Rochester, Carlson Library, Rochester, NY 14627; 585.275.9007 (voice); 585.273.4656 (fax); scardinal@library.rochester.edu
Webmaster: Mr. Richard Williams, Vertex Pharmaceuticals, P.O. Box 290718, Boston, MA 02129; 617.444.6325 (voice); RickWilliams@wis-llc.org