Adding structures to Web pages and data to structures Alex Allardyce ChemAxon Presented at ACS Spring Meeting, Anaheim, 2011
Chemicalize org: Adding structures to web pages and data and Web links to structures: ACS Anaheim 2011

chemicalize.org is a new free online service developed by ChemAxon which adds chemistry to Web pages, and adds data and Web pages to structures. The primary use is to parse chemical names from Web page text and serve an annotated version of the page that includes structure images hyperlinked from the source chemical names. Because structures are stored together with Web page URLs, the database can be searched to find the Web pages that contain any given query structure. For each structure, users can also generate structure-based predictions (logP, pKa, logD, etc.) within a user-customizable report. Current developments center on user profiles, 'tracking' structures in newly chemicalized pages, and presenting chemicalize.org user activity to give a snapshot of the Web pages and structures that currently interest chemists online.
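
The flow described above (parse names, annotate the page, store structure and URL, then search by structure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-entry name dictionary, the function names `chemicalize` and `pages_containing`, and the in-memory index stand in for ChemAxon's naming tools and structure database, and only exact-match lookup is shown, not the similarity or substructure search the service offers.

```python
import re
from collections import defaultdict

# Toy name-to-structure dictionary (an assumption for illustration only;
# the real service relies on ChemAxon's name-to-structure tools).
NAME_TO_SMILES = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "paracetamol": "CC(=O)Nc1ccc(O)cc1",
}

# One case-insensitive pattern matching any known name as a whole word.
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NAME_TO_SMILES)) + r")\b", re.IGNORECASE
)


def chemicalize(url, text, index):
    """Annotate known names in `text` and record `url` under each structure."""

    def annotate(match):
        smiles = NAME_TO_SMILES[match.group(1).lower()]
        index[smiles].add(url)  # structure -> set of pages mentioning it
        return f"{match.group(1)} [{smiles}]"

    return NAME_PATTERN.sub(annotate, text)


def pages_containing(smiles, index):
    """Exact-match lookup: URLs of pages where this structure was found."""
    return sorted(index.get(smiles, ()))


index = defaultdict(set)
annotated = chemicalize(
    "http://example.org/a", "Aspirin and paracetamol are analgesics.", index
)
hits = pages_containing("CC(=O)Oc1ccccc1C(=O)O", index)
```

Keying the index by structure rather than by name is what lets one query find pages that mention the same compound under different names, provided the names resolve to the same structure string.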

This presentation will outline the aims of the development, describe the service and current developments, and give an overview of usage and user feedback.

Published in: Technology, Business
Notes
  • kloc = thousands of lines of (source) code. Not a great deal.
  • some structures have more than 1 name
  • Graphic says: the first 4 are key (they are delivered as defaults but not closed). The next 3 are merely popular. The next 6 are not as popular. Popular non-default predictions are pKa->geometry.
  • Transcript of "Chemicalize org: Adding structures to web pages and data and Web links to structures: ACS Anaheim 2011"

    1. Adding structures to Web pages and data to structures
       Alex Allardyce, ChemAxon
       Presented at ACS Spring Meeting, Anaheim, 2011
    2. Demo – index page
       • Lay out input box
       • Recently chemicalized, recent queries…
       • Drag and drop structure images
       • Help, about
       Example: http://www.chemicalize.org/
    3. Demo – chemicalizing a Web page
       • URL paste
          • Structure images
          • TOC and links
          • Properties link from mouse over image
          • Download
          • Links work
       Example: http://www.chemicalize.org/?url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FPenicillin
    4. Demo – Structure-based predictions
       • Properties
          • Manage views, move boxes
          • Open MarvinView by double-clicking any structure image
          • Calculate on demand
          • Download results
       Example: http://www.chemicalize.org/structure/#!mol=Penicillin&source=parser
    5. Demo – Structure search
       • Chem search pages
          • Search from Calculate properties
          • Open Marvin, power query features
          • Similarity is the default search; see other types
       • Choose a structure
          • List of URLs, chemicalized links
          • Show structures
          • Combine chem search with URL
          • Download results
       Examples:
          1. http://www.chemicalize.org/search/#m=Penicillin/t=t/h=0
          2. http://www.chemicalize.org/search/#m=Penicillin/t=t/h=0/c=46260/p=0
    6. Demo – Web search
       • Define chem and non-chem text query
          • Structure synonyms in query
          • Structures in results panel: “like web text search + structures in the results”
       Example: http://www.chemicalize.org/websearch/#m=Serotonin+sexual+preference+site%3Anature.com/p=0
    7. Who are we
       • 70+ people making cheminformatics toolkits and GUIs in Budapest, Hungary
       • 4 areas of technology:
          • Cheminformatics platform toolkits
          • Discovery toolkits
          • Desktop applications
          • Markush and IP
       • Lots of web-ready chemistry functionality to play with
       • Emerging as industry leader in platform cheminformatics
    8. Why did we do this
       • History
       • Free academic package and FreeWeb licensing since 2005
       • Marvin free for all desktops (since the beginning)
       • Open support forum developed to allow support for free users (no login to see all threads)
    9. So why did we do this
       • There is a lot of content on the web
       • Useful + increase visibility/utility of chemical structures
       • Creates user interest in this type of functionality and so demand for chemistry and content for publishers
       • Lets us develop directly with end users:
          • Functionality/feature development
          • GUI usability
          • Crowd-sourced bug fixing: “Report Error” for naming
       • Pushing state of the art
          • Browser tech (SVG, chunking, reducing calls)
          • ChemAxon tech (on the web, must be superfast, finalise features)
       • We love cheminformatics: “cheminfomaniacs”
    10. chemicalize.org under the hood
       • Web application (15 kloc):
       • MySQL: DB engine, structure/text storage
       • ChemAxon bits: see below
       • Apache Tomcat – servlet container with code logic
       • jQuery + plugins – UI interactions with code logic
          • A fair bit of home grown (46% of code) here
    11. ChemAxon bits
       • Marvin: structure editor, viewer, image generation
       • Name <> structure, Document to Structure: parsing, dictionaries and lexing IUPAC names
       • JChem Base, JChem Web Services, Standardizer, MCES: structure database, duplicate checking, structure search, web services layer, canonicalization, hit highlighting
       • Calculator Plugins: structure-based predictions like pKa, logP, logD, charge, HBDA, tautomer, stereoisomers, etc. Notable combined predictions yield argument results – like “Lipinski-likeness” etc.
    12. Use cases:
       • Wanted to know the logP of…
       • What are the structures for known drugs (http://en.wikipedia.org/wiki/List_of_drugs)
       • Seeing structures in relation to the name
       • All Wikipedia pages with a “chembox” have been indexed by chemicalize.org, so they can be searched by structure (substructure, similar, exact)
       • See all similar structures (and names) for any similar structure: sildenafil = viagra, lodenafil, aildenafil, udenafil …
       • Draw a structure and see its name
       • Automatically chemicalize my blog (WordPress plugin)
    13. Stats: Raw numbers (Apr 1, 2010 – Mar 25, 2011)
       • URLs visited: 232,648
       • Total number of names: 3,383,947 (14.58 names/page)
       • Unique names extracted: 220,117
       • Structures extracted: 175,598
       • Total number of unique visitors: 44,535
       • Average number of visitors/day (March 2011): 212
       • Average/longest time on site: 4:03 / 28:41 (min:sec)
    14. What are they doing on the site
    15. How busy are they?
    16. Top domains
       • Total domains: 13,390
       • Average: 17.29 URLs per domain
    17. Top pages
       • en.wikipedia.org/wiki/List_of_anaesthetic_drugs
       • www.reactivereports.com/chemistry-blog/arty-with-a-capital-f-and-the-myth-of-absinthe.html/comment-page-1
       • en.wikipedia.org/wiki/Penicillin
       • en.wikipedia.org/wiki/Aspirin
       • en.wikipedia.org/wiki/Paracetamol
       • www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=aspirin
       • en.wikipedia.org/wiki/List_of_organic_compounds
       • www.biomedcentral.com/info/ifora/figuretypes/
       • www.freepatentsonline.com/y2005/0037033.html
       • www.vivo.colostate.edu/hbooks/pathphys/endocrine/pancreas/insulin_phys.html
       Data only available for last 2 weeks
    18. Usage statistics – predictions
    19. Future plans..?
       • Remaining free
       • Crowdsourcing – new structures/names, bug reporting
       • Working on sorting and ordering results (biggie)
       • Personalization (login) = personal search history, profiles (notifications), dictionaries, calculation/search parameter settings
       • Index page as window into internet chemistry use
       • Browser plugins = chemicalize better, particularly in login/https pages (plugins tech approaching unity anyway)
       • How about working up the chemistry side, such as pharmacophore search, other screening, etc.? There is a lot of ChemAxon tech here to play with
       • Work on quality of name parsing, black lists, etc.
       • What else, guys? This is a provisional list
    20. Thanks to
       • Andras Stracz: Site implementation
       • Daniel Bonniot: Document & Name to structure
       • Alex Allardyce, Ferenc Csizmadia: Features, project management, idiot and advanced testing
       • Zsolt Kocsmarszky: Design
       • Roland Molnar: JChem Web Services
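
The example links on slides 3 and 6 show the target page URL and the search text carried in percent-encoded form. The Python sketch below rebuilds those two links with the standard library. The helper names are hypothetical, and reading `p` as a results-page index is an assumption based only on the example links, not documented behavior of the service.

```python
from urllib.parse import quote, quote_plus

BASE = "http://www.chemicalize.org"


def chemicalize_url(page_url):
    """Link that asks the service to chemicalize `page_url` (slide 3 pattern)."""
    # safe="" also encodes "/" and ":", matching the example on the slide.
    return f"{BASE}/?url={quote(page_url, safe='')}"


def websearch_url(query, page=0):
    """Web search link (slide 6 pattern); `page` is an assumed meaning of p."""
    # quote_plus turns spaces into "+" and ":" into "%3A".
    return f"{BASE}/websearch/#m={quote_plus(query)}/p={page}"


penicillin_link = chemicalize_url("http://en.wikipedia.org/wiki/Penicillin")
search_link = websearch_url("Serotonin sexual preference site:nature.com")
```

With these inputs the two helpers reproduce the example links from the slides character for character.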
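
Slide 11 mentions combined predictions such as “Lipinski-likeness”. As a rough illustration of how several single predictions can be folded into one result, the sketch below counts violations of the widely published rule-of-five thresholds. The slides do not say how ChemAxon combines the values, so the `Properties` class, the function name, and the property values used here are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Predicted values for one structure (hypothetical container)."""
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(p):
    """Count how many rule-of-five thresholds the structure exceeds."""
    checks = [
        p.mol_weight > 500,   # molecular weight above 500
        p.logp > 5,           # logP above 5
        p.h_donors > 5,       # more than 5 hydrogen-bond donors
        p.h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Approximate, illustrative values for an aspirin-like small molecule.
small = Properties(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
# Made-up values that exceed every threshold.
large = Properties(mol_weight=900.0, logp=6.5, h_donors=7, h_acceptors=12)
```

Returning the violation count rather than a yes/no flag leaves the cut-off (for example, tolerating one violation) to the caller.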
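
The ratios behind the raw numbers on slide 13 can be recomputed directly. The short Python sketch below uses only the figures from that slide; the variable names are illustrative. The plain quotient of total names over URLs visited comes out close to, but slightly below, the quoted 14.58 names/page. The slide does not say which page count was used for its average, so the gap is left unexplained here.

```python
# Figures copied from slide 13 (Apr 1, 2010 to Mar 25, 2011).
urls_visited = 232_648
total_names = 3_383_947
unique_names = 220_117
structures = 175_598

# Average number of chemical names found per visited URL.
names_per_page = total_names / urls_visited

# Unique names per extracted structure; a value above 1 reflects the
# slide note that some structures have more than one name.
names_per_structure = unique_names / structures

print(f"names per page:      {names_per_page:.2f}")
print(f"names per structure: {names_per_structure:.2f}")
```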