The RSC & e-Science: Reflecting the Change in the World we Live In
Valery Tkachenko
RSC-OSDD Consultative Workshop on Cheminformatics, Delhi, September 28th 2013
Transcript of "The rsc e science - reflecting the change in the world we live in"

  1. 1. The RSC & e-Science: Reflecting the Change in the World we Live In Valery Tkachenko RSC-OSDD Consultative Workshop on Cheminformatics Delhi, September 28th 2013
  2. 2. Royal Society of Chemistry and Global Chemistry Network
  3. 3. The World we live in Internet World 20+ years into the Internet Revolution Web 2.0 -> Web 3.0 Connected World Social Networks Real-time Communications Big Data World Semantic content New Interfaces
  4. 4. Pillars of the World. Data: Data (knowledge) is King; Dataflow. Navigation: Domain-specific search and navigation; Navigate inside and link out (federation). Interfaces: HCI (human-computer interface); M2M (machine to machine)
  5. 5. Science map
  6. 6. Chemical sciences map
  7. 7. Chemistry on the Internet
  8. 8. What’s wrong?!?! Complexity
  9. 9. Royal Society of Chemistry and Global Chemistry Network
  10. 10. Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  11. 11. Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  12. 12. 50,000 ft view of an STM publisher: Knowledge; Our User Interfaces (Desktop, Web, Mobile, etc.); Customers; Delivery Magic; 3rd party integrations (our web services)
  13. 13. ChemSpider Suite. Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms. Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO. APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API. Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components. UIs: ChemSpider website, ChemSpider Reactions, mobile web app, ChemSpider desktop app, Depositions client, Java Beans
  14. 14. • 29 million chemicals and growing • Data sourced from >500 different sources • Crowdsourced curation and annotation • Ongoing deposition of data from our journals and our collaborators • A structure centric hub for web-searching
  15. 15. ChemSpider and Atovaquone
  16. 16. ChemSpider and Atovaquone
  17. 17. ChemSpider and Atovaquone
  18. 18. ChemSpider and Atovaquone
  19. 19. ChemSpider and Atovaquone
  20. 20. ChemSpider and Atovaquone
  21. 21. ChemSpider and Atovaquone
  22. 22. ChemSpider and Atovaquone
  23. 23. ChemSpider and Atovaquone
  24. 24. ChemSpider and Atovaquone
  25. 25. ChemSpider and Atovaquone
  26. 26. ChemSpider and Atovaquone
  27. 27. Micropublishing
  28. 28. Micropublishing
  29. 29. Micropublishing
  30. 30. ChemSpider Reactions
  31. 31. ChemSpider Reactions
  32. 32. Knowledge in our own archives
  33. 33. DERA and Text Mining: The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
  34. 34. Text Mining: The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser. The reaction mixture was heated at reflux with stirring, for a period of about one-half hour. After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
  35. 35. It is so difficult to navigate… What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? Competitors? IP?
  36. 36. Digitally Enabling RSC Archive Text, PDF, XML Structures Reactions Spectra Materials Chemistry Validation and Standardization Platform (CVSP) DERA (Text Mining) Biological Activities
  37. 37. Data quality issue and CVSP Robochemistry Proliferation of errors in public and private databases Automated quality control system
  38. 38. ChemSpider issues
  39. 39. DrugBank dataset (6516 records) ~60 records that can’t be dearomatized unambiguously DB04283 DB04462
  40. 40. ~30 records with bonds that do not make sense DB04283 DB04009
  41. 41. 2 records where SMILES, InChI, and name did not match the structure DB00611 DB01547
  42. 42. ~40 records where InChIs did not match the structure DrugBank ID: DB00755 InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+ DrugBank ID: DB00614
  43. 43. DB08128 J. Brecher, IUPAC Graphical Representation of Stereochemical Configuration, Section: ST-1.1.10 DB06287 7 records with 2 stereo bonds at chiral atoms
  44. 44. CVSP validation of ChEMBL 16 (~1.3 million records) • Overall 0.7% of records had validation issues • Stereo problems (~82%) • Directions of bonds do not make sense (~63%) • Ambiguous stereo: 2 stereo bonds at chiral center (~19%)
  45. 45. “Direction of bond makes no sense” – 63%
  46. 46. “Stereo types of the opposite bonds mismatch” – 15% http://www.iupac.org/publications/pac/2006/pdf/7810x1897.pdf
  47. 47. “Stereo types of non-opposite bonds match” – 2%
  48. 48. “atom not recognized” – 3% (isotopes). Should be atom from periodic table. No mass difference in atom line. No “M ISO” in connection table. In molfile:
  49. 49. ChemSpider Suite. Data Layer: ChemSpider Assays, ChemSpider Compounds, ChemSpider Reactions, ChemSpider Spectra, ChemSpider Materials, ChemSpider Algorithms. Business Objects Layer: CSAs BO, CSC BO, CSR BO, CSS BO, CSM BO, CSA BO. APIs Layer: DS API, Export API, Search API, Processing API, CSAs API, CSC API, CSR API, CSS API, CSM API, CSA API. Components Layer: JS Components, Google Apps Components, Python widgets, SharePoint Components, PHP snippets, ASP.NET Components. UIs: ChemSpider website, ChemSpider Reactions, mobile web app, ChemSpider desktop app, Depositions client, Java Beans
  50. 50. Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  51. 51. Started with 2 servers in a basement. Presently: two farms, ~40 servers each. Future: in the Clouds
  52. 52. Compute intensive calculations Delivery systems
  53. 53. Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  54. 54. AltMetrics
  55. 55. Curation in ChemSpider
  56. 56. Knowledgebases and delivery systems Big Data challenge Crowdsourcing and altmetrics New interfaces
  57. 57. Visualization
  58. 58. Navigation
  59. 59. ChemSpider APIs
  60. 60. We are a part of a larger world
  61. 61. National Chemistry Database
  62. 62. National Data Repository University 1 Data Hub Workstations University 2 Data Hub Workstations Company 3 Data Hub Workstations Data Repository indexed storage Data Repository provided data storage Chemically intelligent services Indexes Data External clients Publishers Scientists Funding bodies
  63. 63. http://www.openphacts.org Open PHACTS is an Innovative Medicines Initiative (IMI) project, aiming to reduce the barriers to drug discovery in industry, academia and for small businesses. The Semantic Web is one of its cornerstones
  64. 64. What does e-Science do in Open PHACTS? • ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform • e-Science develops tools to check and standardise chemical structures • e-Science is creating the Open PHACTS chemical registration system
  65. 65. RDF Export. Data: ChEMBL, HMDB, DrugBank. Chemistry Validation and Standardization Platform (CVSP) at cvsp.chemspider.com • Validation • Standardization • Parent generation • Run on Hadoop-based farm
  66. 66. We know about Natural Products
  67. 67. Marinlit
  68. 68. OSDD
  69. 69. The Global Chemistry Network
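
Slides 33 and 34 use a patent-style paragraph to illustrate text mining. As a rough illustration only, the sketch below pulls reagent names and quantities out of an excerpt of that paragraph with a regular expression. It is not how DERA works; the pattern, the unit list, and the function name `extract_reagents` are all assumptions made for this example, and real chemical text mining needs proper name recognition.

```python
import re

# Excerpt of the passage on slides 33-34, with punctuation spacing normalized.
TEXT = (
    "prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) "
    "were charged into a glass reaction vessel"
)

# Hypothetical pattern: "<name> (<amount> <unit>)".
QUANTITY = re.compile(
    r"(?:\band\s+)?"                 # skip a leading conjunction
    r"([A-Za-z][A-Za-z\- ]*?)"       # candidate reagent name (letters, hyphens, spaces)
    r"\s*\(\s*(\d+(?:\.\d+)?)\s*"    # "(" followed by a numeric amount
    r"(ml|l|mg|g|kg|mmol|mol)\s*\)"  # unit and ")"
)


def extract_reagents(text):
    """Return (name, amount, unit) tuples for every 'name (amount unit)' hit."""
    return [
        (m.group(1), float(m.group(2)), m.group(3))
        for m in QUANTITY.finditer(text)
    ]


if __name__ == "__main__":
    for name, amount, unit in extract_reagents(TEXT):
        print(f"{name}: {amount:g} {unit}")
```

The name group deliberately excludes digits and parentheses, so systematic names such as the urea derivative in the full passage are not captured; that gap is exactly where dictionary or grammar-based chemical name recognition would be needed.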
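
Slide 42 reports records whose InChI did not match the drawn structure. One cheap first-pass check, sketched below under stated assumptions, compares the formula layer of the InChI with element counts derived from the structure. This is not the CVSP implementation: the function names are hypothetical, the atom counts are assumed to come from some other toolkit (with hydrogens included), and only single-component formulas are handled. A full check would regenerate the InChI from the structure and compare the strings.

```python
import re
from collections import Counter

# InChI string shown on slide 42 for DrugBank ID DB00755.
INCHI = (
    "InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-"
    "20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)"
    "/b9-6+,12-11+,15-8+,16-14+"
)


def inchi_formula(inchi):
    """Return the formula layer, i.e. the segment right after the version prefix."""
    parts = inchi.split("/")
    if len(parts) < 2 or not parts[0].startswith("InChI=1"):
        raise ValueError("not a version 1 InChI string")
    return parts[1]


def formula_counts(formula):
    """Count atoms in a single-component formula such as 'C20H28O2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def inchi_matches_atoms(inchi, atom_counts):
    """True if the InChI formula layer agrees with the structure's atom counts."""
    return formula_counts(inchi_formula(inchi)) == Counter(atom_counts)


if __name__ == "__main__":
    print(inchi_formula(INCHI))
    print(inchi_matches_atoms(INCHI, {"C": 20, "H": 28, "O": 2}))
```

A formula mismatch proves the record is inconsistent, but a formula match does not prove it is correct, since connectivity and stereo layers can still disagree.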
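
Slide 48 lists the “atom not recognized” issue for isotopes: the atom symbol should come from the periodic table, with the isotope carried by the mass difference field of the atom line or by an “M  ISO” line. The sketch below shows one reading of that rule for V2000 molfiles. It is an assumption-laden illustration, not CVSP code: `check_isotopes`, `atom_line`, and `build_molfile` are hypothetical helpers, and the parser relies on the fixed column positions of the V2000 atom block.

```python
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split())


def atom_line(symbol, mass_diff=0):
    """Build a V2000 atom line at the origin (demo helper only)."""
    coords = f"{0.0:10.4f}" * 3
    return f"{coords} {symbol:<3}{mass_diff:2d}" + "  0" * 11


def build_molfile(atom_lines, properties=()):
    """Assemble a minimal two-atom, one-bond V2000 molfile (demo helper only)."""
    counts = f"{len(atom_lines):3d}{1:3d}" + "  0" * 8 + "999 V2000"
    return "\n".join(
        ["demo", "  sketch", "", counts, *atom_lines, "  1  2  1  0",
         *properties, "M  END"]
    )


def check_isotopes(molfile):
    """Report unknown atom symbols and whether isotope data is encoded."""
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])          # counts line: atom count in columns 1-3
    atoms = lines[4:4 + n_atoms]
    unknown = [
        (i, line[31:34].strip())          # atom symbol occupies columns 32-34
        for i, line in enumerate(atoms, start=1)
        if line[31:34].strip() not in ELEMENTS
    ]
    has_mass_diff = any(line[34:36].strip() not in ("", "0") for line in atoms)
    has_m_iso = any(line.startswith("M  ISO") for line in lines)
    return {"unknown_atoms": unknown,
            "has_mass_diff": has_mass_diff,
            "has_m_iso": has_m_iso}


# "D" used as an atom symbol, with no isotope data: the case the slide flags.
BAD = build_molfile([atom_line("O"), atom_line("D")])
# Same fragment encoded as H with an "M  ISO" entry giving atom 2 a mass of 2.
GOOD = build_molfile([atom_line("O"), atom_line("H")], ["M  ISO  1   2   2"])

if __name__ == "__main__":
    print(check_isotopes(BAD))
    print(check_isotopes(GOOD))
```

Slicing by column rather than splitting on whitespace keeps the check faithful to the fixed-width format, at the cost of rejecting files whose columns are misaligned.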