2005 04 05 SRI ELN Architecture


General presentation on ELN architecture delivered to SRI ELN conference in April 2005. Covers a lot of generic stuff.

  1. ELN Architecture
     Simon Coles, President & CTO, Amphora Research Systems
  2. So...
     • You’re on holiday one day
     • Doing your normal thing
     • And then you get the call...
     • They want an ELN!
     http://www.amphora-research.com/
  3. (Image-only slide; no text.)
  4. ELN architecture
     • Hopefully
       • I am not going to self-destruct
       • Your project won’t be as exciting
     • Your task is to
       • Deliver a state-of-the-art ELN system
       • In tight timescales
       • With limited budget
       • In the real world
       • That the users like
       • And will serve you for many years
  5. Introduction
     • About me
       • Started working with ELNs in ’96
       • President & Co-founder of Amphora
       • IT background
     • First ELN was an enterprise-scale ELN for Kodak
       • Worldwide, 1,000s of users, diverse user base
       • Completely Electronic Records (no paper)
     • After a long & windy road
       • New products, lots more deployments, many industries
       • Certain amount of realism about ELN implementation
     • Provide Patent Evidence Creation & Preservation Systems
     • Work with a wide variety of “ELN” systems etc.
     • Now based in the US & UK
  6. This presentation
     • You can download a copy of this presentation from our web site
  7. Why does architecture matter?
     • A good architecture can help
       • Integrate “Best of breed” tools with existing investments
       • Allow you to split the project into manageable pieces
       • Ensure you don’t get “captured” by the vendor
       • Help your system withstand the ravages of time
       • Keep your TCO down
     • A bad architecture will hurt
       • Reliability, Scalability problems
       • Reduce your options going forward
       • Force you into a “Big bang” project
     • Some random thoughts on architecture
  8. ELN architecture
     • Major issues
       • Diversity & Flexibility
       • Project size/Justification/ROI
       • Creating & Preserving Evidence for Patents
       • Need for long term access to ELN contents
       • Scalability
       • Web-based systems
       • How your network can help you
     • Trends
       • Integration methods
       • Open Source
       • In the lab
       • Ones to watch
  9. Diversity & Flexibility
     • “Science” covers a wide variety of activities
       • Each of these is served by its own industry
       • Improvements in each area need to happen at their own pace
     • Things change
       • Different techniques
       • New data types
       • Another R&D centre
       • New devices for use in the lab
       • The very essence of “Research” is to change the way you work
     • How do we design an ELN which can accommodate these changes?
  10. Dealing with change
      • Build on other projects & integrate
        • If it can be done within another project, then do so
        • Keeps your life simpler and more focused, with clear aims
        • Those other projects can proceed according to the rhythm and needs of the specific area
      • Where possible, employ loose coupling between systems
        • Message passing reduces implementation complexity
        • SOAP/OLE/XML etc.
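
The loose coupling and message passing described above can be sketched in Python. This is a minimal illustration, not Amphora's design: the `<result>` message and its element names (`sample`, `instrument`, `measurement`, `operator`) are hypothetical, and only the standard library's `xml.etree.ElementTree` is used. The consumer reads just the fields it needs, so the producer can add new elements later without breaking it.

```python
import xml.etree.ElementTree as ET


def build_result_message(sample_id: str, instrument: str, value: float, units: str) -> str:
    """Producer side: serialize one result as a small, self-describing XML message.
    The element names are hypothetical, chosen for illustration only."""
    msg = ET.Element("result", attrib={"version": "1"})
    ET.SubElement(msg, "sample").text = sample_id
    ET.SubElement(msg, "instrument").text = instrument
    measurement = ET.SubElement(msg, "measurement", attrib={"units": units})
    measurement.text = str(value)
    return ET.tostring(msg, encoding="unicode")


def read_result_message(xml_text: str) -> dict:
    """Consumer side: pick out only the fields this system needs and ignore the rest,
    so the producer can add elements without breaking this reader."""
    root = ET.fromstring(xml_text)
    measurement = root.find("measurement")
    if measurement is None:
        raise ValueError("message has no <measurement> element")
    return {
        "sample": root.findtext("sample"),
        "value": float(measurement.text),
        "units": measurement.get("units"),
    }


# A newer producer adds an <operator> element; the older consumer still works.
message = build_result_message("S-0042", "HPLC-3", 12.5, "mg/mL")
extended = message.replace("</result>", "<operator>jdoe</operator></result>")
print(read_result_message(extended))
```

Because neither side depends on the other's internals, each system can be upgraded on its own schedule, which is the point of the slide.
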
  11. Loosely-Coupled Systems Keep You Sane
  12. Project size/Justification/ROI
      • Two approaches
        • Either attempt to justify the whole ELN in one go (“Big bang”)
        • Or phased
          • Divide the project into phases
          • Each involves a smaller investment (risk)
          • With a corresponding payoff
          • Move forward at a pace that’s comfortable for the business
  13. Phased ELNs
      • Historically this was very difficult to do with ELNs
        • Record keeping
        • Integration with other systems
      • Needs to be designed into the project (& product) from the start
        • Patent evidence creation/preservation system
        • Generic science-neutral platform (can often be your existing IT infrastructure)
        • Integrate/collaborate with discipline-specific software
      • When you can do it, it makes a huge difference
        • Can start at a departmental level if needed
        • Asking the business to take a small risk each time
  14. Creating & Preserving Evidence for Patents
      • Specialized area with very specific (and unique) considerations
      • Best done separately from science-specific ELN tools
        • Hard to reconcile requirements of science and records in one system
        • You’ll often have a number of science-focused systems, yet want only one Patent evidence system
        • Run by a small group of people who know they’ll end up in court
        • Reduce risks & discovery costs
      • You can have an “Electronic” notebook for the scientist and still create a paper record
  15. Paper or Electronic?
      • The choice often comes down to
        • Comfort
        • Practicality
        • Cost
      [Chart: System Cost of Paper vs. Electronic systems; horizontal-axis values 10, 100, 500, 1000]
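
The cost trade-off can be made concrete with a simple linear model in which each approach has a fixed cost plus a per-user cost; the break-even point is where the two totals meet, n* = (F_electronic - F_paper) / (v_paper - v_electronic). The model and every number below are hypothetical assumptions for illustration, not figures taken from the chart.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Hypothetical linear cost model: a one-off fixed cost plus a cost per user."""
    fixed: float
    per_user: float

    def total(self, users: int) -> float:
        return self.fixed + self.per_user * users


def break_even_users(paper: CostModel, electronic: CostModel) -> Optional[int]:
    """Smallest number of users at which the electronic system costs no more than paper.

    Returns None if that never happens (electronic starts dearer and its
    per-user cost is not lower, so the gap never closes)."""
    if electronic.fixed <= paper.fixed:
        return 0
    if electronic.per_user >= paper.per_user:
        return None
    n = (electronic.fixed - paper.fixed) / (paper.per_user - electronic.per_user)
    return math.ceil(n)


# Purely illustrative figures, not data from the presentation.
paper = CostModel(fixed=5_000, per_user=2_000)
electronic = CostModel(fixed=200_000, per_user=500)
print(break_even_users(paper, electronic))
```

Real costs are rarely linear, so treat this as a way to structure the comparison rather than a prediction; plug in your own fixed and per-user estimates.
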
  16. Long term access to ELN content
      • Partly this is a records management issue
      • But there’s a heavy technical component
        • What format you store your data in
        • How you store your data
        • Metadata
      • You need to make Open Data formats part of your purchasing requirements
  17. “Good” (open) file formats
      • Publicly documented
      • Legally unencumbered
        • No patents, copyright concerns etc.
        • Any patents or copyright must be in the public domain
      • Ideally, self-documenting (XML is a good start)
      • Degrade gracefully
        • If you can’t read the data, at least you can see a picture
      • Based on more open, primitive formats where possible
      • At least two implementations of readers, one of which is Open Source
      • Widely used (W3C or IETF standards are good signs)
  18. Data formats for the long term
      • Good
        • For text: Plain ASCII, Unicode, HTML, possibly RTF
        • For graphics: PNG, SVG
        • For structured data: XML
        • To preserve appearance: PDF
      • Worry about
        • Storing files in databases
          • The database file format is probably undocumented
          • Store objects on the file system and use the database to point to them
        • Anything that is proprietary: there’s no excuse for it, and it dramatically increases your risk
        • Binary files generally
        • Mixing content in files (e.g. embedding XML in PDF)
        • Proprietary digital signatures
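
The advice to store objects on the file system and use the database to point to them might look like the following minimal sketch. It assumes SQLite (Python's `sqlite3`) as the database and a hypothetical `records` table; the record ID and page content are made up. Naming each file by its SHA-256 hash is an extra design choice not in the slide: it lets you check later that a file has not changed since it was recorded.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class FileStore:
    """Record content lives as plain files on disk; the database holds only pointers
    and metadata. Hypothetical schema: records(record_id, sha256, path, media_type)."""

    def __init__(self, root: Path, db_path: str = ":memory:"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "record_id TEXT PRIMARY KEY, sha256 TEXT NOT NULL, "
            "path TEXT NOT NULL, media_type TEXT NOT NULL)"
        )

    def put(self, record_id: str, data: bytes, media_type: str) -> Path:
        digest = hashlib.sha256(data).hexdigest()
        # Files are named by their hash and fanned out into subdirectories by prefix.
        path = self.root / digest[:2] / digest
        path.parent.mkdir(exist_ok=True)
        if not path.exists():
            path.write_bytes(data)
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
                (record_id, digest, str(path.relative_to(self.root)), media_type),
            )
        return path

    def get(self, record_id: str) -> bytes:
        row = self.db.execute(
            "SELECT sha256, path FROM records WHERE record_id = ?", (record_id,)
        ).fetchone()
        if row is None:
            raise KeyError(record_id)
        digest, rel_path = row
        data = (self.root / rel_path).read_bytes()
        # Check the file still matches the hash recorded when it was stored.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"content of {record_id!r} has changed on disk")
        return data


# Usage with a throwaway directory and a hypothetical record ID.
with tempfile.TemporaryDirectory() as tmp:
    store = FileStore(Path(tmp))
    page = b"<page><title>Synthesis of compound 7</title></page>"
    stored_path = store.put("NB-0001/p3", page, "application/xml")
    round_trip = store.get("NB-0001/p3")
    stored_name = stored_path.name
    store.db.close()
```

If the database is ever lost or its format becomes unreadable, the content itself is still sitting on disk as ordinary files in an open format.
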
  19. IP concerns & data formats
      • Companies have always used Proprietary Data Formats as a competitive weapon
      • Companies are waking up to the use of IP tools (licenses, patents, copyrights) to reinforce their control over data formats
      • Just because a format is published doesn’t mean it is open
      • The Microsoft Office XML formats are a particularly bad example
        • Right now it looks positively radioactive
        • They’re being very careful what they say, which indicates to me they’re planning something
        • http://www.groklaw.net/article.php?story=20050330133833843
        • (see section: 4. Dissecting Microsoft’s “Patent License”)
  20. Standards
      • There are so many to choose from!
      • Two key ways of generating “Standards”
        • De Facto: dominant supplier/format
        • De Jure: committee based
      • Who gets to “bless” a standard?
      • What makes a “good standard”?
        • De Jure process has difficulty keeping up with the real world
        • De Facto process has risk of lock-in
      • Pragmatic approach
        • Expect your suppliers to use open file formats
        • If there is an acceptable standard, use it
        • Make sure you are using the right kind of format for each purpose
  21. Records considerations
      • Not all the “Stuff” that’s generated during the research process is the same
        • Some of it needs to be kept for a long time
        • Some is only useful for the moment
        • Some will benefit anyone
        • Some is only really useful for the person who created it (using specialized tools)
        • Some material is suitable for long term preservation, some isn’t
      • You can go crazy getting into this in too much detail
      • But you also need to make sure your tools and processes do allow you to manage the data/records you’re creating
  22. Scalability
      • Geographical space
        • In wide area networks, latency becomes the most noticeable issue
        • Over multiple timezones, acceptable “Maintenance Windows” disappear
      • More data
        • Number of data items
        • Size of individual data items
      • Number of users
        • Larger populations generally mean more disparate requirements
        • How many people will get upset if the system goes down?
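
The disappearing maintenance window can be checked with a few lines of arithmetic. This sketch assumes a local working day of 08:00 to 18:00 and uses hypothetical sites with fixed UTC offsets, ignoring daylight saving time; it lists the UTC hours in which no site is working.

```python
def busy_utc_hours(utc_offset: int, start: int = 8, end: int = 18) -> set[int]:
    """UTC hours (0-23) that fall inside a site's local working day [start, end)."""
    return {h for h in range(24) if start <= (h + utc_offset) % 24 < end}


def maintenance_window(site_offsets: dict[str, int]) -> list[int]:
    """UTC hours in which no site is inside its working day."""
    busy: set[int] = set()
    for offset in site_offsets.values():
        busy |= busy_utc_hours(offset)
    return sorted(set(range(24)) - busy)


# Hypothetical sites with fixed UTC offsets (daylight saving time is ignored).
two_sites = {"UK site": 0, "US East Coast site": -5}
three_sites = {**two_sites, "Japan site": 9}
print(maintenance_window(two_sites))    # → [0, 1, 2, 3, 4, 5, 6, 7, 23]
print(maintenance_window(three_sites))  # → []
```

With two sites there are still nine quiet hours; adding a third site spread around the globe leaves none, so upgrades have to be done without taking the system down.
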
  23. Latency
      • The science-specific “Deep” systems
        • Often highly interactive
        • Lots of round trips to the server for data etc.
        • This is what makes them cool
        • You can’t beat the speed of light (and network hardware adds significant latency)
        • Therefore you need to have a server close to the end user
        • Federation will give you a single overview
      • “Broad” systems have different usage characteristics
        • Very much like a normal web site, latency is much less of a problem
        • Very easy to have one system for worldwide use, even for large companies
        • Building large systems is quite easy
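
The speed-of-light point can be made concrete with a back-of-the-envelope calculation. This sketch assumes a signal speed in optical fibre of roughly 200,000 km/s and a hypothetical operation that needs 50 sequential round trips; it counts only propagation delay, so real networks will be slower.

```python
# Assumption: light in optical fibre travels at roughly 200,000 km/s (about two-thirds of c).
SPEED_IN_FIBRE_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound for one request/response: propagation delay only.
    Real networks add routing, switching and queuing delays on top of this."""
    return 2 * distance_km * 1000 / SPEED_IN_FIBRE_KM_PER_S


def best_case_wait_s(distance_km: float, round_trips: int) -> float:
    """Best-case time a chatty client spends waiting on the network for one operation."""
    return round_trips * min_round_trip_ms(distance_km) / 1000


# Hypothetical operation needing 50 sequential round trips to the server.
for label, km in [("server on the same site", 50), ("server 8,000 km away", 8_000)]:
    print(f"{label}: {min_round_trip_ms(km):.1f} ms per round trip, "
          f"{best_case_wait_s(km, 50):.3f} s for 50 round trips")
```

Even in this best case the distant server costs about 4 seconds per operation, which is why highly interactive "Deep" systems want a local server while page-at-a-time "Broad" systems cope with one central one.
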
  24. Web-based systems
      • “Web based” has become a bit of a marketing tool
      • Generally thin clients offer a lower TCO
        • And hence IT like them
      • In practice, most science-supporting ELN front ends will be delivered as a “thick” client
        • There’s a reason it’s called a browser
        • Wrapping an OLE object in IE is still “thick”
      • However, “Ajax” systems like GMail and Google Maps show just what you can do with a web-based system
      • Web based systems should expose a sensible URL interface
  25. How your network can help you
      • There’s a whole load of useful network services and interfaces that large companies have
      • Useful ones
        • Single Sign On
        • LDAP
        • Printer/Fileserver etc.
        • Security/Status monitoring etc.
      • Beware of Central Digital Signature Infrastructure
        • Mixing vulnerabilities: leaves you open to accidents
        • Often not designed for long term use
  26. ELN architecture
      • Major issues
        • Diversity & Flexibility
        • Project size/Justification/ROI
        • Creating & Preserving Evidence for Patents
        • Need for long term access to ELN contents
        • Scale
        • Web-based systems
      • Trends
        • Integration methods
        • Open Source
        • In the lab
        • Ones to watch
  27. Integration methods
      • RPC-like mechanisms
      • Service Oriented Architecture
        • SOAP
        • REST
      • Text file passing (files, email, etc.)
      • URL launching
        • Often overlooked, but very powerful
      • What’s important
        • Loose coupling
        • Open, lightweight systems
        • Consistent, stable keys
        • Stable URL (& domain) space
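
URL launching built on consistent, stable keys can be sketched as follows. The host name `eln.example.com` and the `/notebooks/{id}/pages/{n}` path layout are hypothetical; the functions use only Python's standard `urllib.parse` and `webbrowser` modules. The link is built from stable keys alone (no session IDs or server names), so any other tool can generate it, store it, or launch it years later.

```python
import webbrowser
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "https://eln.example.com"  # hypothetical host; keep the domain stable over time


def record_url(notebook_id: str, page: int) -> str:
    """Build a permanent link from stable keys only (no session IDs, no server names)."""
    return f"{BASE_URL}/notebooks/{quote(notebook_id, safe='')}/pages/{page}"


def parse_record_url(url: str) -> tuple[str, int]:
    """Recover the stable keys from a link, e.g. when another tool launches it."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "notebooks" or parts[2] != "pages":
        raise ValueError(f"not a notebook page URL: {url}")
    return unquote(parts[1]), int(parts[3])


def open_record(notebook_id: str, page: int) -> None:
    """URL launching: hand the link to whatever browser the desktop has configured."""
    webbrowser.open(record_url(notebook_id, page))
```

Percent-encoding the notebook ID with `safe=''` means an ID containing `/` or spaces still occupies exactly one path segment, so the round trip through `parse_record_url` is lossless.
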
  28. Open Source
      • Definitely one to watch
      • Not the “Free” lunch you might think, but a pragmatic business too
      • Examples
        • Linux
        • Postgres
        • JBoss, Tomcat etc.
        • Ghostscript
      • Open Source is part of everyone’s infrastructure
      • Make sure you can run your systems on a variety of platforms
  29. Why?
      • Good for records
        • Gives you top-to-bottom control
      • Good for TCO
        • We’re finding the Open Source infrastructure easier to set up and more reliable than proprietary alternatives
      • Enables a better solution
        • Transparent systems mean you can do things the original designers didn’t think of
        • This is especially important for ELNs
  30. Data point
      • This is just our experience offering people alternatives for the server portion
        • 2000: “What’s Open Source? What’s Linux?”
        • 2001: No way!
        • 2002: some pilots underway, some acceptance
        • 2003: majority of installations are Open Source infrastructure
        • 2005: we’re wondering where Windows is
      • We’re not abandoning proprietary infrastructure
        • But it is clear that Open Source is getting serious consideration
        • Seeing a migration away from proprietary infrastructure to Open Source
  31. In the lab
      • ELN use in the lab is a hard problem
      • Tablets, Laptops, Palmtops etc. don’t seem to be working
      • What does seem to work
        • Small form-factor PCs on the bench
        • Remote Desktop & Citrix
  32. Ones to watch
      • Technology
        • XML generally
        • Web Services
        • Bluetooth and WiFi
        • RSS
        • OpenOffice
        • Jabber (as a computer messaging and IM framework)
      • Trends
        • File format nasties
        • DMCA and other copyright legislation