Experience with MarkLogic at Elsevier

Elsevier is the world's largest publisher of scientific, technical and medical (STM) content. An early adopter of XML as a standard representation for content, Elsevier has used MarkLogic to develop a range of information access and discovery solutions for its customers. This presentation will cover Elsevier's experience with XML-centric content management systems in general and MarkLogic's technology in particular: Elsevier's initial adoption and uptake of the technology, its current use within the Elsevier suite of online products and solutions, and opportunities for future use. We will describe design patterns for content repositories in a publishing context that have emerged during our use of the technology, and touch on several issues that have arisen, including XQuery and its adoption within the developer community, the challenges posed to XML by new representations for documents and metadata such as JSON and RDF, and the delivery of search applications built on XML infrastructure.
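One concrete reason publishing XML does not map trivially onto JSON is mixed content: running text interleaved with inline markup such as italics or superscripts. The Python sketch below is an illustration under stated assumptions, not part of the original talk. The `<title>` markup is hypothetical, `naive_to_dict` stands for a common record-style XML-to-JSON mapping, and `to_jsonml`/`from_jsonml` implement a simplified JsonML-style array encoding (the attribute object is always present) that preserves document order.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical title with mixed content: text interleaved with inline markup.
XML = '<title>Effects of <italic>in vitro</italic> exposure on <sup>13</sup>C uptake</title>'

def naive_to_dict(elem):
    """Record-style mapping: element text plus children keyed by tag.
    Text that follows an inline child (its .tail) and the ordering of
    text and children are silently dropped."""
    d = {}
    if elem.text and elem.text.strip():
        d["#text"] = elem.text
    for child in elem:
        d[child.tag] = naive_to_dict(child)
    return d

def to_jsonml(elem):
    """JsonML-style array: [tag, {attributes}, child, child, ...].
    Strings and nested arrays keep document order, so mixed content survives."""
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        if child.tail:
            node.append(child.tail)
    return node

def from_jsonml(node):
    """Rebuild an Element from the array produced by to_jsonml."""
    tag, attrs, *children = node
    elem = ET.Element(tag, attrs)
    last = None
    for child in children:
        if isinstance(child, str):
            # Text before any child element is .text; text after a child is its .tail.
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = from_jsonml(child)
            elem.append(last)
    return elem

root = ET.fromstring(XML)
print(json.dumps(naive_to_dict(root)))
# {"#text": "Effects of ", "italic": {"#text": "in vitro"}, "sup": {"#text": "13"}}
jml = to_jsonml(root)
print(json.dumps(jml))
round_tripped = ET.tostring(from_jsonml(json.loads(json.dumps(jml))), encoding="unicode")
print(round_tripped == XML)  # True
```

The naive mapping loses " exposure on " and "C uptake", while the array form survives a JSON round trip unchanged. The price is a less "natural" JSON shape, which is one way the choice of serialization pulls against how publishing content is actually structured.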

  1. Experience with MarkLogic at Elsevier
     Bradley P. Allen and Darin McBeath, Elsevier Labs
     Presentation at NoSQL Now 2011, San Jose, CA, USA, 2011-08-25
  2. Elsevier: who we are
     • Elsevier, part of the Reed Elsevier group, is a world-leading publisher of scientific, technical and medical full-text literature. 7,000 employees in over 70 offices worldwide publish more than 2,500 journal titles and 11,000 online books.
     • Global market: North America, Europe, Asia-Pacific
     • Global community: 7,000 editors, 70,000 editorial board members, 200,000 referees, 500,000+ authors
     • Global audience: 15 million doctors, nurses and health professionals; 10 million+ researchers in 4,500 institutes; 5 million students
  3. MarkLogic at Elsevier
     • MarkLogic is used pervasively throughout our business
       – Science and Technology
       – Health Sciences
       – Operations
     • It is also a strategic technology for our sister Reed Elsevier organization, LexisNexis
     • We were an early adopter of MarkLogic
       – Began working with MarkLogic in 2001
  4. Motivations for MarkLogic adoption
     • The company was committed to XML as the standard for content representation
     • Vision of building Web services on top of XML content repositories
     • Enabling new information solutions through reuse and mashup of existing journal and book content
     • Relational technologies were not a good fit
  5. MarkLogic applications at Elsevier (by business unit: product, description, MarkLogic features used, year launched)
     • Science & Technology
       – Scopus: the largest abstract and citation database, containing both peer-reviewed research literature and quality web sources; contains 50+ million abstracts. The original application that used MarkLogic. Features: Repository, Transformation, and some extensions (such as fast/accurate counting). Launched 2005.
       – Scopus Custom Data: offline version of Scopus. Features: Repository, Transformation. Launched 2007.
       – EMBASE: biomedical database with over 24 million indexed records. Features: Repository, Search, Transformation. Launched 2008.
       – Methods Navigator: task-specific search for experimental methods and protocols across 40,000 articles. Features: Repository, Content Processing Framework. Launched 2010.
       – HazMat Navigator: chemical safety database based on Bretherick's Handbook of Reactive Chemical Hazards and other sources. Features: Repository, Content Processing Framework. Launched 2010.
       – SciVal Funding: database of current research funding opportunities and award information. Features: Repository, Content Processing Framework. Launched 2010.
     • Health Sciences
       – Books: 1,000 books supporting multiple Health Sciences applications (HESI, NursingConsult, MDConsult). Features: Repository, ability to present content quickly and easily by chapter, section, or paragraph. Launched 2006.
       – Health Connect: Health Sciences journal platform. Features: Repository, Search, Transformation. Launched 2007.
       – Linked Data Repository: 500,000 content enhancement metadata documents; a 100% XQuery application. Features: Repository, XPath and a handful of proprietary extensions. Launched 2011.
     • Operations
       – ConSyn: batch retrieval service for 10+ million journal articles. Features: Search, Repository, Task Server, Zip, Security, Transformation. Launched 2010.
  6. MarkLogic benefits and challenges at Elsevier
     • MarkLogic brings us two big benefits
       – Excellent fit with how we represent our content
       – Tools (XQuery, XSLT) that support working with that content representation
     • Those benefits come with challenges, some old, some new
       – Developer productivity and adoption
       – Standards and interoperability
       – Software ecosystem
       – Total solution fit
       – TCO relative to other solutions
  7. Developer productivity and adoption
     • XQuery can be a powerful language for rapid prototyping
       – It can support writing complete web applications
     • Experienced XQuery developers are difficult to find
       – Especially relative to developers experienced with emerging JSON/web frameworks
     • It is difficult to motivate developers committed to more mainstream frameworks, patterns, and languages
  8. Standards and interoperability
     • Vendors view XQuery in different ways: some as a query language, some as a transformation language, some as a programming language, some as all of the above
     • These disparate views often lead to confusion in the community about what XQuery really is
     • XQuery interoperability is currently difficult, and it is doubtful that it will ever extend beyond simple applications
       – Groups such as EXPath will help tidy up some interfaces, but far more work needs to be done
       – Elsevier Labs has investigated this issue in the context of the SciVal Showcase application using four different XQuery engines (MarkLogic, eXist, 28msec, and XQIB)
       – This experiment highlighted the differences between the implementations (and the looseness of the W3C Recommendation)
  9. Software ecosystem
     • The ecosystem around XQuery and MarkLogic is lacking
       – There are relatively few open-source or third-party modules and language bindings
     • The IDEs and debugging tools, while vastly improved, are still not on par with those for other query languages
  10. Total solution fit
     • MarkLogic started out as an XML database solution
     • It has added functionality (e.g. free-text search) that has matured over the years
       – This is a big part of its intended use at LexisNexis
     • We struggle to understand the tradeoffs between a single solution and a composition of best-of-breed solutions (e.g. MarkLogic standalone vs. MarkLogic integrated with Solr)
  11. TCO relative to other solutions
     • Traditional enterprise software licensing can lead to significant costs
     • NoSQL document database solutions with business models based on open source plus support services are an emerging alternative
     • We are still working on determining the TCO tradeoff between the two in an enterprise context
  12. MarkLogic in the context of NoSQL in general
     • NoSQL before it was cool
     • But there are emerging differences between the document stores used for traditional vs. Internet publishing
       – XML/XQuery/XSLT vs. JSON/UnQL/JavaScript
       – Manual scale-out vs. automated scale-out
     • The overhead of legacy standards can be a drag
       – Where is XML in its adoption lifecycle?
       – How does HTML5 fit in?
  13. Future use of MarkLogic at Elsevier
     • Persisting as the foundation of content repository efforts
       – XML legacy drives continued use
     • Turnkey SaaS for publishing and newer NoSQL solutions are competing for attention
       – Solutions that layer XML processing and query technologies on top of non-XML NoSQL stores are beginning to appear (e.g. Ambrosoft's XML DB project)
     • Design choices driven by consumer Internet use cases may not fit information publishing as well as MarkLogic does
       – Emphasis on join-free queries and use-case-driven indexing
     • We are watching how well emerging best practices and design patterns from the consumer Internet that suit our needs will be supported going forward
       – Auto-scaling
       – Web application frameworks
       – HTML5
  14. Summary
     • We were an early adopter of MarkLogic
     • Over ten years it has become a mature product that we rely on extensively across our business
     • MarkLogic's response to the emergence of NoSQL document stores, non-XML document serializations, and application design patterns from the consumer Internet is of keen interest to us
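Slide 5 credits MarkLogic, for the Health Sciences Books product, with the ability to present content quickly and easily by chapter, section, or paragraph. The Python sketch below illustrates the underlying idea of addressing granular units of an XML document by path. It is a hypothetical stand-in, not MarkLogic's API or Elsevier's schema: the `<book>`/`<chapter>`/`<section>`/`<para>` markup and the helpers `get_unit` and `unit_text` are illustrative, and it uses only the limited XPath subset in Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical book markup; element names and ids are illustrative only.
BOOK_XML = """<book id="b1">
  <chapter id="ch1">
    <title>Assessment</title>
    <section id="ch1-s1">
      <title>Vital signs</title>
      <para id="p1">Measure temperature, pulse and respiration.</para>
      <para id="p2">Record blood pressure.</para>
    </section>
  </chapter>
  <chapter id="ch2">
    <title>Care planning</title>
    <section id="ch2-s1">
      <title>Goals</title>
      <para id="p3">Set measurable outcomes.</para>
    </section>
  </chapter>
</book>"""

def get_unit(root: ET.Element, unit: str, unit_id: str) -> Optional[ET.Element]:
    """Return the first element named `unit` (chapter, section, para) with a matching id."""
    # ElementTree's XPath subset supports predicates of the form [@attr='value'].
    return root.find(f".//{unit}[@id='{unit_id}']")

def unit_text(elem: ET.Element) -> str:
    """Concatenate all text inside an element, collapsing runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())

root = ET.fromstring(BOOK_XML)
section = get_unit(root, "section", "ch1-s1")
print(section.findtext("title"))                     # Vital signs
print([p.get("id") for p in section.iter("para")])   # ['p1', 'p2']
print(unit_text(get_unit(root, "para", "p3")))       # Set measurable outcomes.
```

Because every chapter, section, and paragraph keeps its own identity in the markup, the same stored document can serve whole-book, per-section, and per-paragraph views without being split into separate records.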

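Slide 13 names "join-free queries and use-case-driven indexing" as an emphasis of design choices driven by consumer Internet use cases. A minimal Python sketch of that pattern follows, assuming hypothetical denormalized article records (all identifiers, ISSN placeholders, and the helper `build_index` are illustrative, not any product's API): related data is embedded in each document so no join is needed, and one secondary index is built per anticipated query.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical denormalized records: journal and author details are embedded
# in each article document, so no query needs to join across collections.
ARTICLES = [
    {"id": "a1", "journal": {"issn": "ISSN-A", "title": "Journal A"},
     "year": 2010, "authors": [{"name": "Lee"}, {"name": "Ng"}]},
    {"id": "a2", "journal": {"issn": "ISSN-A", "title": "Journal A"},
     "year": 2011, "authors": [{"name": "Ng"}]},
    {"id": "a3", "journal": {"issn": "ISSN-B", "title": "Journal B"},
     "year": 2010, "authors": [{"name": "Park"}]},
]

def build_index(docs: Iterable[dict],
                keys: Callable[[dict], Iterable]) -> Dict[object, List[str]]:
    """Build one secondary index for one anticipated access pattern.
    `keys` extracts zero or more index keys from each document."""
    index: Dict[object, List[str]] = defaultdict(list)
    for doc in docs:
        for key in keys(doc):
            index[key].append(doc["id"])
    return dict(index)

# Use case 1: "articles in journal X for year Y"
by_journal_year = build_index(ARTICLES, lambda d: [(d["journal"]["issn"], d["year"])])
# Use case 2: "articles by author name"
by_author = build_index(ARTICLES, lambda d: [a["name"] for a in d["authors"]])

print(by_journal_year[("ISSN-A", 2010)])  # ['a1']
print(by_author["Ng"])                    # ['a1', 'a2']

# A query nobody planned for (e.g. by journal title) has no index here
# and falls back to scanning every document.
print([d["id"] for d in ARTICLES if d["journal"]["title"] == "Journal B"])  # ['a3']
```

Planned queries become single lookups, but each new access pattern needs its own index designed in advance. That is one plausible reading of why the slide suggests such designs may fit information publishing, with its many ad hoc queries over rich documents, less well than MarkLogic.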