MarkLogic User Group - Best of MLW and Search + Semantics

Live on stage at the MarkLogic User Group London, Matt Turner, MarkLogic's Chief Technologist for Media Solutions, will attempt to channel the groundbreaking, innovative and amazing topics from the recent MarkLogic World conference.

Hear the mission impossible project stories, see the great customer apps and (if all goes well) get a first hand look at some of the new MarkLogic features.

  • << JBG: Data Now slide needs to be replaced. A slide at the end of this presentation contains an appropriate image. >>
  • Run it past Michaline and Dave Gorbet. Include fulltext index in exposition.
  • Not all index has to be in memory. Roles and permissions. Check sizing. See a SPARQL query. Spend a bit more time on this slide.

Transcript

  • 1. <presentation/><presenter>Matt Turner</presenter><title>Chief Technologist, Media Solutions</title>
  • 2. Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved. <MLUGL><intro/><talk><bit>Mission Impossible</bit><story>Wiley</story><story>Springer</story><story>Mitchell1</story><bit>Search and Semantics</bit><demo>Old Skool</demo></talk></MLUGL>
  • 3. Mission(s) Impossible
  • 4. <story>http://www.marklogic.com/resources/slides-gearing-up-for-the-content-factory-to-quickly-create-innovate-and-monetize/</story>
  • 5. Why is it Mission Impossible?
    Start Revenue Earning January 2013
    • Publish new content from 1 Jan 2013
    • Accepted Articles: 20/day; 100/week; 400/month
    • Early View Articles: 20/day; 100/week; 400/month
    • Issues: 19/month; 77/quarter; 230/year
    Give AGU customers access to all licensed content by 1 January 2013
    • 21 journals (160,000 articles)
    • 33 personal choice products (aka virtual journals) based on AGU index terms
    • 743 special sections
    • Migrate customers, users, products, licenses, alerts data
    Vendors, systems & business processes in Editorial & Production ready to publish 2013 content
    • Integration with new editorial system
    • Changes to workflow
    And… it needs to work the way it does on the AGU site, with over 60 enhancements
  • 6. Key Challenges
    • Content with no issue number and no pagination
    • Journal with 7 parts, of which 3 have sub-parts!
    • Many moving parts within Wiley - 17 systems to check
    • Content completeness and quality (and external vendor)
    • Unknown unknowns - coping with changing and emerging requirements throughout development phase
    Challenges to overcome
    • 4 months left!
  • 7. Content-Driven Functionality – Special Section Search
    Examples:
    • “Coastal Ocean Observatories”
    • “The 11 March 2011 Tohoku-Oki Earthquake and Tsunami”
  • 8. How MarkLogic Helped - S/W Development
    Search Service
    • As a search engine, MarkLogic doesn't need manual or additional re-indexing after loading new content. Everything is done on the fly, which saves time and effort
    • Enabled reuse; only some enhancements to the search service had to be added for AGU
    Save Searches
    • The search service processes requests in XML, so it is easy to save a whole search and reuse it for either alerts or loading the saved search
    Index Terms
    • Reused the vocabulary service to help with the hierarchy of index terms. This was more valuable for faceting on index terms. Can easily fetch any sub-structure of index terms
    Faceting
    • MarkLogic supports faceting, so no need to do anything special; just add the proper configuration according to the AGU specification
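The "Save Searches" point on slide 8 can be illustrated with a minimal Python sketch. It is not Wiley's search service or a MarkLogic API: the element names (`search-request`, `text`, `facet`) and the article fields are hypothetical, chosen only to show why a search expressed as XML is easy to store and replay for alerts.

```python
import xml.etree.ElementTree as ET


def build_search_request(text, facets):
    """Build a search request as XML (element names are hypothetical)."""
    root = ET.Element("search-request")
    ET.SubElement(root, "text").text = text
    for name, value in facets.items():
        facet = ET.SubElement(root, "facet", name=name)
        facet.text = value
    return root


def save_search(request):
    """Serialize the whole request so it can be stored as a saved search."""
    return ET.tostring(request, encoding="unicode")


def load_search(xml_text):
    """Parse a stored request back into its search text and facet constraints."""
    root = ET.fromstring(xml_text)
    text = root.findtext("text")
    facets = {f.get("name"): f.text for f in root.findall("facet")}
    return text, facets


def matches(article, text, facets):
    """Toy matcher: title contains the text and every facet value is equal."""
    if text.lower() not in article["title"].lower():
        return False
    return all(article.get(name) == value for name, value in facets.items())


def alert_hits(saved_xml, new_articles):
    """Replay a saved search against newly loaded articles, as an alert would."""
    text, facets = load_search(saved_xml)
    return [a["title"] for a in new_articles if matches(a, text, facets)]
```

The same stored string serves both uses named on the slide: reloading it restores the search for the user, and replaying it over new content yields the alert.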
  • 9. What Variations/Non Standard Practices were introduced
    • New licensing model (e.g. multi choice product for personal subscribers)
    • Create Special Sections as another slice of content view
    • New workflow for handling daily society data updates via feeds
    • Changing content workflow for legacy vs current content
    • Improvements to content (not just conversion)
    • Start development before requirements were clear
    • Complete testing before we had all the content
    • Cannot complete certain types of testing
    • Break some rules
    Recipe for Disaster?
  • 10. Conclusion
    • Mission Impossible? Choose not to accept
    • Mission Impossible? Deal with it – that’s life but may not succeed
    • Mission Impossible? New organizational capability
    • Embrace challenge, but put your best people with experience on it
    • Be brave to break the rules when required
    • People over Process
    • Enabling technologies like MarkLogic
    Develop as new capability to handle the unexpected and unknowns
  • 11. <story>http://www.marklogic.com/resources/betting-the-company-how-springer-successfully-insourced-its-flagship-content-platform/</story>
  • 12. Betting the Company | 4/6/2013 | 18
    Growth in electronic sales (chart: Total Online vs. Total Print, 0% to 100%, 2007 to 2012 budget; data labels shown: 66 and 33)
  • 13. So... Springer decided to build its own platform
  • 14. 36 man-years of effort to reproduce
    36 man-years: how much time an independent software auditor estimated it would take to reproduce the existing code base
  • 15. A risky move? MetaPress code base
  • 16. Oh, and have it ready in 11 months
  • 17. Where we were in April 2011
    • People
      • 1 Executive Champion
      • 1 Product Owner
      • 1 Dir. of Dev
      • 1 Tech Lead
      • 2 Developers
      • 1 BA
      • 0 QA
      • 0 DevOps
      • 0 UX/design/front-end
      • 0 architect
    • Hardware/Software/Data
      • 0 databases
      • 0 servers
      • 0 documents
    7 staff*
    *3 managers – who don’t count: Jan-Erik de Boer (EVP of IT), Brian Bishop (Product Owner), Georg Nold (Dir. of Development)
  • 18.
  • 19. Where we are today
    • 1 Executive champion
    • 1 Product Owner
    • 1 Dir. of Dev
    • 2 Tech Leads
    • 16 Developers
    • 2 Dev Ops
    • 4 BAs
    • 6 QAs
    • 2 UX
    • 2 Design/Front-end
    • 1 Architect
    • 16 servers
    • 2 live environments
    • 1 database
    • 12 pairing stations
    • 2 Build Agents
    • 2 dashboard machines
    • 5.7 million documents
    • 60 million PNGs
    • 11TB of data
    31 staff
  • 20. New platform release schedule (timeline chart, Jan to Dec, with releases marked throughout the year)
  • 21.
  • 22. (architecture diagram labels: MarkLogic cluster; RESTful APIs; realtime.springer.com; citations.springer.com; iPhone apps)
  • 23. Goals are prioritized (top to bottom) and stories are prioritized (left to right). Velocity is measured every week, allowing us to accurately forecast when a certain level of work can be completed
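The forecasting idea on slide 23 comes down to simple arithmetic. Below is a minimal Python sketch, assuming work is sized in story points and that the forecast uses a plain average of recent weekly velocities. The slide states neither, so both the unit and the averaging rule are assumptions.

```python
import math


def average_velocity(points_per_week):
    """Mean of the weekly velocities measured so far (unit assumed: story points)."""
    if not points_per_week:
        raise ValueError("need at least one measured week")
    return sum(points_per_week) / len(points_per_week)


def weeks_to_complete(remaining_points, points_per_week):
    """Forecast whole weeks needed to finish the remaining prioritized work."""
    velocity = average_velocity(points_per_week)
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    return math.ceil(remaining_points / velocity)
```

For example, with measured weeks of 18, 22 and 20 points and 120 points of work above a given goal line, the average velocity is 20 and the forecast is 6 weeks.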
  • 24. MarkLogic IS agile
  • 25. MarkLogic agility
    • Schema-less means we can use our complex XML content as-is
      • E.g. different attributes for books, journals, chapters, articles, protocols, etc.
    • You can decide later if you need to add indexes at very little cost
    • You don’t have to know everything up front
    • Ingestion is relatively pain-free
    • You are free to come up with features without worrying about back-end
    • Modifying content via Record Loader makes it easy to manipulate data
    • Handles various types of native content
    • You don’t even have to use XQuery!
  • 26. What if you could subscribe to a search query?
  • 27. Content Entitlements
    Storing entitlements as queries means any new content loaded automatically becomes available to authorized users
    Diagram labels: 2TB; Customers
    • <content> Journal_ID: 0001; ContentType: Article; DatePublished: 4/4/2012; Subject: Mathematics; Author: John Smith; Language: English; Keywords: “k theory”
    • <material_ID=“001”> Subject: Engineering
    • <material_ID=“002”> Journal_ID: 0001-0099
    • <material_ID=“003”> Subject: Engineering; SearchTerm: “carbon nanotube”; DatePublished: 2000-2012
    • <customer=“001”> material_ID: 001
    These are stored as serialized queries
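A minimal Python sketch of the entitlements-as-queries idea on slide 27 follows. It does not use MarkLogic's serialized query format: the `MaterialQuery` class, its field names, and the document keys are hypothetical. Only the three material definitions and the customer mapping are taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MaterialQuery:
    """A stored entitlement query; unset fields place no constraint (hypothetical shape)."""
    subject: Optional[str] = None
    journal_range: Optional[Tuple[int, int]] = None
    search_term: Optional[str] = None
    year_range: Optional[Tuple[int, int]] = None

    def matches(self, doc):
        if self.subject is not None and doc.get("subject") != self.subject:
            return False
        if self.journal_range is not None:
            low, high = self.journal_range
            if not low <= doc.get("journal_id", -1) <= high:
                return False
        if self.search_term is not None and self.search_term not in doc.get("text", "").lower():
            return False
        if self.year_range is not None:
            first, last = self.year_range
            if not first <= doc.get("year", -1) <= last:
                return False
        return True


# The three materials and the customer mapping shown on the slide.
MATERIALS = {
    "001": MaterialQuery(subject="Engineering"),
    "002": MaterialQuery(journal_range=(1, 99)),
    "003": MaterialQuery(subject="Engineering", search_term="carbon nanotube",
                         year_range=(2000, 2012)),
}
CUSTOMER_MATERIALS = {"001": ["001"]}


def is_entitled(customer_id, doc):
    """A customer may read a document if any of their material queries matches it."""
    return any(MATERIALS[m].matches(doc) for m in CUSTOMER_MATERIALS.get(customer_id, []))
```

Because the entitlement is a query and not a list of document IDs, a newly loaded Engineering article is visible to customer 001 without any entitlement record being updated.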
  • 28. How did it go?
  • 29. Average Page Load Time (sec) (bar chart: Old vs. New, axis 0 to 12)
  • 30. Weekly visits to SpringerLink (millions, Aug 4, 2012 – Mar 2, 2013). Source: Google Analytics (chart series: link.springer.com, SpringerLink.com, Total; axis 0 to 5,000,000)
  • 31. <story>http://www.marklogic.com/resources/the-journey-from-print-to-online/</story>
  • 32. 2011
  • 33. Mitchell1: Data
    • 65 OEM Auto and Part Manufacturers
    • Data on every modern car sold in US
    • Repair
    • Diagnostics
    • Maintenance
    • Technical Service Bulletins (TSBs)
    • Wiring
    • Estimator
  • 34. What’s in the data store today?
    • Articles – 408,892
      – 209,987 Narratives
      – 103,416 Technical Service Bulletins and Recalls
      – 15,179 Maintenance Schedules
    • Images – 6,193,647
      – 5,924,959 Narrative
      – 268,688 Technical Service Bulletins and Recalls
    • When it’s all broken down, it becomes roughly 16,000,000 MarkLogic Documents
  • 35. And how do we describe it?
    • Preferred Terms
      – Tends to be the ASE term
      – Used to describe Components (12,261), Diagnostic Trouble Codes (65,525), and Information Types (98)
    • Non-Preferred Terms
      – Tends to be OEM specific terminology
      – Alternate terms for Components (22,733) and Information Types (757)
      – Codes do not have Non-Preferred Terms
    • Spatial References
      – Because “Replace the window motor” just isn’t precise enough
  • 36. Mitchell1: Data Then, Data Now
  • 37. Mitchell1: Data Then, Data Now
  • 38. Mitchell1: Data Then, Data Now
  • 39. Mitchell1: Market Reaction https://www.youtube.com/watch?v=IfM8v-8NY_4&list=UUIOYnh6LBFooV_YxlPVPLvA&index=36
  • 40. Search . . . and Semantics
  • 41. One Question . . .
  • 42. Who’s Smarter? VS
  • 43. Do domestic dogs interpret pointing as a command? Animal Cognition (2012): 1-12, November 09, 2012. By Scheider, Linda; Kaminski, Juliane; Call, Josep; Tomasello, Michael
  • 44. What if . . .
  • 45. HOW?
  • 46. The Basic Idea
    Get some triples . . . if you haven’t already
    • Grabbed DBPedia
    • Dumped in Linked Data Consortium
    • Loaded Lehigh
    • and NYT’s open data
    You are behind!
    But what if you could add in documents?
  • 47. Rich MarkLogic Applications .. Made Richer
  • 48. Rich MarkLogic Applications .. Made Richer
    Name: John Smith; Affiliation: IBM; Timezone: PST; Committer: Hadoop
  • 49. Semantics Architecture (diagram labels: TRIPLE; XQY; XSLT; SQL; SPARQL; GRAPH; SPARQL)
  • 50. Triple Index
    • 3 triple orders
    • Cached for performance
    • Works seamlessly with other indexes
    • Security
    • 350 bytes per triple on disk
    • 1 billion+ triples per host
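To make "3 triple orders" and the sizing figures on slide 50 concrete, here is a small Python sketch. It is an in-memory toy, not MarkLogic's triple index. The slide does not say which three orderings are kept, so the subject-predicate-object, predicate-object-subject and object-subject-predicate permutations used here are an assumption. The 350-bytes-per-triple figure is taken from the slide.

```python
from collections import defaultdict

BYTES_PER_TRIPLE_ON_DISK = 350  # figure quoted on the slide


class TripleIndex:
    """Toy index that stores every triple under three orderings (assumed: SPO, POS, OSP)."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, subject, predicate, obj):
        self.spo[subject][predicate].add(obj)
        self.pos[predicate][obj].add(subject)
        self.osp[obj][subject].add(predicate)

    def objects(self, subject, predicate):
        """Subject and predicate known: answered from the SPO ordering."""
        return set(self.spo.get(subject, {}).get(predicate, ()))

    def subjects(self, predicate, obj):
        """Predicate and object known: answered from the POS ordering."""
        return set(self.pos.get(predicate, {}).get(obj, ()))

    def predicates(self, subject, obj):
        """Subject and object known: answered from the OSP ordering."""
        return set(self.osp.get(obj, {}).get(subject, ()))


def disk_estimate_gb(triple_count):
    """Disk footprint in decimal gigabytes at 350 bytes per triple."""
    return triple_count * BYTES_PER_TRIPLE_ON_DISK / 1e9
```

Keeping several orderings means a lookup with any two positions bound can go straight to the matching entries. At the quoted figures, one billion triples on a host works out to about 350 GB on disk.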
  • 51. SPARQL
    • Executed using the triple index
    • SPARQL 1.0
    • Cost-based optimization
    • Join ordering and algorithms
    • More in the lightning talks
    select * where { ?person :birth-place ?place ; :first-name "John" }
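The query on slide 51 asks for every `?person` that has both a `:birth-place` and a `:first-name` of "John". The Python sketch below evaluates that two-pattern join over a plain list of tuples. It is an illustration of what the query means, not of how MarkLogic executes it. The sample data and the choice to evaluate the `:first-name` pattern first are assumptions.

```python
def select_person_place(triples, first_name="John"):
    """Return sorted (person, place) bindings for the slide's two-pattern query.

    The :first-name pattern is evaluated first because it has a bound object
    and so is likely the more selective of the two, which is the kind of
    decision a join-ordering optimizer makes.
    """
    named = {s for s, p, o in triples if p == ":first-name" and o == first_name}
    bindings = {(s, o) for s, p, o in triples if p == ":birth-place" and s in named}
    return sorted(bindings)


# Hypothetical sample data, for illustration only.
SAMPLE_TRIPLES = [
    (":p1", ":first-name", "John"),
    (":p1", ":birth-place", ":London"),
    (":p2", ":first-name", "Jane"),
    (":p2", ":birth-place", ":Paris"),
    (":p3", ":first-name", "John"),
]
```

Here `:p2` is dropped because the name does not match, and `:p3` is dropped because it has no `:birth-place` triple, so both patterns must match for a row to be returned.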
  • 52. Demo
  • 53.
  • 54. Old Skool
    - Quickie Framework
    - Circa 2006ish
    - HTML tables -> 1997 style
    - ‘action’ controller
    - <query/> state -> from the query string
    - No sessions
    - No CSS
    - No Javascript
    - No Adaptive Design
    - No Facets?
  • 55. Search
  • 56. Facets!
  • 57. Semantics
  • 58. Just Semantics?
  • 59. Thank You!