MarkLogic and The Universal Index

A talk about how MarkLogic Server uses an inverted index (like search engines do) to optimize this document-oriented NoSQL database.

Published in: Technology

  • Remember: ask people if they know Map-Reduce, MVCC, sharding, shared-nothing clustering, NoSQL, consistent hashing, fsync
  • Worked in large companies like IBM in unstructured data management. Mostly client support. A lot of training. Now focused on clients, especially in financial markets. Loves unstructured information data challenges.
  • http://www.theregister.co.uk/2010/09/09/google_caffeine_explained
  • Examples: MarkMail, Apache CouchDB
  • Double-buffered in-memory stand to ensure maximum throughput. Stands comprise indexes and their respective fragments. Fragments are final. No “real” update or delete. Less error-prone. Merging as a self-healing mechanism.
  • Introduce MVCC one liner
  • MarkLogic and The Universal Index
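
The notes on final fragments and MVCC can be illustrated with a toy append-only store. This is a minimal sketch, not MarkLogic's implementation: the class name `AppendOnlyStore`, its methods, and the in-memory list are all hypothetical, and the only idea taken from the talk is that updates and deletes never overwrite content, they just stamp system timestamps that queries use to decide which version is visible.

```python
class AppendOnlyStore:
    """Toy MVCC store (hypothetical): content is never overwritten."""

    def __init__(self):
        self.clock = 0        # system timestamp, bumped on every write
        self.fragments = []   # every version ever written, in arrival order

    def _tick(self):
        self.clock += 1
        return self.clock

    def _retire(self, uri, ts):
        # Stamp a deletion timestamp on the live version; its content stays.
        for frag in self.fragments:
            if frag["uri"] == uri and frag["deleted"] is None:
                frag["deleted"] = ts

    def insert(self, uri, content):
        """Create or update: retire the old version, append the new one."""
        ts = self._tick()
        self._retire(uri, ts)
        self.fragments.append(
            {"uri": uri, "content": content, "created": ts, "deleted": None}
        )
        return ts

    def delete(self, uri):
        """Delete: only marks the live version as retired."""
        ts = self._tick()
        self._retire(uri, ts)
        return ts

    def read(self, uri, at=None):
        """Return the version visible at timestamp `at` (default: now)."""
        at = self.clock if at is None else at
        for frag in self.fragments:
            visible = frag["created"] <= at and (
                frag["deleted"] is None or at < frag["deleted"]
            )
            if frag["uri"] == uri and visible:
                return frag["content"]
        return None
```

Reading with an older `at` value returns the version that was current at that timestamp, which is the "time machine" behaviour the MVCC slide alludes to. A real system would also need merging to reclaim retired versions; that is left out here.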

    1. 1. MarkLogic Developer Community<br />NoSQL Frankfurt, 2010<br />Awesome document-oriented NoSQL database<br />Beyond NoSQL with MarkLogic<br />and<br />The Universal Index<br />
    2. 2. nunojob<br />nuno.job@marklogic.com<br />@dscape| nunojob.com<br />
    3. 3. how??<br />Structure: Ad hoc / Predefined<br />Queries: Ad hoc / Predefined<br />IDMS<br />
    4. 4. Indexes!<br />indexes!<br />so… filter map reduce !?<br />well… sort of…<br />flickr.com/ayalan<br />
    5. 5. divide and conquer<br />level of abstraction: ease of use<br />database<br />consistent-hashing-like thingy<br />partition1<br />partition2<br />partition3<br />stand: a group of trees<br />makes sense to have indexes in the same place<br />
    6. 6. shared-nothing cluster<br />1st index resolution<br />2nd get documents<br />AppServer<br />E Host 1<br />E Host 2<br />E Host 3<br />Same codebase<br />Data<br />D Host 4<br />D Host 5<br />D Host 6<br />D Host k<br />HA&DR<br />partition1<br />partition2<br />partition3<br />partition4<br />partitionm<br />
    7. 7. universal index<br />Range Indexes<br />Term: Term List (Document References)<br />“accelerating”: 123, 127, 129, 152, 344, 791 . . .<br />“creation”: 122, 125, 126, 129, 130, 167 . . .<br />“content”: 123, 126, 130, 142, 143, 167 . . .<br />“application”: 123, 130, 131, 135, 162, 177 . . .<br />“agility”: 126, 130, 167, 212, 219, 377 . . .<br /><article>: . . .<br /><article> / <title>: . . .<br />126, 130, 167, …<br />product: MarkLogic<br />Geospatial<br />
    8. 8. semi structured<br />get tables from computer science articles that include a title with the word “content” but not the word “agility”<br />article<br />title<br />paragraph<br />information<br />un-ordered list<br />metadata<br />structure<br />parent/child<br />paragraph<br />table<br />full text<br />footer<br />
    9. 9. universal index<br />in kelly speak: zippy-ing<br />Range Indexes<br />Term: Term List (Document References)<br />“accelerating”: 123, 127, 129, 152, 344, 791 . . .<br />“creation”: 122, 125, 126, 129, 130, 167 . . .<br />“content”: 123, 126, 130, 142, 143, 167 . . .<br />“application”: 123, 130, 131, 135, 162, 177 . . .<br />“agility”: 126, 130, 167, 212, 219, 377 . . .<br /><article>: 122, 125, 126, 129, 130, 143, 167<br /><article> / <title>: 122, 125, 126, 129, 130, 167 . . .<br />126, 130, 167, …<br />product: MarkLogic<br />Geospatial<br />
    10. 10. wait a minute…<br />Directories: exclusive, hierarchical, analogous to a file system, map to URI<br />Collections: set-based, N:N relationship<br />Security: invisible to your app<br />
    11. 11. universal index<br />Range Indexes<br />Term: Term List (Document References)<br />“accelerating”: 123, 127, 129, 152, 344, 791 . . .<br />“creation”: 122, 125, 126, 129, 130, 167 . . .<br />“content”: 123, 126, 130, 142, 143, 167 . . .<br />“application”: 123, 130, 131, 135, 162, 177 . . .<br />“data base”: 126, 130, 167, 212, 219, 377 . . .<br /><article>: . . .<br /><article> / <title>: . . .<br />126, 130, 167, …<br />product: MarkLogic<br />Directory: /articles/<br />Collection: CS<br />Role: Editor + Action: Read<br />Geospatial<br />
    12. 12. throughput<br />in memory stand(s)<br />durability: journal<br />flickr.com/kt<br />
    13. 13. mvcc<br />append-only database, use sys-timestamps to know which document is currently available<br />and the marklogic time machine<br />create<br />update (could also be create)<br />delete<br />query<br />System timestamp<br />
    14. 14. too good to be true?<br />try us out… free version available!<br />developer.marklogic.com/products<br />markmail.org<br />pairs.demo.marklogic.com<br />heatmap.demo.marklogic.com<br />bit.ly/ml-demo<br />flickr.com/nattu<br />
    15. 15. questions?<br />Love NoSQL databases?<br />Want to change the world?<br />We are hiring!!<br />spkr8.com/t/4590<br />Feedback<br />nuno.job@marklogic.com<br />
    16. 16. not covered<br />but conversations are welcome!<br />Open-source, closed development?<br />REST<br />Mobile<br />XQuery and why it’s awesome!<br />App Server + Search + Database<br />Scalable ACID transactions<br />XML vs. JSON ?<br />Merging / Compaction<br />Relevance<br />MVCC<br />Reverse Indexes<br />Alerting<br />Higher-Order Functions<br />Geospatial queries<br />Co-occurrence<br />Meta programming<br />Document databases<br />
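
The universal index slides resolve a query by combining term lists, which slide 9 calls "zippy-ing". The sketch below shows one way that could look, under stated assumptions: the key names (`word:content`, `element:article`, and so on) and the function names are hypothetical, the document IDs are copied from slide 9 (where the lists are truncated), and the slide 8 query is simplified to "articles containing the word content but not the word agility". It is an illustration of sorted-list merging, not MarkLogic's actual resolution code.

```python
# Term lists copied from slide 9; the originals are truncated ("...").
TERM_LISTS = {
    "word:content": [123, 126, 130, 142, 143, 167],
    "word:agility": [126, 130, 167, 212, 219, 377],
    "element:article": [122, 125, 126, 129, 130, 143, 167],
}


def intersect(a, b):
    """Zipper-merge two sorted ID lists, keeping IDs present in both."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Keep IDs from sorted list a that do not appear in sorted list b."""
    out, j = [], 0
    for doc_id in a:
        # Advance b's cursor until it reaches or passes doc_id.
        while j < len(b) and b[j] < doc_id:
            j += 1
        if j == len(b) or b[j] != doc_id:
            out.append(doc_id)
    return out


def candidates():
    """Articles with the word 'content' but without the word 'agility'."""
    hits = intersect(TERM_LISTS["element:article"], TERM_LISTS["word:content"])
    return subtract(hits, TERM_LISTS["word:agility"])
```

Because every list is sorted by document ID, each operation is a single linear pass. The same pattern would extend to the directory, collection, and security terms on slide 11 by treating them as more term lists to intersect. Restricting a word to the title element would need a combined term list, which this sketch leaves out.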
