Super size your search

Using Apache ManifoldCF with Alfresco to provide a cloud-scalable search solution based on Apache Solr, Elasticsearch or Amazon CloudSearch

Published in: Technology

  1. Super Size Your Search • 14th November 2013 • Fran Álvarez (Zaizi) • #SummitNow
  2. Agenda • Myself & my company • Background • Our solution • Scenario • Demo • Conclusions
  3. About me • Director of Zaizi Iberia and Chief Architect • Alfresco Certified Engineer • Responsible for large Alfresco architectures • Semantic Consultant for Sensefy • Alfresco Meetups Organizer
  4. We are an Open Source Development Company that helps people work together more effectively • HQ: London (UK) • Singapore • Seville (Spain) • Colombo (Sri Lanka)
  5. What we offer • Open Source System Integrator • Specialist in ECM • Platinum Alfresco partner • Best Systems Integrator Partner EMEA 2012 • Best Systems Integrator Partner EMEA 2013 • Million $ Club in 2013 • Support 24/7
  6. Background • Let's put a bit of context
  7. Those Old Days… • Only Lucene in Alfresco 3.4 • Indexes were managed within the Alfresco context • Permissions were checked after Lucene returned all results
  8. Present • Solr as Search Subsystem • Indexes are managed outside the Alfresco context • Permissions are checked at query time • No in-transaction index
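
The difference between the two permission models on the slides above can be sketched as follows. This is a toy in-memory illustration, not Alfresco's implementation: the `Doc` class, the `readers` field and the authority names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    readers: frozenset  # hypothetical: authorities allowed to read the document


# Hypothetical toy index
INDEX = [
    Doc("a", "budget report", frozenset({"GROUP_finance"})),
    Doc("b", "budget summary", frozenset({"GROUP_everyone"})),
    Doc("c", "holiday plan", frozenset({"GROUP_everyone"})),
]


def search_then_filter(term, authorities):
    """Post-filtering: match on text first, then drop unreadable hits."""
    hits = [d for d in INDEX if term in d.text]
    examined = len(hits)  # every text match must be permission-checked afterwards
    return [d.doc_id for d in hits if d.readers & authorities], examined


def search_with_acl(term, authorities):
    """Query-time filtering: the permission test is part of the match itself."""
    hits = [d for d in INDEX if term in d.text and d.readers & authorities]
    return [d.doc_id for d in hits], len(hits)
```

Both functions return the same visible documents; the second never materialises hits the user cannot read, which is what matters once result sets get large.
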
  9. Alfresco 4 is… Common Enemies • Find a single document • Return large data sets • Filter by permissions • Be fast! • “Sometimes one superhero is not enough”
  10. Alfresco + Solr Approach • Quite a good architecture • Takes care of both performance and usability • Flexibility in deployment and installations • However… sometimes we just need to use something else
  11. Future • Don’t freak out, dude! We can arrange something
  12. Our solution • Use Apache ManifoldCF • Decoupled from Alfresco • Can be integrated with Alfresco or any other vendor's repository • Preserves security and permissions within results • API to manage ManifoldCF services • API for searching, decoupled from the chosen search engine • Simple bundled UI • Lots of ManifoldCF customization • It's included in our semantic solution: Sensefy!
  13. Apache ManifoldCF • Open Source Apache Software Foundation project • Gets content from repositories • Pushes content to search services • Based on the “Connector” and “Job” concepts • Crawling model (add, change, delete) • And respect permissions, bitch!
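
The "add, change, delete" crawling model on the slide above can be illustrated with a small diff between two crawl snapshots. This is a minimal sketch under assumptions: the slides do not describe how ManifoldCF tracks versions, so the `plan_crawl` function and the `{document id: version string}` snapshot format are hypothetical.

```python
def plan_crawl(previous, current):
    """Compare two snapshots ({doc_id: version}) and decide what to send to the index."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    changed = sorted(
        doc_id
        for doc_id in current.keys() & previous.keys()
        if current[doc_id] != previous[doc_id]
    )
    return {"add": added, "change": changed, "delete": deleted}


# Hypothetical snapshots from two consecutive crawls
last_crawl = {"doc-a": "v1", "doc-b": "v1", "doc-c": "v1"}
this_crawl = {"doc-a": "v1", "doc-b": "v2", "doc-d": "v1"}
plan = plan_crawl(last_crawl, this_crawl)
```

Only documents that appear in the plan need to be pushed to (or removed from) the search service, so unchanged content is not re-indexed.
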
  14. ManifoldCF Overview (diagram labels: Repositories 1 to 4 • Apache ManifoldCF • Search Servers 1 to 3 • Authority Service • Authorities 1 and 2 • user-specific search results)
  15. ManifoldCF – Architecture (diagram labels: Repository • Job • Search Server • ACLs)
  16. ManifoldCF – Architecture (diagram labels: Repository Connector • Repository • Job • Search Server • ACLs)
  17. ManifoldCF – Architecture (diagram labels: Repository Connector • Repository • Output Connector • Job • Search Server • ACLs)
  18. ManifoldCF – Architecture (diagram labels: Repository Connector • Repository • Output Connector • Job • Search Server • ACLs • Authority Connector)
  19. ManifoldCF – Architecture (diagram labels: Repository Connector: query to retrieve contents • Repository • Output Connector • Job • Search Server • ACLs • Authority Connector)
  20. ManifoldCF – Architecture (diagram labels: Repository Connector: query to retrieve contents • Repository • Output Connector: metadata mapping, content ingestion • Job • Search Server • ACLs • Authority Connector)
  21. ManifoldCF – Architecture (diagram labels: Repository Connector: query to retrieve contents • Repository • Output Connector: metadata mapping, content ingestion • Job • Search Server • ACLs • Authority Connector: retrieve content ACEs)
  22. ManifoldCF – Architecture (diagram labels: Repository Connector: query to retrieve contents • Repository • Output Connector: metadata mapping, content ingestion • Job • Search Server • ACLs • Authority Connector: retrieve content ACEs) • Job: verbal description • crawling model • scheduling
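
The flow in the architecture slides above (a job pulls documents through a repository connector, maps metadata, and ingests them through an output connector) can be sketched roughly as follows. This is not the ManifoldCF connector API, which is Java-based and not shown in the slides; every class, function and field name here is hypothetical.

```python
class InMemoryRepository:
    """Stand-in for a repository connector: answers the query that retrieves contents."""

    def __init__(self, documents):
        self._documents = list(documents)

    def fetch(self):
        return iter(self._documents)


class InMemorySearchServer:
    """Stand-in for an output connector: ingests mapped documents into an index."""

    def __init__(self):
        self.indexed = {}

    def ingest(self, document):
        self.indexed[document["id"]] = document


def run_job(repository, output, field_mapping):
    """One job run: fetch, rename metadata fields, ingest. ACLs travel with the document."""
    count = 0
    for source in repository.fetch():
        mapped = {field_mapping.get(key, key): value for key, value in source.items()}
        output.ingest(mapped)
        count += 1
    return count


# Hypothetical document and mapping (repository field name -> index field name)
repo = InMemoryRepository(
    [{"id": "1", "cm:title": "Hello", "content": "hello world", "acl": ["GROUP_a"]}]
)
server = InMemorySearchServer()
ingested = run_job(repo, server, {"cm:title": "title"})
```

Because the job only talks to the two connector interfaces, swapping the search server (or adding another repository) does not change the job logic. The authority connector is left out here; it is only needed at search time.
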
  23. Our ManifoldCF Contribution • Alfresco Repository Connector: new implementation • Amazon CloudSearch Output Connector • Alfresco Authority Connector: design & development
  24. Some of our most famous villains
  25. Several Alfresco instances: Current • Alfresco instances don't share indexes • Indexes can't be merged • Can't have federated search • No good approach for presenting results to users
  26. Several Alfresco instances: Our solution • One index to rule them all • Data origin is irrelevant (or not, if we don't) • Single search across repositories • You choose your search engine!
  27. Alfresco + Other data providers: Current • Alfresco Search subsystem != other providers' search services • Alfresco can't reach external data • No way to present merged results uniformly to end users
  28. Alfresco + Other data providers: Our solution • Search engine is shared • All of them speak 'our language' • Alfresco can reach external data through … • Results are present and accessible across data providers
  29. Alfresco + O(TB) data: Current • Alfresco Search subsystem • Single or clustered Solr • Every Solr instance manages its own index • No chance to apply scaling techniques • Huge servers are required and performance might be compromised
  30. Alfresco + O(TB) data: Our solution • Alfresco uses our index • Indexing techniques can be applied according to use cases • Sharding, replication… • A search strategy can be adopted with the most suitable search solution
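
As an illustration of the sharding technique named on the slide above, here is a minimal sketch of hash-based document routing. It is a generic technique, not something the slides attribute to a particular search engine; the function names and the document ids are hypothetical.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard using a stable hash of its id."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(doc_ids, num_shards):
    """Group document ids by the shard they would be indexed on."""
    shards = {shard: [] for shard in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


# Hypothetical ids spread over four shards
layout = partition([f"doc-{n}" for n in range(100)], num_shards=4)
```

A stable hash (rather than Python's per-process `hash()`) keeps the routing identical across indexing runs, so an update to a document always lands on the shard that already holds it. Replication would then copy each shard to additional nodes.
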
  31. Other benefits • Extract, index and map information from any other sources • Put them together in a single index • Permissions are checked just once • Search capabilities: facets, highlighting… (diagram labels: Red Link • Apache ManifoldCF • Search Server • Authority Service • Alfresco • Alfresco Permissions)
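
"Permissions are checked just once" can be pictured like this: the authority service resolves a user to access tokens a single time per query, and those tokens filter one shared index regardless of where each document came from. This is a simplified sketch; the token names, the `allow`/`deny` fields and the sample documents are hypothetical and do not reproduce the actual index schema.

```python
# Hypothetical authority data: user -> access tokens (the user plus their groups)
AUTHORITY = {
    "alice": {"alice", "GROUP_finance"},
    "bob": {"bob", "GROUP_sales"},
}

# One shared index holding documents from two sources, each carrying its own ACL
INDEX = [
    {"id": "alf-1", "source": "alfresco", "text": "budget 2013",
     "allow": {"GROUP_finance"}, "deny": set()},
    {"id": "alf-2", "source": "alfresco", "text": "budget draft",
     "allow": {"GROUP_finance", "GROUP_sales"}, "deny": {"bob"}},
    {"id": "web-1", "source": "other", "text": "budget news",
     "allow": {"GROUP_sales"}, "deny": set()},
]


def tokens_for(user):
    """Stand-in for the authority service lookup."""
    return AUTHORITY.get(user, set())


def secure_search(term, user):
    """Resolve tokens once, then apply allow/deny filtering to every source alike."""
    tokens = tokens_for(user)
    return [
        doc["id"]
        for doc in INDEX
        if term in doc["text"]
        and doc["allow"] & tokens
        and not doc["deny"] & tokens
    ]
```

Here a deny entry overrides an allow entry, so `bob` does not see `alf-2` even though his group is allowed; an unknown user has no tokens and sees nothing.
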
  32. Demo
  33. Demo: Architecture
  34. Demo: Who are these guys? • Gareth Bale, footballer: Real Madrid's latest star • Christian Bale, actor: Christopher Nolan's Batman
  35. Conclusions • Searching & indexing in the most popular cloud search solutions • Retrieving information from the most popular repositories and data providers altogether • Managing permissions and security for data • Fully supported by us!
  36. Conclusions
  37. What's coming • How can we improve it, dude? • Powerful UI • New connectors • Large data volume benchmarking • Share integration
  38. We are not Batman, but we can be your Superhero • Zaizi Ltd. • enquiries@zaizi.com • falvarez@zaizi.com • (+44) 20-3582-8330 • Fran Álvarez • (+34) 666-424-364
  39. Thank you! • Would you like to help us with this one?
