BP-8 Global Federation and Search


Many global organizations face similar challenges around sharing information between regions in a timely fashion; for publishers this is often exacerbated by the size of some of their assets, such as print-quality images or video. Alfresco, with its open, extensible architecture, makes a great basis for a global enterprise content or digital asset management system, yet there are still numerous challenges to tackle when implementing on a global scale. Federation is one approach that can be used successfully when the regions are largely independent in the production of content but produce assets that can be consumed and re-used globally. Alfresco 4.0 uses Solr, which can be leveraged to provide federated search across multiple, disparate Alfresco repositories. This session covers how: federated search provided remote content discovery; Share was customized to handle federated search; intelligent storage provided eventual consistency of files; and users could request content migration on demand.

Published in: Technology

  1. Global Federation and Search
     Robin Bramley, Ixxus
  2. Agenda
     • Who I am
     • Setting the scene
     • The business challenge
     • Alfresco
     • Solr
     • Big Content
     • Global considerations
     • Scaling strategies
     • Alfresco 4
     • Federation approaches
     • 'Intelligent' storage
     • Challenges
  3. My Background
     • Senior Architect @ Ixxus
       • The UK Alfresco Platinum Partner
       • Lucid Imagination partner
     • Worked at consultancies for 13 years
       • Developing solutions with Alfresco since 0.6
       • First UK Alfresco Gold partner
     • Around the edges I also write
       • GroovyMag author, inc. 4 hands-on Grails articles
       • DZone Most Valuable Blogger
         • Re-published posts include Event Driven indexing with Solr
     • Open source contributions include
       • OpenID support for Acegi / Spring Security
       • Codenarc support for Hudson / Jenkins CI Violations plugin
  4. The challenge
     • Many global organisations face similar challenges around sharing information between regions in a timely fashion.
     • For publishers this is often exacerbated by the size of some of their assets, such as print-quality images or video.
  5. Alfresco
     Hopefully this needs little introduction.
     • Clue: it's an ECM
  6. Apache Solr
     • RESTful search service
       • POST it documents
       • GET query results
     • Built on top of Lucene
     • Originated from CNET (created by Yonik Seeley)
     • Features
       • Schema
       • Request handlers
       • Query types
       • Response writers
       • Admin pages
       • Replication
       • Sharding
     • Professional support available from Lucid Imagination
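
The "GET query results" half of the slide above can be illustrated with a minimal sketch that only builds the query URL for Solr's standard `/select` request handler. The base URL and the example query are assumptions for illustration, not values from the talk, and the Solr bundled with Alfresco 4 additionally requires SSL client certificates, which this sketch does not cover.

```python
from urllib.parse import urlencode

# Hypothetical base URL; a real deployment has its own host, port and core.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(query, rows=10, start=0, base=SOLR_BASE):
    """Return a GET URL for Solr's standard /select request handler."""
    params = {
        "q": query,      # the query string
        "start": start,  # offset of the first result (for pagination)
        "rows": rows,    # number of results to return
        "wt": "json",    # ask for the JSON response writer
    }
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    # The field name "title" is only an example.
    print(build_select_url("title:federation", rows=5))
```

Only the URL is constructed here, so the sketch runs without a Solr server; issuing the request is left to whichever HTTP client the project already uses.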
  7. Big Content
  8. Going global
  9. Going global
     Global systems can pose additional challenges:
     • Infrastructure
       • Network
         • Bandwidth
         • Latency
         • Reliability
     • Languages
     • Timezones
     • Collaboration
     • Workflow
     • Security permissions
  10. Scaling strategies
      You can scale / divide & conquer systems in a number of ways:
      • Scale up (vertical)
  11. Scaling strategies
      • Scale out (horizontal)
        • Typically clustering
        • But could also be
          • Replication
          • Separation of responsibilities
  12. Scaling strategies
      • Partitioning
        • Data sharding
        • Silos
          • Divisional / departmental
          • Regional
  13. Alfresco 4
      What's new in Alfresco 4.0?
      • Won't repeat the full press release here…
      • 'Cloud-scale performance'
      • Alfresco Index Server based on Apache Solr
      • Enhanced clustering
  14. Alfresco 4 Solr
      • Based on Solr 1.4.1
      • Uses a custom alfrescoDataType fieldType
      • Leverages dynamic schema fields heavily
        • Only statically defined field is 'id'
        • Everything else (*) is a multi-valued dynamic field
          • Though it uses the Alfresco model dictionary under the hood
      • Analysis chain (same for index/query)
        • Whitespace tokenized
        • Word delimited
          • Breaks up camelCase etc.
        • Converted to lower case
      • Adds a cmis request handler
      • Uses SSL client certificate authentication
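
To make the three analysis steps on the slide above concrete, here is a rough pure-Python approximation: whitespace tokenising, then word delimiting, then lower-casing. It is not the Solr or Alfresco implementation; the real word delimiter filter has further options (such as catenating parts or preserving the original token) that are ignored here, and the function name `analyse` is made up for this sketch.

```python
import re

# Word parts: an acronym followed by a capitalised word ("XML" in "XMLParser"),
# a capitalised or lower-case word, a run of capitals, or a run of digits.
# Anything else (hyphens, underscores, punctuation) acts as a delimiter.
_WORD_PARTS = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def analyse(text):
    """Approximate the chain: whitespace tokens -> word parts -> lower case."""
    tokens = []
    for word in text.split():                   # 1. whitespace tokenised
        for part in _WORD_PARTS.findall(word):  # 2. word delimited
            tokens.append(part.lower())         # 3. converted to lower case
    return tokens


if __name__ == "__main__":
    print(analyse("camelCase PowerShot500 Wi-Fi"))
```

Because the same chain is applied at index and query time, a query for `powershot` and a document containing `PowerShot500` end up sharing the tokens `power` and `shot`.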
  15. Federating
  16. Federation Approaches
      Build an index with a crawler
      Pros:
      • Can index many different data sources
        • File systems
        • Databases
      Cons:
      • Timeliness
      • Pull model not suitable for all scenarios
      • Additional storage requirements
      • Indexing can be inefficient in a global scenario
      • Permissions
  17. Federation Approaches
      Federated search using OpenSearch
      • A collection of simple formats for sharing search results
        • Can use an Atom response format
        • Elements such as totalResults used in CMIS Atom binding
      • Was a big deal in Alfresco 2.0 (2007)
        • Alfresco Explorer has an OpenSearch client
        • Alfresco has an OpenSearch server
        • Provided keyword search
        • Wiki stated: 'Note: Advanced Web Client Search and Query Language searches will be OpenSearch enabled some time in the future, probably in line with up-and-coming CM standards.'
      • Client not in Share
      • CMIS a better bet for complex queries
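
As an illustration of the Atom response format mentioned on the slide above, the following sketch reads `totalResults` and the entry titles from an OpenSearch-flavoured Atom feed using only the standard library. The sample feed is invented for this example and is not output captured from Alfresco.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the Atom and OpenSearch 1.1 specifications.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

# Invented sample response, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
  <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  <entry><title>Cover image</title></entry>
  <entry><title>Promo video</title></entry>
</feed>"""


def summarise(feed_xml):
    """Return (total result count, list of entry titles) for an Atom feed."""
    root = ET.fromstring(feed_xml)
    total = int(root.findtext("os:totalResults", default="0", namespaces=NS))
    titles = [
        entry.findtext("atom:title", default="", namespaces=NS)
        for entry in root.findall("atom:entry", NS)
    ]
    return total, titles


if __name__ == "__main__":
    print(summarise(SAMPLE_FEED))
```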
  18. Federation Approaches
      Build a meta-search service
      Pros:
      • Can work across heterogeneous search engines
      • Can implement asynchronous results
      Cons:
      • Rebuilding the wheel?
      • Authentication is a challenge (without SAML or OAuth)
  19. Federation Approaches
      Solr shards
      • Treat the Solr cores of separate Alfresco repositories as separate shards
      Pros:
      • Distributed queries are a standard Solr feature
      Cons:
      • The repositories need to be backed by a single authentication source
        • E.g. LDAP
      • Asynchronous results aren't supported OOTB
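
The Solr shards approach above relies on Solr's standard `shards` request parameter, which takes a comma-separated list of `host:port/path` entries and makes the receiving core fan the query out and merge the results. A minimal sketch of building such a request follows; the host names, ports and core paths are hypothetical, and authentication (the SSL client certificates used by Alfresco's Solr) is deliberately left out.

```python
from urllib.parse import urlencode


def build_sharded_query_url(coordinator, shards, query, rows=10):
    """Build a distributed /select URL that queries every listed shard.

    coordinator: base URL of the core that receives and merges the query.
    shards: iterable of "host:port/path" strings, one per regional core.
    """
    params = {
        "q": query,
        "rows": rows,
        "shards": ",".join(shards),  # Solr's distributed search parameter
    }
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return f"{coordinator}/select?{urlencode(params, safe=':/,')}"


if __name__ == "__main__":
    # Hypothetical regional repositories, each exposing one Solr core.
    regional_cores = [
        "london.example.com:8983/solr/alfresco",
        "newyork.example.com:8983/solr/alfresco",
    ]
    print(build_sharded_query_url(
        "http://london.example.com:8983/solr/alfresco",
        regional_cores,
        "title:federation",
    ))
```

Every shard is queried with the same request, which is one way to see why the slide lists a single authentication source as a prerequisite.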
  20. 'Intelligent' storage
      Storage cloud technology
      • Underpinning for the repository is a storage cloud technology
        • Uses a Content Store Selector
      • Base layer built on commodity hardware
        • Keeps multiple replicas of the content
      • Management layer
        • Cost-based routing
        • Knows where content resides
      • On-demand content migration between repositories
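
The slide above does not say how the management layer's cost-based routing works, so the following is only a toy illustration of the general idea: given the regions that hold a replica, pick the one that is cheapest to read from for the requesting region. The region names, the cost figures and the function name are all invented for this sketch.

```python
# Hypothetical relative transfer costs, keyed by (source region, destination
# region). A real system would derive these from bandwidth, latency or price.
LINK_COST = {
    ("emea", "emea"): 0, ("emea", "amer"): 5, ("emea", "apac"): 8,
    ("amer", "emea"): 5, ("amer", "amer"): 0, ("amer", "apac"): 7,
    ("apac", "emea"): 8, ("apac", "amer"): 7, ("apac", "apac"): 0,
}


def cheapest_source(requesting_region, replica_regions, link_cost=LINK_COST):
    """Return the replica region with the lowest cost to the requester."""
    if not replica_regions:
        raise ValueError("content has no replicas")
    return min(
        replica_regions,
        key=lambda source: link_cost[(source, requesting_region)],
    )


if __name__ == "__main__":
    # Content is stored in EMEA and AMER; a user in APAC requests it.
    print(cheapest_source("apac", ["emea", "amer"]))
```

A non-zero cost for the chosen source means there is no local copy yet, which is the situation where on-demand migration would be requested.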
  21. Challenges
      • Large file size
        • Has to work with streaming
        • Beware of anything that attempts to buffer a full file into memory
          • E.g. to POST it
        • Watch out for processes that need to copy a file
      • User expectations
        • Need training on asynchronous behaviour
        • Search results and their appearance
          • Grouping / sort
          • Pagination (of distinct result sets)
        • Time to migrate large content
          • Can be lengthy if there isn't a 'near' copy
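
The warning above about buffering a full file into memory can be illustrated with a chunked copy between two file-like objects, so that memory use stays bounded by the chunk size however large the asset is. This is a generic sketch rather than Alfresco code, and the 64 KiB chunk size is an arbitrary choice.

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary; tune for the network and storage in use


def stream_copy(source, target, chunk_size=CHUNK_SIZE):
    """Copy from source to target in fixed-size chunks.

    Both arguments are binary file-like objects. At most chunk_size bytes
    are held in memory at a time. Returns the number of bytes copied.
    """
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        target.write(chunk)
        copied += len(chunk)
    return copied


if __name__ == "__main__":
    # In-memory streams stand in for a large file and an upload target.
    source = io.BytesIO(b"x" * 200_000)
    target = io.BytesIO()
    print(stream_copy(source, target, chunk_size=4096))
```

The standard library's `shutil.copyfileobj` does the same job; the loop is spelled out here only to show where the memory bound comes from.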
  22. Twitter: @rbramley
      Blog: http://leanjavaengineering.com/
      Web: http://www.ixxus.com