Alfresco WebScript Connector for Apache ManifoldCF


A quick overview of Apache ManifoldCF and the latest work on the new Alfresco connector, shown at Codemotion Rome 2013.

Published in: Technology

  1. Apache ManifoldCF: Alfresco WebScript Repository Connector. Alfresco Meetup Rome 2013
  2. About me
     ● Open Source ECM Specialist at Sourcesense
     ● Author and Technical Reviewer at Packt Publishing
       ○ Alfresco 3 Web Services (2010)
       ○ GateIn Cookbook (2012)
     ● Alfresco Community (nickname OpenPj)
       ○ Alfresco Community Star
       ○ Alfresco Wiki Gardener
       ○ Top 10 supporter (English and Italian)
       ○ Moderator of the Italian forum
     ● PMC Member and Committer at the Apache Software Foundation
     ● JBoss Community
       ○ Content editor for
       ○ Project Leader and Committer for PortletSwap / Blog / Wiki
  3. Overview
     ● Introducing Apache ManifoldCF
       ○ What is ManifoldCF?
       ○ Why ManifoldCF?
       ○ Architecture
       ○ Who is using ManifoldCF?
       ○ The book
     ● How ManifoldCF supports Alfresco
     ● The goal of the new connector
       ○ Architecture
       ○ Roadmap
       ○ The team
     ● Resources
  4. The story. The original ManifoldCF code base was granted by MetaCarta to the Apache Software Foundation in December 2009. The MetaCarta effort represented more than five years of successful development and testing in multiple, challenging enterprise environments. The project graduated to an Apache Top Level Project in July 2012.
  5. What is ManifoldCF? Open Source crawler
     ● crawling model (add, change, delete)
     ● schedule jobs to create indexes
       ○ get contents from repositories
       ○ push contents to search servers
     [Diagram: Repository 1, 2, 3 -> Apache ManifoldCF -> Search Server 1, 2, 3]
  6. What is ManifoldCF?
     ● Out of the box it is distributed as a webapp
       ○ REST API
       ○ Authority Service
       ○ Crawler UI
     ● can be embedded in any Java application
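The slide only states that a REST API ships with the webapp. As a minimal sketch of how a client might address it, the helper below builds resource URLs and reads a job listing. The base URL, the `json/jobs` and `json/jobstatuses` paths, and the payload shape are assumptions based on a default example deployment, so check them against your own installation. No network call is made.

```python
import json
from urllib.parse import quote, urljoin

# Assumed base URL of the ManifoldCF API service webapp; adjust for your deployment.
BASE_URL = "http://localhost:8345/mcf-api-service/"


def api_url(resource, *path_parts):
    """Build a JSON API URL such as .../json/jobs or .../json/jobstatuses/<id>."""
    parts = ["json", resource] + [quote(str(p), safe="") for p in path_parts]
    return urljoin(BASE_URL, "/".join(parts))


def job_names(response_body):
    """Extract job descriptions from a JSON listing.

    The payload shape ({"job": [...]} with a "description" key) is an
    assumption used for illustration, not a documented contract.
    """
    payload = json.loads(response_body)
    jobs = payload.get("job", [])
    if isinstance(jobs, dict):  # tolerate a single job returned as an object
        jobs = [jobs]
    return [job.get("description", "") for job in jobs]
```

A real client would pass `api_url("jobs")` to an HTTP library and hand the response body to `job_names`.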
  7. Why ManifoldCF?
     ● Reliability
     ● Incremental
     ● Flexible
     ● Multi repositories
     ● Security model
     ● Monitoring
  8. Why ManifoldCF? - Reliability. Job scheduling and configuration are stored in the database to maintain the state of all the executions.
     [Diagram: Repository -> Pull Agent Daemon -> Search Server; configuration and scheduling are held in the Database]
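To illustrate why keeping job state in a database helps reliability, here is a small sketch. It uses SQLite from the Python standard library purely as a stand-in; the table layout, job names, and status values are hypothetical and do not reflect the ManifoldCF schema.

```python
import sqlite3


class JobStore:
    """Keeps job state in a database so a restarted agent can resume."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "name TEXT PRIMARY KEY, schedule TEXT, status TEXT)"
        )

    def save(self, name, schedule, status):
        # Insert or overwrite the job row, then commit so the state is durable.
        self.conn.execute(
            "INSERT OR REPLACE INTO jobs (name, schedule, status) VALUES (?, ?, ?)",
            (name, schedule, status),
        )
        self.conn.commit()

    def unfinished(self):
        # Jobs that were not completed are the ones a restarted agent picks up.
        rows = self.conn.execute(
            "SELECT name FROM jobs WHERE status != 'done' ORDER BY name"
        )
        return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
first_agent = JobStore(conn)
first_agent.save("alfresco-to-solr", "daily", "running")
first_agent.save("fileshare-to-solr", "hourly", "done")

# A second store object stands in for the agent after a restart:
# it holds no state of its own and reads everything back from the database.
restarted_agent = JobStore(conn)
resumable = restarted_agent.unfinished()
```

Because the agent itself is stateless, stopping it in the middle of a run loses nothing that the database has already recorded.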
  9. Why ManifoldCF? - Incremental. Get content changesets from the repository API.
     [Diagram: Repository -> complete changesets -> Apache ManifoldCF]
  10. Why ManifoldCF? - Flexible. If the repository can't supply all the changes, ManifoldCF can discover them through crawling.
      [Diagram: Repository -> incomplete changesets -> Apache ManifoldCF; Change Discovery over nodes N1 and N2]
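When a repository cannot report its own changes, a crawler can still derive the add, change, and delete sets by comparing what it saw last time with what it sees now. The sketch below shows that idea with version strings per document; the function name and the snapshot format are illustrative assumptions, not ManifoldCF code.

```python
def discover_changes(previous, current):
    """Compare two {document_id: version_string} snapshots."""
    added = sorted(set(current) - set(previous))
    deleted = sorted(set(previous) - set(current))
    changed = sorted(
        doc_id
        for doc_id in set(previous) & set(current)
        if previous[doc_id] != current[doc_id]
    )
    return {"add": added, "change": changed, "delete": deleted}


# Snapshot stored after the last crawl versus the one produced by this crawl.
previous = {"N1": "v1", "N2": "v1", "N3": "v1"}
current = {"N1": "v1", "N2": "v2", "N4": "v1"}
changes = discover_changes(previous, current)
```

Only the documents listed in `changes` need to be sent to or removed from the search server; unchanged documents such as `N1` are skipped.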
  11. Why ManifoldCF? - Multi repositories. Jobs can retrieve contents from the following repositories:
      ● CMIS-compliant
      ● Alfresco
      ● IBM FileNet
      ● EMC Documentum
      ● Microsoft SharePoint
      ● OpenText LiveLink
      ● Autonomy Meridio
      ● Memex Patriarch
      ● Windows Share/DFS
      ● Generic JDBC
      ● Generic Filesystem
      ● Generic RSS and Web
  12. Why ManifoldCF? - Multi repositories. Jobs can ingest contents into the following search servers:
      ● Apache Solr
      ● ElasticSearch
      ● OpenSearchServer
      ● MetaCarta GTS
  13. Why ManifoldCF? - Security model. Retrieve per-content ACLs.
      [Diagram: Authority 1, 2, 3 -> Authority Service -> user access tokens. Repository 1, 2, 3 -> Pull Agent Daemon -> doc access tokens -> Search Server -> user-specific search results]
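The token model in the slide can be sketched as a simple filter: each document is indexed with access tokens, each user resolves to a set of tokens through the authorities, and a result is shown only when the two sets are compatible. The token strings, the separate allow and deny lists, and both function names are assumptions made for this illustration.

```python
def user_tokens(user, authorities):
    """Collect the access tokens every authority grants to a user."""
    tokens = set()
    for authority in authorities:
        tokens.update(authority.get(user, []))
    return tokens


def visible(documents, tokens):
    """Keep documents with a matching allow token and no matching deny token."""
    return [
        doc["id"]
        for doc in documents
        if tokens & set(doc["allow"]) and not tokens & set(doc["deny"])
    ]


# Two hypothetical authorities, each mapping users to tokens.
authorities = [
    {"alice": ["alfresco:GROUP_EVERYONE", "alfresco:alice"]},
    {"alice": ["ad:staff"]},
]
documents = [
    {"id": "doc1", "allow": ["alfresco:GROUP_EVERYONE"], "deny": []},
    {"id": "doc2", "allow": ["alfresco:bob"], "deny": []},
    {"id": "doc3", "allow": ["alfresco:GROUP_EVERYONE"], "deny": ["alfresco:alice"]},
]
results = visible(documents, user_tokens("alice", authorities))
```

In a real deployment the filtering would run inside the search server at query time, with the document tokens stored as index fields.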
  14. Why ManifoldCF? - Monitoring. The Crawler UI allows you to:
      ● configure jobs and connectors
      ● monitor job execution
      ● monitor content ingestion
        ○ status reports
          ■ document status
          ■ queue status
        ○ history reports
          ■ simple history
          ■ maximum activity
          ■ maximum bandwidth
          ■ result histogram
  15. Architecture - Job
      [Diagram: a Job connects a Repository to a Search Server. The Repository Connector retrieves content and ACLs, the Authority Connector handles ACLs, and the Output Connector feeds the Search Server.]
      A job is made of:
      - query to retrieve contents
      - metadata mapping
      - verbal description
      - content ingestion
      - crawling model
      - scheduling
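The job parts listed in the slide can be pictured as a small configuration object that ties a repository connector to an output connector. The sketch below is a simplified model under that reading: the `Job` class, its field names, and the fake connector are hypothetical, and scheduling and the crawling model are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Job:
    description: str                                   # verbal description
    query: str                                         # query to retrieve contents
    fetch: Callable[[str], Iterable[Dict[str, str]]]   # repository connector
    ingest: Callable[[Dict[str, str]], None]           # output connector
    metadata_mapping: Dict[str, str] = field(default_factory=dict)

    def run(self):
        """Fetch documents, rename metadata fields, and ingest them."""
        count = 0
        for document in self.fetch(self.query):
            mapped = {
                self.metadata_mapping.get(key, key): value
                for key, value in document.items()
            }
            self.ingest(mapped)
            count += 1
        return count


index: List[Dict[str, str]] = []


def fake_repository(query):
    # Stand-in for a repository connector; ignores the query.
    return [{"cm:name": "report.pdf", "cm:title": "Report"}]


job = Job(
    description="Index Alfresco documents",
    query="TYPE:cm:content",
    fetch=fake_repository,
    ingest=index.append,
    metadata_mapping={"cm:name": "name", "cm:title": "title"},
)
ingested = job.run()
```

Swapping `fetch` or `ingest` for another callable changes the repository or the search server without touching the rest of the job.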
  16. Who is using ManifoldCF?
  17. The book: ManifoldCF in Action. ManifoldCF in Action by Karl Wright, published by Manning. Karl is the original developer and the principal committer of Apache ManifoldCF. The book is available at
  18. How ManifoldCF supports Alfresco
      ● CMIS Repository Connector based on OpenCMIS
      ● The current Alfresco Repository Connector only supports CML
        ○ works on any version of Alfresco 2.x, 3.x and 4.x
        ○ no support for querying Solr from Alfresco
        ○ it will die at the end of the year
        ○ Please see the Alfresco Roadmap
  19. Alfresco Solr search subsystem
      ● Remote crawling of contents and ACLs into Solr
        ○ REST API for retrieving changesets from the Alfresco db
      ● Solr server provided by Alfresco
        ○ based on Apache Solr 1.4.1 (uhm...really!!!???)
      ● hardcoded
      ● can't be used with your own Solr instance
        ○ customers have newer versions of Solr
          ■ interested in new features (SolrCloud, sharding...)
          ■ hundreds of improvements available in 3.x and 4.x
  20. Alfresco Solr search subsystem
      [Diagram: Alfresco (alf_transaction, alf_acl_*, alf_node_*) -> Transactions and ACL -> Alfresco REST Client -> Solr 1.4.1 (provided by Alfresco) -> Indexes]
  21. Roadmap
  22. Goal - 1. Create a new connector using the Alfresco REST Client
      ● provided and supported by Alfresco
        ○ for us it is a Maven dependency :)
      ● invokes the Alfresco Solr API
  23. Goal - 2 - check feasibility. Create a real Enterprise alternative for managing indexes
      ● compatibility with the SearchService of Alfresco
      ● repository takes care only of contents
      ● indexes are managed externally
      ● no redundancy for indexes
      Effort: redirecting query executions
  24. Goal - 3 - Security. Implement an Alfresco authority connector
      ○ manages ACL indexing
  25. Goal - 4. Manage indexes using ManifoldCF against any supported search server
      ● Apache Solr 3.x / 4.x
      ● ElasticSearch
      ● Open Search Server
      ● MetaCarta
  26. Architecture
      [Diagram: Alfresco (alf_transaction, alf_acl_*, alf_node_*) -> ManifoldCF (Alfresco WebScript Repository Connector using the Alfresco REST Client -> Output Connector) -> Search Server (Indexes)]
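The flow in this architecture, where the connector asks Alfresco for transactions and hands them on for indexing, can be sketched as a tracking loop with a checkpoint. Everything named here is hypothetical: `fetch_transactions` stands in for whatever call the Alfresco REST Client offers, and the `id` and `commitTimeMs` fields are an assumed response shape.

```python
def track(fetch_transactions, index_transaction, last_commit_time=0):
    """Pull transactions newer than last_commit_time until none are left.

    Returns the commit time of the newest transaction processed, which the
    caller would persist as the checkpoint for the next run.
    """
    while True:
        batch = fetch_transactions(last_commit_time + 1)
        if not batch:
            return last_commit_time
        for txn in batch:
            index_transaction(txn)
            last_commit_time = max(last_commit_time, txn["commitTimeMs"])


# In-memory stand-in for the repository's transaction log.
TRANSACTIONS = [
    {"id": 1, "commitTimeMs": 100},
    {"id": 2, "commitTimeMs": 200},
    {"id": 3, "commitTimeMs": 300},
]


def fake_fetch(from_commit_time, max_results=2):
    # Return at most max_results transactions committed at or after the given time.
    newer = [t for t in TRANSACTIONS if t["commitTimeMs"] >= from_commit_time]
    return newer[:max_results]


indexed = []
checkpoint = track(fake_fetch, lambda txn: indexed.append(txn["id"]))
```

Paging in small batches keeps memory use bounded, and restarting from the saved checkpoint means a crash only repeats the last incomplete batch. The sketch assumes distinct commit times; a real implementation would need to handle ties.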
  27. The team of the new connector
      ● Piergiorgio Lucidi (Sourcesense + ASF)
      ● Maurizio Pillitu (Alfresco)
      ● Aingaran Pillai (Zaizi) [new entry]
      ● Fran Alvarez (Zaizi) [new entry]
      ● Abraham Ayala (Zaizi) [new entry]
  28. Join us!
      ● We are looking for developers
      ● this is a work in progress
      ● don't fork the project, feel free to join us ^__^
  29. Resources
      ● Apache ManifoldCF
      ● The connector hosted on GitHub:
      ● it will be included in Apache ManifoldCF
  30. Thank you for your attention!