Apache ManifoldCF


An overview of Apache ManifoldCF, the open source crawler that lets you configure jobs to manage search indexes by taking contents from repositories.

Usage Rights

CC Attribution-NonCommercial-NoDerivs License

Apache ManifoldCF Presentation Transcript

  • Apache ManifoldCF
  • Overview
    ● The story
    ● What is ManifoldCF?
    ● Why ManifoldCF?
    ● Architecture
    ● The 0.3-incubating version
    ● The 0.4-incubating version
    ● What's new in the 0.5-incubating version
    ● The book: ManifoldCF in Action
    ● Demo
    ● Resources
  • The story
    The original ManifoldCF code base was granted by MetaCarta Inc. to the Apache Software Foundation in December 2009.
    The MetaCarta effort represented more than five years of successful development and testing in multiple, challenging enterprise environments.
    The project is in the Apache Incubator because the community was not yet diverse enough, but it is now moving towards graduation. ^__^
  • What is ManifoldCF?
    ● Open Source crawler
      ○ schedule jobs to create indexes
        ■ get contents from repositories
        ■ push contents on search servers
    ● Out-Of-The-Box it is distributed as J2EE web apps
      ○ REST API
      ○ Authority Service
      ○ Crawler UI
    ● Can be embedded in any Java application
  • Why ManifoldCF?
    ● Reliability
    ● Incremental
    ● Multi repositories
    ● Security model
    ● Monitoring
  • Why ManifoldCF? - Reliability
    Job scheduling and configuration are stored in the database to maintain the state of all executions.
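To illustrate why keeping job state in a database helps reliability, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `job_state`, its columns, and the helper functions are hypothetical and chosen only for this example; they are not ManifoldCF's actual schema or API.

```python
import sqlite3


def open_store(path):
    """Open (or create) a hypothetical job-state table at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, last_doc TEXT)"
    )
    return conn


def save_state(conn, job_id, status, last_doc):
    """Record a job's progress and commit, so the row outlives the process."""
    conn.execute(
        "INSERT OR REPLACE INTO job_state (job_id, status, last_doc) "
        "VALUES (?, ?, ?)",
        (job_id, status, last_doc),
    )
    conn.commit()


def load_state(conn, job_id):
    """Return (status, last_doc) for a job, or None if it has never run."""
    return conn.execute(
        "SELECT status, last_doc FROM job_state WHERE job_id = ?", (job_id,)
    ).fetchone()
```

With a file path instead of `":memory:"`, a restarted process can call `load_state` and resume from the last recorded document rather than starting the crawl again.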
  • Why ManifoldCF? - Incremental
    Jobs can optionally be configured to revisit contents incrementally.
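One common way to revisit contents incrementally is to remember a version marker per document and re-ingest only what changed. The sketch below assumes that approach; the function name, the dictionaries standing in for the repository and the search index, and the version markers are all illustrative assumptions, not a description of ManifoldCF internals.

```python
def incremental_crawl(repository, seen_versions, index):
    """Re-index only documents whose version changed since the last run.

    repository:    dict doc_id -> (version, content), standing in for a repository
    seen_versions: dict doc_id -> version recorded by the previous run (mutated)
    index:         dict doc_id -> content, standing in for a search server (mutated)
    Returns (ingested_ids, deleted_ids).
    """
    ingested, deleted = [], []

    # New or changed documents are pushed to the index.
    for doc_id, (version, content) in sorted(repository.items()):
        if seen_versions.get(doc_id) != version:
            index[doc_id] = content
            seen_versions[doc_id] = version
            ingested.append(doc_id)

    # Documents that disappeared from the repository are removed from the index.
    for doc_id in sorted(set(seen_versions) - set(repository)):
        del seen_versions[doc_id]
        index.pop(doc_id, None)
        deleted.append(doc_id)

    return ingested, deleted
```

A second run over an unchanged repository touches nothing, which is the point of crawling incrementally.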
  • Why ManifoldCF? - Multi repositories
    Jobs can retrieve contents from the following repositories:
    ● CMIS-compliant
    ● Alfresco
    ● IBM FileNet
    ● EMC Documentum
    ● Microsoft SharePoint
    ● OpenText LiveLink
    ● Autonomy Meridio
    ● Memex Patriarch
    ● Windows Share/DFS
    ● Generic JDBC
    ● Generic Filesystem
    ● Generic RSS and Web
  • Why ManifoldCF? - Multi repositories
    Jobs can ingest contents to the following search servers:
    ● ElasticSearch
    ● OpenSearchServer
    ● Apache Solr
    ● MetaCarta GTS
  • Why ManifoldCF? - Security model
    Retrieve per-content ACLs
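Per-content ACLs are typically used to trim search results to what a user may see. The following is a rough sketch under that assumption: each indexed document carries allow and deny tokens, and the user's tokens are assumed to come from some authority. The function and the token format are hypothetical and only illustrate the idea.

```python
def visible_documents(documents, user_tokens):
    """Return the ids of documents the user may see.

    documents:   iterable of (doc_id, allow_tokens, deny_tokens)
    user_tokens: access tokens held by the user (assumed to be supplied
                 by an authority; how they are obtained is out of scope)
    A deny match always wins over an allow match.
    """
    user_tokens = set(user_tokens)
    visible = []
    for doc_id, allow, deny in documents:
        if user_tokens & set(deny):
            continue  # explicitly denied
        if user_tokens & set(allow):
            visible.append(doc_id)
    return visible
```

Documents with no matching allow token are hidden by default, so a missing ACL never exposes content in this sketch.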
  • Why ManifoldCF? - Monitoring
    The Crawler UI allows you to:
    ● configure jobs and connectors
    ● monitor job execution
    ● monitor content ingestion
      ○ status reports
        ■ document status
        ■ queue status
      ○ history reports
        ■ simple history
        ■ maximum activity
        ■ maximum bandwidth
        ■ result histogram
  • Architecture
    ● Pull Agent Daemon (the core service)
      ○ Jobs (execute the ingestion tasks)
        ■ Repository Connectors (retrieve contents)
        ■ Output Connectors (ingest contents)
        ■ Authority Connectors (retrieve ACLs)
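The three connector roles named on the slide can be pictured as small interfaces that a job wires together. The sketch below is only an illustration of that division of labour: the class names, method names, and the way ACLs are attached to documents are assumptions made for this example, not ManifoldCF's actual connector API.

```python
from abc import ABC, abstractmethod


class RepositoryConnector(ABC):
    """Retrieves contents (hypothetical interface)."""

    @abstractmethod
    def documents(self):
        """Yield (doc_id, content) pairs."""


class OutputConnector(ABC):
    """Ingests contents into a search server (hypothetical interface)."""

    @abstractmethod
    def ingest(self, doc_id, content, acl):
        """Send one document and its ACL to the index."""


class AuthorityConnector(ABC):
    """Retrieves ACLs (hypothetical interface)."""

    @abstractmethod
    def acl_for(self, doc_id):
        """Return the list of access tokens for a document."""


class InMemoryRepository(RepositoryConnector):
    def __init__(self, docs):
        self._docs = dict(docs)

    def documents(self):
        return iter(sorted(self._docs.items()))


class StaticAuthority(AuthorityConnector):
    def __init__(self, acls):
        self._acls = dict(acls)

    def acl_for(self, doc_id):
        return self._acls.get(doc_id, [])


class ListOutput(OutputConnector):
    def __init__(self):
        self.received = []

    def ingest(self, doc_id, content, acl):
        self.received.append((doc_id, content, acl))


def run_job(repository, output, authority=None):
    """Pull every document from the repository and push it to the output."""
    count = 0
    for doc_id, content in repository.documents():
        acl = authority.acl_for(doc_id) if authority else []
        output.ingest(doc_id, content, acl)
        count += 1
    return count
```

Keeping the roles behind separate interfaces is what lets a single job pair any repository with any search server.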
  • Architecture
  • Architecture - Job
    A job is a unit of ingestion work that consists of:
      ○ verbal description
      ○ repository connection
        ■ authority connection (optional)
      ○ metadata mapping
      ○ output connection (search server)
      ○ crawling model
      ○ scheduling information (on demand or time ranges)
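The parts of a job listed on this slide map naturally onto a small record type. Here is a minimal sketch as a Python dataclass; the field names, the `"incremental"` default, and the hour-based schedule format are assumptions for illustration and do not reflect how ManifoldCF stores job definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    description: str                            # verbal description
    repository_connection: str                  # name of the repository connection
    output_connection: str                      # name of the search server connection
    authority_connection: Optional[str] = None  # optional
    metadata_mapping: Dict[str, str] = field(default_factory=dict)
    crawling_model: str = "incremental"         # hypothetical value
    # Time ranges as (start_hour, end_hour); an empty list means "on demand".
    schedule: List[Tuple[int, int]] = field(default_factory=list)

    def starts_automatically_at(self, hour):
        """True if the given hour falls inside one of the scheduled ranges."""
        return any(start <= hour < end for start, end in self.schedule)

    def map_metadata(self, metadata):
        """Rename repository fields to index fields; unmapped keys pass through."""
        return {self.metadata_mapping.get(k, k): v for k, v in metadata.items()}
```

For example, `Job("Index the intranet", "cmis-repo", "solr", schedule=[(1, 5)])` describes a job that may start between 01:00 and 05:00, while a job with no schedule only runs when started by hand.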
  • Architecture - Job
  • The 0.3-incubating version
    ● CMIS Repository Connector
    ● OpenSearchServer Output Connector
    ● Scripting Language
    ● New Maven build process
    ● Several bug fixes
  • The 0.4-incubating version
    ● Alfresco Connector
    ● JDBC Connector now supports MySQL
    ● CMIS Connector upgraded to OpenCMIS 0.5.0
    ● Several bug fixes
  • What's new in the 0.5-incubating version
    ● Apache Velocity for connectors UI templates
    ● ElasticSearch Output Connector
    ● CMIS Connector upgraded to OpenCMIS 0.6.0
    ● Prebuild connector support: just add jars and go!
    ● New Japanese localization
    ● Several bug fixes
  • The book: ManifoldCF in Action
    ManifoldCF in Action
    by Karl Wright
    published by Manning
    Karl is the original developer and the principal committer of Apache ManifoldCF.
    The book is available at the following site: http://www.manning.com/wright
  • DEMO
  • Resources
    Homepage: http://incubator.apache.org/connectors
    Download page: http://incubator.apache.org/connectors/download.html
  • Thank you for your attention!