II-SDV 2017: Datafari - Building an Open Source Enterprise Search Solution from Popular Building Blocks

Open source has allowed Big Data to emerge, and is now promoting Artificial Intelligence. Within the search space, Apache Solr and Elasticsearch are the reference technologies, already widely used at massive scale. Yet they remain building blocks that target developers rather than business users. In this presentation, we introduce Datafari, a complete open source enterprise search solution that embeds connectors to data sources, administration interfaces, scalability, semantics, security, and interoperability. We will cover the following topics:

Origin of the product
Architecture and components
Functionalities
Demo
Use cases in Oil&Gas and Nuclear industries

Published in: Internet

  1. 1. Datafari - Building an Open Source Enterprise Search Solution from Popular Building Blocks CEDRIC ULMER FRANCE LABS II-SDV 25/04/17
  2. 2. Datafari So what is Datafari? • « Packaged solution » to analyse and search for documents and data • Can index heterogeneous data formats from multiple sources • Federated search interface • Apache v2 licence
  3. 3. Why Datafari? Choice of the Apache Solr and Elasticsearch technologies (more about this later...) Three possibilities to answer a customer's requirements: • Use a packaged solution available on the market from a 3rd party • Start from Apache Solr or Elasticsearch (or others) • Develop and gather the necessary components for each customer's needs • Ensure « production-ready » material: docs, processes, tests • Create our own packaged solution (yeah!)
  4. 4. Why Datafari? Problems with 3rd party proprietary solutions: • Black box • Roadmap not clear • Resilience (bankruptcy, acquisition…) Problems with 3rd party open source solutions: • Lack of technical documentation • Difficulty setting up an understandable debug environment • Delay in updating the embedded components, in particular Solr or ES • License issues (mostly viral ones) • Lack of resilience from the makers => Required us to develop our own solution to better address our customers' needs
  5. 5. Why Datafari Idea: • Gather the best of both worlds : • The “packaged” aspect of existing solutions • Many functionalities • All in one • The flexibility of a solution based on Solr and ES • All of that with an Apache v2 licence ☺ • Focus on Enterprise Search: • Admin for search experts • Admin for search admin • Eased AD/LDAP management • Search and data analytics
  6. 6. Based on 4 building blocks: • Apache Solr • The heart of the search engine • Apache Manifold CF • Crawling documents • Ajax FranceLabs • UI • Elasticsearch • Data analytics Ajax FranceLabs
  7. 7. Datafari 3.1 Apache Tomcat 7 Data Sources Datafari Search / Admin Apache ManifoldCF CMS DB Fileshares Web Security (AD, LDAP) PostgreSQL Apache Solr 5.5 Document Index Statistics Index Apache ManifoldCF 2.5 Crawler Service Authorization Service ELK Cassandra (User Management)
  8. 8. Apache Solr Lucene based Full text search engine Apache Top Level project Large community (users/devs) Efficient/Reliable Scalable • High availability • Queries • Index volume
  9. 9. Apache Solr Webapp Java REST APIs XML/HTTP • Indexing • Querying Caching Web admin interface Configuration through XML config files or APIs
  10. 10. Apache Lucene/Solr – Some refs
  11. 11. Apache Solr for Datafari Search core of Datafari Preconfigured index for rich documents • Language detection • Standard facets • Autocomplete • Spellchecker Indexing user queries • Enables analytics on search users behavior
  12. 12. Datafari 3.1 Apache Tomcat 7 Data Sources Datafari Search / Admin Apache ManifoldCF CMS DB Fileshares Web Security (AD, LDAP) PostgreSQL Apache Solr 5.5 Document Index Statistics Index Apache ManifoldCF 2.5 Crawler Service Authorization Service ELK Cassandra (User Management)
  13. 13. Apache Manifold CF Framework for data crawling Management of incremental crawling Authorization management Programmable crawls (time windows, loads, regex…)
  14. 14. Apache Manifold CF Many off the shelf connectors: • FileShare (Samba) • JDBC • Website • Alfresco • CMIS • Sharepoint • Mail • Dropbox • LDAP/AD
  15. 15. Apache Manifold CF for Datafari Manages data crawling Manages authentication Preconfigured integration with our Solr
  16. 16. Datafari 3.1 Apache Tomcat 7 Data Sources Datafari Search / Admin Apache ManifoldCF CMS DB Fileshares Web Security (AD, LDAP) PostgreSQL Apache Solr 5.5 Document Index Statistics Index Apache ManifoldCF 2.5 Crawler Service Authorization Service ELK Cassandra (User Management)
  17. 17. Datafari Search Front-End User UI • AjaxFranceLabs Authentication Interactions with Solr (SolrJ) Indexing user queries Admin UI • Solr • ManifoldCF • Statistics
  18. 18. AjaxFranceLabs Inspired by AjaxSolr Javascript/Ajax client Provides several components: • Manager: backend connection • Widgets • Graphical/Logical components • (Advanced) Search • Facet • Geolocalisation (Based on OpenStreetMap)
  19. 19. Browser Datafari Server Datafari Search Manager SearchBarWidget ResultWidget FacetWidget Datafari Search Servlet Ajax
  20. 20. Use case 1 – Oil and Gas Sources: • Sharepoint • Documentum • Fileshare • DB Volume: 28 TB Users: Geoscientists
  21. 21. Use case 2 – Nuclear Sources: • Fileshare • Oracle • DB Volume: 15 M docs Users: Maintenance operators
  22. 22. Demo!
  23. 23. Technical Roadmap (1/2) New advanced search Solr 6 Graphical SolrCloud management Always more documentation Annotator
  24. 24. Technical roadmap (2/2) New languages Consolidation Unit test framework More dashboards in ELK Learning-to-Rank
  25. 25. Where can I find Datafari? Main hub: http://www.datafari.com/en Source code available on GitHub: • https://code.google.com/p/datafari/ Install packages for Debian 7 and Windows available on: • www.datafari.com Forum: • https://groups.google.com/forum/#!forum/datafari Documentation on Confluence • Technical and functional Tickets and releases on Jira
  26. 26. Want to follow Datafari ? @francelabs #datafari francelabs francelabs
  27. 27. Become a Datafarian ! ☺ We are always open to suggestions • “Reorganise your docs…” Contribution • What about a German version ?! • UI widgets ? Most important: your use cases and usage feedback
  28. 28. CONTACT Don’t hesitate to reach out to us for any info Our corporate website: www.francelabs.com Email: contact@francelabs.com Tel: 09 72 43 72 85 Fax: 09 72 29 28 14
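
Slides 9 and 11 mention that Solr is queried over HTTP and that the Datafari index comes preconfigured with standard facets. As a rough illustration only, the sketch below builds such a query URL using standard Solr request parameters (`q`, `rows`, `wt`, `facet`, `facet.field`). The host and port, the core name `FileShare`, and the field names `language` and `extension` are assumptions made for the example; they are not taken from the slides and may not match a real Datafari installation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, core, text, facet_fields=(), rows=10):
    """Build a Solr /select URL for a full-text query with optional field facets."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Faceting is off unless requested; each facet field is a repeated parameter.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical values: base URL, core name, and facet fields are assumptions.
url = build_select_url(
    "http://localhost:8983/solr",
    "FileShare",
    "pump maintenance",
    facet_fields=["language", "extension"],
)
print(url)

# Parse the query string back to check which parameters were sent.
print(parse_qs(urlsplit(url).query))
```

The function only assembles the URL, so it can be tried without a running server; sending the request and reading the JSON response is left out on purpose.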
