Enterprise search - big data

Enterprise search - Big data. Overview of the enterprise search engine panorama with open source and commercial solutions. What is different in enterprise search as opposed to web search? How enterprises can leverage big data. Quick overview on how Searchbox can be used for information retrieval. More information: http://www.searchbox.com/enterprise-search-big-data/

  1. ENTERPRISE SEARCH
     by jonathan.rey@searchbox.com
  2. ENTERPRISE SEARCH PANORAMA
     - IBM
     - Dassault Systems
     - Open source
     - Google
     - Commercial solutions (all acquired in the past 5 years)
  3. WHAT IS SEARCHBOX?
     Searchbox leverages Apache Solr technology and:
     - Offers various Solr plugins
     - Offers a Search Framework which can be used to develop custom search engines tied to business needs
     - Search-as-a-Service (on the cloud)
  4. ENTERPRISE SEARCH VS. WEB SEARCH
     - Productivity
     - Heterogeneous data
     - False negative / False positive
     - Structured, semi-structured, unstructured data
  5. PRODUCTIVITY
     Web search is driven by the engine.
  6. HETEROGENEOUS DATA
     - Intra/extranet
     - Websites
     - CMS
     - Filesystem
     - Repository (Files, XML)
  7. FALSE NEGATIVES ARE A KILLER
     - On the web it’s ok not to find a specific document
     - Not an option within a company
     - Real time indexing
     - Liability concerns
     - Compliance (Why this result?)
  8. ENTERPRISE DATA IS...
     - Structured
     - Semi-structured
     - Unstructured
     Web Search today is “semi-structured” and somewhat consistent.
  9. BUSINESS-CENTRIC DATA
     - Data Normalization
     - Adaptation to business needs
     - => Goal: Productivity gain
     - LinkedIn is a great “enterprise search” example
  10. WHAT IS BIG DATA?
      - Distributed & disparate data from several sources
      - Structured - semi-structured - non-structured
      - Big data & machine learning
      - Enhance existing unstructured data (tagging, entity extraction, summarization)
      - Content curation
  11. FROM BIG TO SMALL DATA STACK
      - Scalable Backend infrastructure & archiving: SharePoint, Cassandra, Hadoop, Oracle, SAP, MongoDB, ...
      - Information Retrieval: Solr, Lucene, Elasticsearch, Business Warehouse, SAP BW, ...
      - Analysis / Discovery: Searchbox backend
      - Visualization: Searchbox frontend
      Big Data => Small Data
  12. OUR APPROACH TO CONVERGENCE
      - Index
      - Crawl
      - Fields
      - Metadata
      - Facets
      - Filters
      - More Like This
      - Search Framework
      - Presets
      - Templating
      - Tagging
      - Summarization
      - Sorting
      Stages: Connect, Discover, Lift / Enhance, Specialize
  13. CONCLUSION
      - Working with Big data is expensive and time consuming
      - Requires high level of expertise in multiple fields (Networking, Programming, ML, NLP, Mathematics, Statistics, ...)
      - Information Retrieval / mining can serve as an iterative tool to leverage value from big data
  14. SEARCHBOX FOR BIG DATA
      - Data centric (Machine learning based enhancements)
      - Solr storage (Solr 4.x as scalable key-value store)
      - Hosted Solr Cluster with sharding and replication
      - Iterative process
      - Guided administration panel
      - Human friendly as opposed to CLI
  15. Searchbox demo on http://pubmed.searchbox.com
      - “Sort by”
      - “Clickable tags”
      - Range facets with histogram
      - Search with autocomplete
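
Slides 4 and 7 contrast false negatives with false positives without saying how they are measured. One common way to quantify them, not stated in the deck itself, is precision (how many returned documents are relevant) and recall (how many relevant documents are returned). The sketch below is a minimal illustration; the function name and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)    # returned and relevant
    false_positives = len(retrieved - relevant)   # returned but not relevant
    false_negatives = len(relevant - retrieved)   # relevant but never returned
    precision = (
        true_positives / (true_positives + false_positives) if retrieved else 0.0
    )
    recall = (
        true_positives / (true_positives + false_negatives) if relevant else 0.0
    )
    return precision, recall


# Hypothetical document ids, for illustration only.
retrieved = {"contract-17", "memo-03", "report-42", "memo-99"}
relevant = {"contract-17", "report-42", "policy-08"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the missing "policy-08" is the false negative that slide 7 calls a killer: it lowers recall, which is the figure an enterprise deployment would most likely watch.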

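Slides 12 and 15 list facets, filters, sorting and range facets as search features. As a rough sketch of how such a request might be expressed against a plain Apache Solr endpoint, the snippet below builds a query string from standard Solr parameters (`q`, `fq`, `sort`, `facet.field`, `facet.range`). The field names `tags` and `year`, the helper name, and the host and core in the URL are assumptions for illustration; the deck does not describe how the Searchbox demo is actually configured.

```python
from urllib.parse import urlencode


def build_solr_query(text, tags=(), sort_field=None, year_range=None):
    """Build a Solr query string with tag filters, sorting and a range facet."""
    params = [
        ("q", text),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", "tags"),  # counts per tag, e.g. for clickable tags
    ]
    for tag in tags:
        # Each clicked tag narrows the result set with a filter query.
        params.append(("fq", f'tags:"{tag}"'))
    if sort_field:
        params.append(("sort", f"{sort_field} desc"))
    if year_range:
        start, end, gap = year_range
        # Range facet buckets can back a histogram of results per period.
        params += [
            ("facet.range", "year"),
            ("facet.range.start", str(start)),
            ("facet.range.end", str(end)),
            ("facet.range.gap", str(gap)),
        ]
    return urlencode(params)


query_string = build_solr_query(
    "protein folding",
    tags=["genomics"],
    sort_field="year",
    year_range=(1990, 2015, 5),
)
# Hypothetical host and core name.
print("http://localhost:8983/solr/example_core/select?" + query_string)
```

Passing the parameters as a list of pairs rather than a dict keeps repeated keys such as `fq` intact when several tags are selected.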