Apache Solr
The enterprise search platform
Luca Bonesini | Titulus User Group, Kion – Bologna, 4 Dec 2013
Apache Solr, the open source enterprise search engine

Titulus User Group event of 4 December 2013, organized by Kion/Cineca in Bologna.


  1. Apache Solr, the enterprise search platform. Luca Bonesini | Titulus User Group, Kion – Bologna, 4 Dec 2013
  2. About me
     Luca Bonesini
     Computer scientist, javelin thrower, programmer, bass guitar player, systems administrator, entrepreneur, IT manager, husband, pre-sales engineer, mountain biker, webmaster, father (x2), salesman, singer, marketer
     http://www.lucabonesini.it | @lbonesini | http://it.linkedin.com/in/lucabonesini/ | l.bonesini@sourcesense.com | +39 366 688 7125
  3. Sourcesense: Making sense of Open Source
     Contributors: Lucene/Solr, Apache Chemistry, Apache Jackrabbit, OpenSSO-Alfresco
     Committers: Lead developer Hibernate Search, Lucene Project, Infinispan, Apache/UIMA project integration, JBoss GateIn Portal
  4. Lucene and Solr: what are they?
  5. Apache Lucene (core): Search by ASF
     “Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.” http://lucene.apache.org/core/
     ● Fast and efficient scoring and indexing algorithms
     ● Lots of contributions to make common tasks easier: highlighting, spatial, query parsers, benchmarking tools, etc.
     ● Most widely deployed search library on the planet
  6. Apache Solr: Search by ASF
     “Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.”
     Highly reliable, scalable and fault tolerant, with distributed indexing, replication, load-balanced querying, automated failover and recovery, and centralized configuration.
  7. Apache Solr: Search by ASF
     Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. http://lucene.apache.org/solr
     ● Access Lucene over HTTP: Java, XML, Ruby, Python, .NET, JSON, PHP, etc.
     ● Most programming tasks in Lucene are configuration tasks in Solr
     ● Faceting (guided navigation, filters, etc.)
     ● Replication and distributed search support
  8. Enterprise Search: search in a suit and tie
  9. Enterprise Search: what and how
     “Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.” [Wikipedia]
     Ingestion → Processing and analysis → Indexing → Query parsing → Matching
     ● Ingestion: pull (crawler, connector) or push (integration API).
     ● Processing and analysis: document types and formats (XML, HTML, Office, etc.) converted to plain text; stemming, lemmatization, synonym expansion, entity extraction, part-of-speech tagging, tokenization.
     ● Indexing: dictionary of all unique words in the corpus; ranking; term frequency.
     ● Query parsing: user query; faceting; paging.
     ● Matching: query-index comparison; references to source documents.
  10. Enterprise Search: what and how
  11. Enterprise Search: what and how
     ● Crawler: an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (also called Web spider, ant, automatic indexer, web scutter).
     ● Precision/Recall: in pattern recognition and information retrieval, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved.
     ● Stemming: the process of reducing inflected (or sometimes derived) words to their stem, base or root form (e.g., "fishing", "fished", and "fisher" to the root word "fish").
     ● Lemmatization: in linguistics, the process of grouping together the different inflected forms of a word so they can be analysed as a single item (e.g., the word "better" has "good" as its lemma).
     ● Named-entity recognition (entity extraction): a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
     ● Part of speech: a linguistic category of words (or more precisely lexical items), generally defined by the syntactic or morphological behaviour of the lexical item in question (e.g., noun and verb).
     ● Tokenization: the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input.
  12. Search and Open Source
  13. Enterprise Search: products and vendors
     Vendors of proprietary enterprise search software: AskMeNow, Attivio, Concept Searching Limited, Content Analyst Company LLC, Coveo, Dassault Systèmes (acquired Exalead), Denodo, Dieselpoint, Inc., dtSearch Corp., EMC Corp., Exorbyte GmbH, Expert System S.p.A., Exterro, Inc., Fabasoft, Funnelback, Google Search Appliance, HP (acquired Autonomy Corporation which in turn acquired Verity K2 and Ultraseek), IBM (acquired Vivisimo), Inbenta, inter:gator Enterprise Search, ISYS Search Software, MarkLogic, Microsoft (includes Microsoft Search Server, Fast Search & Transfer), Mindbreeze, Neofonie (includes WeFind), Omniture (acquired by Adobe Systems), Open Text Corporation, Oracle Corporation (includes Secure Enterprise Search and Endeca Technologies Inc.), Perception Software, PolySpot, Q-go, Q-Sensei, Recommind, SAP (includes SAP NetWeaver Enterprise Search, Search Services in SAP NetWeaver AS ABAP, and Search and Classification TREX), Sinequa, SLI Systems, Sophia Search Limited, TeraText, X1 Technologies, Inc., ZyLAB Technologies, ZL Technologies
     Free and open source enterprise search software: Apache Solr, DataparkSearch, ElasticSearch, ht://Dig, Jumper 2.0, mnoGoSearch, OpenSearchServer, Searchdaimon, Sphinx
     Vendors of open source enterprise search software: 30 Digits, Apache Software Foundation, Lucid Works, Sematext, Flax
  14. Open Source: they do it too.
  15. Open Source, Open Standard
     Interoperability. Innovation. Because Innovation = Bu$ine$$
     OAGi, OASIS, W3C, IETF, IEEE, ETSI, Ecma, OGF, IEC, ISO, ITU, CENELEC, CEN, BSI, UNI, CEI, DKE, DIN, AFNOR, GIETS, LDTI
  16. Solr and Business
  17. Solr features
     ● Advanced Full-Text Search Capabilities
     ● Optimized for High Volume Web Traffic
     ● Standards Based Open Interfaces: XML, JSON and HTTP
     ● Comprehensive HTML Administration Interfaces
     ● Server statistics exposed over JMX for monitoring
     ● Linearly scalable, auto index replication, auto failover and recovery
     ● A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
     ● Powerful Extensions to the Lucene Query Language
     ● Faceted Search and Filtering
     ● Geospatial Search with support for multiple points per document and geo polygons
     ● Advanced, Configurable Text Analysis
     ● Highly Configurable and User Extensible Caching
     ● Performance Optimizations
     ● External Configuration via XML
     ● An AJAX based administration interface
     ● Monitorable Logging
     ● Fast near real-time incremental indexing and index replication
     ● Highly Scalable Distributed search with sharded index across multiple hosts
     ● JSON, XML, CSV/delimited-text, and binary update formats
     ● Easy ways to pull in data from databases and XML files from local disk and HTTP sources
     ● Rich Document Parsing and Indexing (PDF, Word, HTML, etc.) using Apache Tika
     ● Near Real-time indexing
     ● Flexible and Adaptable with XML configuration
     ● Extensible Plugin Architecture
     ● Apache UIMA integration for configurable metadata extraction
     ● Multiple search indices
     Related Projects: Apache Hadoop, Apache ManifoldCF, Apache Lucene.Net, Apache Lucy, Apache Mahout, Apache Nutch, Apache OpenNLP, Apache Tika, Apache Zookeeper
  18. Search, already a 'commodity'
     Search is Everywhere! Keyword search is a commodity. A holistic view of the data and the users is critical. Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data.
     Documents, Content Relationships, User interaction, Access
     Traditional:
     ● Fast, fuzzy text matching across a large document collection
     ● De-normalized data, “light” relational
     ● Top N problems
     ● Key-value (top 1)
     ● Recommendations
     ● “Good enough” classification, clustering
     ● Faceting, slicing and dicing of enumerated data
     ● Spatial, spell checking, record linkage, highlighting
     And:
     ● eCommerce
     ● Search + Recs + Analysis of users
     ● Knowledge Management
     ● Financial, transportation, pharma
     ● Fraud detection
     ● Social media
     ● Trend monitoring
     ● Information technology
     ● Log monitoring, analysis
     ● Healthcare
     ● DNA Analysis
     ● NoSQL
  19. Smart without Search?
  20. Solr: who uses it? Buy.com
  21. Beyond Search
  22. A success story
  39. Happy searching, everyone. Thank you!
     Luca Bonesini
     www.sourcesense.com | l.bonesini@sourcesense.com | Tel. +39 366 688 7125 | www.lucabonesini.it | twitter: @lbonesini | skype: lbonesini
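
The pipeline on slide 9 (processing and analysis → indexing → query parsing → matching) and the glossary terms on slide 11 can be illustrated with a toy example. This is only a sketch, not how Lucene or Solr implement these steps: the suffix list, the function names and the sample documents are all assumptions made for illustration, and the stemmer is a naive suffix stripper rather than a real algorithm such as Porter's.

```python
import re
from collections import Counter, defaultdict

# Hypothetical suffix list for a naive stemmer (illustration only).
SUFFIXES = ("ing", "ed", "er", "s")


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Stemming: strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Processing and analysis: tokenize, then stem each token."""
    return [stem(token) for token in tokenize(text)]


def build_index(docs):
    """Indexing: map each unique term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(analyze(text)).items():
            index[term][doc_id] = freq
    return index


def search(index, query):
    """Query parsing and matching: rank documents by summed term frequency."""
    scores = Counter()
    for term in analyze(query):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    return [doc_id for doc_id, _ in scores.most_common()]


# Made-up sample corpus.
docs = {
    "d1": "Fishing boats fished for fish all night",
    "d2": "The fisher sold fish at the market",
    "d3": "Solr indexes documents",
}
index = build_index(docs)
print(search(index, "fish"))  # ['d1', 'd2']
```

The same analysis chain is applied to documents and to the query, which is why a search for "fish" also matches "fishing", "fished" and "fisher".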

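
Slide 7 notes that Solr exposes REST-like HTTP/XML and JSON APIs, and slide 17 lists faceted search among its features. As a hedged sketch, a faceted query URL for Solr's `select` handler could be assembled as below. The host, the core name `documents` and the field names `title`, `author` and `year` are hypothetical; the request parameters `q`, `wt`, `rows`, `start`, `facet` and `facet.field` are standard Solr query parameters. No request is actually sent.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, facet_fields=(), rows=10, start=0):
    """Build a URL for Solr's select handler with optional field faceting."""
    params = [("q", query), ("wt", "json"), ("rows", rows), ("start", start)]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = solr_select_url(
    "http://localhost:8983",
    "documents",
    "title:solr",
    facet_fields=["author", "year"],
    rows=5,
)
print(url)
```

Any HTTP client in any language can then fetch this URL, which is the point made on the slide about using Solr from virtually any programming language.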