Patrick Beaucamp
Founder of the Vanilla Project
Mail: Patrick.beaucamp@bpm-conseil.com
Custom Open Source Search Engine with Drupal 8
and Solr at the French Ministry of Environment
II-SDV, Nice 24th April 2017
1II-SDV, Nice
Presentation Agenda
Open Source Search Engine & Search Platform
Some interesting Platforms
Features expected for Search Platforms (Interface)
Open Source Platform at French Ministry
Project Context
Platform Architecture
WebSite Powered by a Search engine
Related talks: Tuesday morning, presentation from Deep Search 9, and
Tuesday afternoon, presentation from FranceLabs
Personal Experience of Search
Searching … and finding !
II-SDV: SEARCH, DATA MINING and VISUALISATION
How many times per day do you Google? (search, maps, translate …)
Tribute to Open Source at II-SDV
Search is the first step: collecting information
An example: my personal experience
I tried to find a person for 23 years, roughly from 1993 to 2016
From 1993 to 1998: no search engine available … only a private investigator?
From 1999 to 2015: regular searches, no results
I found this person on Facebook, not on Google
From a browser: « f + Tab » … « g + Tab », « y + Tab » …
Some years: no search; other years: multiple searches
1) We all became private investigators one day or another
2) Different search engines lead to different results
2) Different search engines by country
Funny word: SEO … it's more « how to be found on the Internet » … and you need to pay for it!
3) The person I was looking for published on Facebook under his/her real name: it is his/her decision to be visible or not
4) Where do we stand with the « Right to be Forgotten »?
Companies like Facebook have tons of data: they need to provide search infrastructure (indexing + search interface)
I was lucky to try the Facebook search interface
Discovery of Cholera – 1854 (John Snow)
http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak
Bicycle accidents in the street: who takes care of traffic management?
Example in Boston:
http://www.boston.com/bostonglobe/editorial_opinion/blogs/the_angle/2010/12/bike_crash_map.html
Open Data
LION – 2016 (Garth Davis)
Mistake 1: Ganesh Tanei; Mistake 2: Saroo
Open Source Landscape
Diagram: Crawling, Indexing, Storing, WebSite Reference, WebSite Accessibility, Update Management, Search Interface, Result Visualization, Auto Completion, Natural Language, Voice Recognition, Maps, Ads, Unstructured data, Access Management
Search Platform Objectives
Constraints: being able to reach WebSites and content:
Internal WebSites (Intranet) & External WebSites
Internal Document Repositories
Being able to index WebSite content (and page updates)
Being able to store unstructured data
Diagram: Crawling, Storing, Indexing
Provide usable search results (auto classification, visualization)
Don't forget why and what you search:
• You search in existing documents
• You need visualization tools
• It's not a crystal ball: search reflects the past
Provide usable search interfaces (semantic search, multi-language search …)
Diagram: Search Interface, Result Visualization
Lucene is a Java-based indexing and search API
Solr is the leading search server built on top of Lucene. Two companies, LucidWorks (Fusion, built on Solr) and Elastic (Elasticsearch, built directly on Lucene), provide packaging and extensions on top of Lucene and Solr.
-Nutch is the crawling component
-Tika is a content analysis toolkit that extracts document metadata and text
-ZooKeeper is a centralized coordination service for distributed processes (configuration, synchronization)
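To make the indexing/search split concrete, here is a toy version of the inverted index idea at the heart of Lucene: each term maps to the documents that contain it, and a query intersects those lists. It is a sketch under stated assumptions (whitespace tokenization, boolean AND, no scoring); the function names are hypothetical and are not Lucene's API.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive analyzer: lowercase and split on whitespace (Lucene analyzers are configurable)."""
    return text.lower().split()


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each term to the ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Boolean AND query: ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Solr is built on Lucene",
    2: "Nutch crawls websites",
    3: "Lucene indexes documents",
}
index = build_index(docs)
print(sorted(search_all(index, "lucene")))       # [1, 3]
print(sorted(search_all(index, "solr lucene")))  # [1]
```

Lucene adds on-disk segments, configurable analyzers and relevance scoring on top of this basic structure; Solr then exposes it as a server.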
-Search Landscape
-Lucene: http://lucene.apache.org
-Solr/Lucene: http://lucene.apache.org/solr/
-Platform OpenSearch: http://www.open-search-server.com
-Platform Katta: http://katta.sourceforge.net
-Platform LucidWorks: http://www.lucidworks.com
-Platform ElasticSearch: http://www.elasticsearch.com
-Sphinx: http://sphinxsearch.com/
-Cloudera: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/search_architecture.html
-FranceLabs: http://www.francelabs.com/ (Datafari)
-AklaBox: www.aklabox.com (AklaSearch)
Lucene: information retrieval software library
Use existing search infrastructure such as Solr/Lucene (Vanilla certified)
http://www.lucidworks.com/ or http://www.elasticsearch.org/
Search Engine Focus
-Cloudera with SolrCloud (Solr/Lucene)
-MapR with Elasticsearch (Lucene code)
-Hortonworks with LucidWorks (Solr/Lucene)
Hadoop Search Platform - Big Data
Before indexing your document base, you need to access it!
Apache Nutch is a highly extensible and scalable open source web crawler
software project.
Reference : http://nutch.apache.org/
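Conceptually, a crawler keeps a frontier of URLs to visit and remembers which ones it has already seen. The sketch below assumes a hypothetical in-memory link graph (LINKS) instead of real HTTP fetches and HTML parsing, and walks it breadth-first up to a depth limit. It illustrates the idea only, not Nutch's implementation, which runs batch cycles (inject, generate, fetch, parse, updatedb) on Hadoop.

```python
from collections import deque

# Hypothetical link graph standing in for fetched and parsed pages:
# each URL maps to the outgoing links found on that page.
LINKS: dict[str, list[str]] = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/b", "https://example.org/c"],
    "https://example.org/b": ["https://example.org/"],
    "https://example.org/c": [],
}


def crawl(seed: str, max_depth: int) -> list[str]:
    """Breadth-first crawl from `seed`, visiting each URL once and following at most `max_depth` hops."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    visited_order: list[str] = []
    while frontier:
        url, depth = frontier.popleft()
        # A real crawler would fetch and parse the page here, then hand the text to the indexer
        visited_order.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited_order


print(crawl("https://example.org/", max_depth=1))
```

The seen set is what keeps the crawler from looping: page /b links back to the home page, which is never queued twice.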
Nutch
Solr
• What is Solr
– Indexing and search engine
• An Apache Software Foundation project
• Built on top of Apache Lucene (Java search library)
– Major engine characteristics
• Scalable, fault tolerant, distributed indexing, dynamic workload
balancing, centralized configuration
– Technical environment
• Java
• Embedded Jetty server for platform administration
Solr
Main characteristics
Admin Interface
Flexible and scalable Configuration
Modular
Multiple index management with a single instance
Standard communication interfaces (HTTP, XML, JSON)
Configuration can be done with or without a schema (schemaless mode)
Near-real-time indexing
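Because Solr speaks plain HTTP, any client can query it by building a /select URL. The sketch below assumes a local Solr on its default port 8983 and a hypothetical core named articles; q, fq, rows and wt are standard Solr query parameters, and each filter is sent as its own fq parameter.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def solr_select_url(base: str, core: str, query: str, filters: list[str], rows: int = 10) -> str:
    """Build a Solr /select URL asking for a JSON response; each filter becomes its own fq parameter."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{base.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host and core name; 8983 is Solr's default port
url = solr_select_url("http://localhost:8983", "articles", "title:climate", ["type:page", "lang:fr"])
print(url)
print(parse_qs(urlparse(url).query))
```

Fetching this URL (for example with urllib.request.urlopen) would return a JSON document whose response object holds numFound and the matching docs. Keeping filters in fq rather than in q lets Solr cache them independently of the main query.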
Customizable full-text analysis
Rich document indexing (using Tika)
Faceted search and filters
Term suggestion and spelling correction
Geospatial search
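Faceting means counting, over the current result set, how many documents carry each value of a field, then letting the user narrow the results with a filter. The sketch below does this in memory over a hypothetical list of result documents with a type field; in Solr the same request would be expressed with facet=true, facet.field=type and a filter such as fq=type:page.

```python
from collections import Counter


def facet_counts(docs: list[dict], field: str) -> list[tuple[str, int]]:
    """Count each value of `field` over the result set, most frequent first (ties broken alphabetically)."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def apply_filter(docs: list[dict], field: str, value: str) -> list[dict]:
    """Keep only documents whose `field` equals `value` (the equivalent of a filter query)."""
    return [doc for doc in docs if doc.get(field) == value]


# Hypothetical result set
results = [
    {"id": 1, "type": "page"},
    {"id": 2, "type": "document"},
    {"id": 3, "type": "page"},
    {"id": 4, "type": "image"},
]
print(facet_counts(results, "type"))                             # [('page', 2), ('document', 1), ('image', 1)]
print([d["id"] for d in apply_filter(results, "type", "page")])  # [1, 3]
```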
Solr behavior
-Synonyms
- The search can be extended to synonyms if they are listed in a
glossary. For example, a search with the word “TV” can also find
articles containing “television”.
-Metadata
- Dictionary listing the searchable keywords
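Synonym expansion can happen at query time: each query term is replaced by the set of its glossary equivalents, and a document matches that term if it contains any of them. The glossary below is a hypothetical stand-in for a synonyms file such as Solr's synonyms.txt, where equivalent terms are listed on one comma-separated line (e.g. tv, television).

```python
# Hypothetical glossary: each set groups equivalent terms
GLOSSARY: list[set[str]] = [
    {"tv", "television"},
    {"car", "automobile"},
]


def expand(term: str) -> set[str]:
    """Return the lowercased term plus every synonym grouped with it in the glossary."""
    term = term.lower()
    expanded = {term}
    for group in GLOSSARY:
        if term in group:
            expanded |= group
    return expanded


def expand_query(query: str) -> list[set[str]]:
    """Expand each whitespace-separated query term into its set of acceptable forms."""
    return [expand(t) for t in query.split()]


def matches(doc: str, query: str) -> bool:
    """A document matches if, for every query term, it contains at least one of its synonyms."""
    words = set(doc.lower().split())
    return all(words & alternatives for alternatives in expand_query(query))


print(matches("New television programmes for 2017", "TV"))  # True
```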
Search Engine Basics (1/2)
-Reserved Words, Protected Words
- Indexing usually applies stemming, which reduces words to their root:
for example, “development” is reduced to the root “develop”, so a search
for “develop” also finds items containing “development”. However, stemming
sometimes conflates two unrelated words under the same root. It is
possible to prevent the stemming of specific words by listing them in the
file protwords.txt.
-StopWords
- Stop words are words with little meaning: a word listed as a stop word
is ignored. Note that some words are insignificant only in some contexts,
and others have homonyms. For example, the French word « été » can mean
the summer season (meaningful) or be the past participle of the verb
« être », to be (relatively insignificant). Stop words are listed in the
file stopwords.txt.
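The protected-words and stop-words mechanisms fit into one analysis chain. The sketch below lowercases, drops stop words, and stems every token except protected ones. The word lists are hypothetical, and the stemmer is a deliberately crude suffix stripper, not the Porter/Snowball algorithms Solr offers; it does, however, reproduce the conflation problem, since it reduces “news” to “new” unless “news” is protected.

```python
# Hypothetical word lists standing in for stopwords.txt and protwords.txt
STOPWORDS = {"the", "of", "a", "is"}
PROTECTED = {"news"}
# Toy suffix list: NOT the Porter/Snowball algorithm, just enough to show the idea
SUFFIXES = ("ment", "ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, drop stop words, and stem every token that is not protected."""
    tokens = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue  # stop words are not indexed at all
        tokens.append(token if token in PROTECTED else stem(token))
    return tokens


print(stem("news"))                            # new  (an unwanted conflation)
print(analyze("The development of the news"))  # ['develop', 'news']
```

In a Solr schema this roughly corresponds to chaining StopFilterFactory (words="stopwords.txt"), KeywordMarkerFilterFactory (protected="protwords.txt") and a stemming filter; the marker filter has to come before the stemmer for the protection to apply.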
Search Engine Basics (2/2)
-Multi-language support (this is where commercial search engines still
have more to offer customers), even though there is now support for Asian
languages (Hindi, Thai, Chinese, …)
-Elision:
- Elision is a feature of French: articles such as « le » or « la » are
contracted when they are followed by a vowel. Example: « le » + « avion »
(aircraft) gives « l'avion ». It is possible to remove these elisions
using a lexicon.
-Limits solved over the past 3 years
• Full-text search interface (language with search engine)
• Subquery support: now OK starting with Solr 4.7 (we use v6)
• Scalability (this is where Solr has a technical advantage)
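Elision removal can be sketched as stripping a leading article and apostrophe from each token. The article list below is a hypothetical example of the lexicon mentioned above (Solr's ElisionFilterFactory reads such a list from a file); both the straight and the typographic apostrophe are handled, since French text uses both.

```python
# Hypothetical lexicon of French elided articles/pronouns
ARTICLES = {"l", "d", "j", "m", "t", "s", "n", "c", "qu"}
APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophes


def remove_elision(token: str) -> str:
    """Strip a leading elided article such as l' or d'; leave other tokens unchanged."""
    for apostrophe in APOSTROPHES:
        head, sep, tail = token.partition(apostrophe)
        if sep and tail and head.lower() in ARTICLES:
            return tail
    return token


print([remove_elision(t) for t in "l'avion d'État aujourd'hui".split()])
```

Words such as « aujourd'hui » keep their apostrophe because « aujourd » is not in the lexicon, which is why a list is used rather than blindly cutting at the apostrophe.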
Search Engine Current Limits
-Advanced indexing and querying tools
-Distributed search capabilities, to prevent bottlenecks on a single
server
-Generation of document excerpts (snippets) that summarize each search
result
-Relevance ranking: display extracts from the documents based on the query
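A snippet is a short window of text around the query terms, with the matches highlighted. The sketch below keeps a few words on each side of the first match and wraps matches in <em> tags (the default markup of Solr's highlighter); the window size and the punctuation stripping are illustrative assumptions, and real highlighters also score and merge several fragments.

```python
PUNCTUATION = ".,;:!?"


def snippet(text: str, query: str, window: int = 3) -> str:
    """Return `window` words on each side of the first query-term match, with matches wrapped in <em>."""
    terms = {t.lower() for t in query.split()}
    words = text.split()

    def is_hit(word: str) -> bool:
        return word.lower().strip(PUNCTUATION) in terms

    hits = [i for i, w in enumerate(words) if is_hit(w)]
    if not hits:
        # No match: fall back to the beginning of the document
        return " ".join(words[: 2 * window + 1])
    start = max(0, hits[0] - window)
    end = min(len(words), hits[0] + window + 1)
    marked = [f"<em>{w}</em>" if is_hit(w) else w for w in words[start:end]]
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(words) else ""
    return prefix + " ".join(marked) + suffix


text = "The ministry publishes reports on climate change and biodiversity every year."
print(snippet(text, "climate"))
# ... publishes reports on <em>climate</em> change and biodiversity ...
```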
Search Interface expectation (1/3)
-Duplicate document detection, including fuzzy near-duplicates
-Rich document parsing and indexing without using database indexing
-Ranking control: targeted ranking of individual documents
-Search grouping by type / tag / category (general page, documents, images)
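Near-duplicate detection is commonly done by comparing word shingles (overlapping k-word sequences) with the Jaccard similarity |A ∩ B| / |A ∪ B|. The sketch below compares every pair of documents, which is fine for a handful of documents but quadratic in general; large collections use hashing schemes such as MinHash instead. The threshold of 0.5 and k = 3 are arbitrary assumptions, and Solr's own de-duplication (signature-based update processors) is not reproduced here.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences (shorter texts yield a single shingle)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[tuple[str, str]]:
    """All pairs of document ids whose shingle sets have Jaccard similarity >= threshold."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1 :]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs


docs = {
    "a": "the ministry publishes a new report on air quality",
    "b": "the ministry publishes a new report on water quality",
    "c": "nutch crawls websites every night",
}
print(near_duplicates(docs))  # [('a', 'b')]
```

The two reports share 5 of their 9 distinct shingles (similarity of about 0.56), so they are flagged even though they are not identical.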
Search Interface expectation (2/3)
-Multi-criteria support
-Ranking
-Natural language support
-App support (Android, iPad)
Search Interface expectation (3/3)
Project at Ministry
Initial decision and guidelines from Ministry
The new WebSite will be built with Drupal CMS 8.2
The WebSite should be powered by a « Google-like search toolbar »
The WebSite and its infrastructure should connect with multiple other
WebSites
All infrastructure software must be open source components
http://www.developpement-durable.gouv.fr/
Project at Ministry - Architecture
Project at Ministry - Technical
Project Steps
Nutch crawler for various WebSites
• Facebook, LinkedIn, Twitter, YouTube …
• Internal WebSites, previous WebSite
Drupal forms for metadata & indexing
• Specific forms for different kinds of documents
• Drupal CMS process to add new content
Drupal 8 module for Solr: custom search, monitoring, reporting
• The existing Drupal Solr integration is limited to a single Drupal instance
• Not possible to use the Solr Admin interface
Additional PHP libraries
cURL: Drupal-Solr communication (HTTP GET, HTTP POST & attached files)
SSH2: server administration commands
ZooKeeper: Drupal-ZooKeeper communication
Memcached: Drupal-Memcached communication
Solarium: Drupal-Solr communication (abstraction layer)
Google API: YouTube content indexing
Project at Ministry – Admin Interface
Drupal 8 add-on to set up the global infrastructure (ZooKeeper, Solr)
Drupal 8 add-on to monitor the global infrastructure: statistics
Project at Ministry - Validation
Project Validation & Deployment
No problems with ZooKeeper, Solr, Nutch
Stress tests of the global platform: initial slowdown with 10,000
simultaneous connections
Sub-project: addressing the single point of failure
Solution: problems with Drupal & MySQL addressed with Memcached
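The Memcached fix typically follows the cache-aside pattern: look a key up in the cache first, and only on a miss query the database and store the result with an expiry. The sketch below uses a hypothetical in-memory TTLCache as a stand-in for a Memcached client and a hypothetical fake_db_query function for the Drupal/MySQL side; it illustrates the pattern, not Drupal's cache API or the Ministry's actual configuration.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical in-memory stand-in for a Memcached client (get/set with expiry in seconds)."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like misses
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl)


def load_page(cache: TTLCache, db_query: Callable[[int], str], page_id: int, ttl: float = 300) -> str:
    """Cache-aside read: try the cache, fall back to the database on a miss, then populate the cache."""
    key = f"page:{page_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db_query(page_id)
    cache.set(key, value, ttl)
    return value


# Usage: count how often the (hypothetical) database is actually hit
db_calls: list[int] = []


def fake_db_query(page_id: int) -> str:
    db_calls.append(page_id)
    return f"<html>page {page_id}</html>"


cache = TTLCache()
load_page(cache, fake_db_query, 42)
load_page(cache, fake_db_query, 42)  # served from the cache
print(db_calls)
```

Under load, repeated requests for the same page then reach MySQL only once per expiry period instead of once per visitor.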
Project at Ministry - Next
Next Steps
Review of WebSite content … new Ministry
New content to be indexed:
• Other WebSites and social content
• New set of documents to be added to the repository
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Recently uploaded

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Personfurqan222004
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Roomdivyansh0kumar0
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts servicevipmodelshub1
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 

Recently uploaded (20)

定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Person
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Alwarpet Phone 🍆 8250192130 👅 celebrity escorts service
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 

II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at French Ministry of Environment

  • 1. Patrick Beaucamp, Founder of the Vanilla Project (mail: Patrick.beaucamp@bpm-conseil.com). Custom Open Source Search Engine with Drupal 8 and Solr at French Ministry of Environment. II-SDV, Nice, 24th April 2017.
  • 2. Presentation Agenda: Open Source Search Engine & Search Platform (some interesting platforms; features expected for search platforms (interface)); Open Source Platform at French Ministry (project context; platform architecture; website powered by a search engine); Personal Experience of Search. Echo: Tuesday morning, presentation from Deep Search 9, and Tuesday afternoon, presentation from FranceLabs.
  • 3. Searching … and finding! II-SDV: SEARCH, DATA MINING and VISUALISATION. How many times per day do you Google? (search, maps, translate …) Tribute to Open Source at II-SDV. Search is the first step: collecting information.
  • 4. Searching … and finding!
  • 5. Searching … and finding! An example from my personal experience: I tried to find a person for 23 years, roughly from 1993 to 2016. From 1993 to 1998: no search engine available … only a private investigator? From 1999 to 2015: regular searches, no results. I found this person on Facebook, not on Google. From a browser: « f + tab » … « g + tab », « y + tab » … Some years: no search; other years: multiple searches.
  • 6. Searching … and finding! 1) We all become private investigators one day or another.
  • 7. Searching … and finding!
  • 8. Searching … and finding! 2) Different search engines lead to different results.
  • 9. Searching … and finding! 2) Different search engines by country.
  • 10. Searching … and finding! Funny word: SEO … it is more « how to be found on the Internet » … and you need to pay for it!
  • 11. Searching … and finding! 3) The person I was looking for published on Facebook under his/her real name: it is his/her decision to be visible or not. 4) Where do we stand with the « Right to be Forgotten »?
  • 12. Searching … and finding! Companies like Facebook hold huge amounts of data, so they need to provide a search infrastructure (indexing + search interface). I was lucky enough to try the Facebook search interface.
  • 13. Searching … and finding! Tracing the source of the 1854 cholera outbreak (John Snow): http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak
  • 14. Searching … and finding! Bicycle accidents in the street: who takes care of traffic management? Example in Boston: http://www.boston.com/bostonglobe/editorial_opinion/blogs/the_angle/2010/12/bike_crash_map.html (Open Data).
  • 15. Searching … and finding! LION, 2016 (Garth Davis). Mistake 1: Ganesh Tanei; Mistake 2: Saroo.
  • 16. Open Source Landscape: crawling, indexing, storing, website reference, website accessibility, update management, search interface, result visualization, auto-completion, natural language, voice recognition, maps, ads, unstructured data, access management.
  • 17. Search Platform Objectives (crawling, storing, indexing). Constraints: being able to reach websites and content, namely internal websites (intranet), external websites and internal document repositories; being able to index website content (and page updates); being able to store unstructured data.
  • 18. Search Platform Objectives (search interface, result visualization). Provide usable search results (auto-classification, visualization). Don't forget why and what you search: you search in existing documents; you need visualization tools; it is not a crystal ball, search reflects the past. Provide usable search interfaces (semantic search, multi-language search …).
  • 19. Open Source Landscape. Lucene is a Java-based indexing and search API. Solr is the leading server built on Lucene. Two companies provide packaging and extensions on top of these: LucidWorks (Fusion, built on Solr) and Elastic (ElasticSearch, built directly on Lucene). Nutch is the crawling component; Tika is a document metadata extraction and content analysis toolkit; ZooKeeper is a coordination service for distributed processes (centralized configuration, synchronization).
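Slide 19 describes Lucene as an indexing and search library. The core data structure behind such libraries is an inverted index that maps each term to the documents containing it. The Python sketch below is a toy illustration of that idea only. It is not Lucene's API, and the analyzer (lowercase plus word split), the sample documents and the AND-only query logic are simplifying assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive analyzer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND search: ids of documents containing every query term, sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


docs = {
    1: "Solr is built on top of Lucene",
    2: "Nutch crawls websites before indexing",
    3: "Lucene is a Java indexing library",
}
index = build_index(docs)
print(search_all(index, "lucene indexing"))  # [3]
```

A real engine also stores term positions and frequencies to support phrase queries and relevance scoring. This sketch keeps only document ids.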
  • 20. Open Source Landscape (search landscape). Lucene: http://lucene.apache.org; Solr/Lucene: http://lucene.apache.org/solr/; Platform OpenSearch: http://www.open-search-server.com; Platform Katta: http://katta.sourceforge.net; Platform LucidWorks: http://www.lucidworks.com; Platform ElasticSearch: http://www.elasticsearch.com; Sphinx: http://sphinxsearch.com/; Cloudera: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/search_architecture.html; FranceLabs: http://www.francelabs.com/ (Datafari); AklaBox: www.aklabox.com (AklaSearch).
  • 21. Search Engine Focus. Lucene: retrieval software library. Use an existing search infrastructure such as Solr/Lucene (Vanilla certified): http://www.lucidworks.com/ or http://www.elasticsearch.org/
  • 22. Hadoop Search Platform, Big Data: Cloudera with SolrCloud (Solr/Lucene); MapR with ElasticSearch (Lucene code); Hortonworks with LucidWorks (Solr/Lucene).
  • 23. Nutch. Before indexing your document base, you need to access it! Apache Nutch is a highly extensible and scalable open source web crawler. Reference: http://nutch.apache.org/
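Slide 23 stresses that a crawler must reach documents before they can be indexed. Nutch works in batch cycles: inject seeds, generate a fetch list, fetch, parse, and update its crawl database. The sketch below compresses that idea into an in-memory breadth-first traversal with a depth limit. The `links` dictionary is a hypothetical stand-in for fetching and parsing real pages, and the code does not use Nutch itself.

```python
from collections import deque


def crawl(seeds, links, max_depth=2):
    """Breadth-first crawl. `links` maps a URL to the URLs found on that page
    (a hypothetical stand-in for fetch + parse). Returns URLs in fetch order."""
    seen = set(seeds)               # never schedule the same URL twice
    order = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        order.append(url)           # "fetch" the page
        if depth == max_depth:      # do not follow links beyond the depth limit
            continue
        for outlink in links.get(url, []):
            if outlink not in seen:
                seen.add(outlink)
                queue.append((outlink, depth + 1))
    return order


links = {
    "home": ["news", "docs"],
    "news": ["article1"],
    "docs": ["home", "report"],
    "article1": ["deep"],
}
print(crawl(["home"], links, max_depth=2))
# ['home', 'news', 'docs', 'article1', 'report']
```

The depth limit plays a role similar to the number of generate/fetch rounds in a Nutch crawl. The `seen` set keeps the already-visited "home" from being fetched again.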
  • 24. Solr. What is Solr: an indexing and search engine, promoted by the Apache Foundation and built on top of Apache Lucene (Java search library). Major engine characteristics: scalable, fault tolerant, distributed indexing, dynamic workload balancing, centralized configuration. Technical environment: Java; embedded Jetty server for platform administration.
  • 25. Solr Main Characteristics: admin interface; flexible and scalable configuration; modular; multiple index management with a single instance.
  • 26. Solr Main Characteristics: standard communication interfaces (HTTP, XML, JSON); configuration with or without a schema; real-time indexing.
  • 27. Solr Main Characteristics: customizable full-text analysis; rich document indexing (using Tika).
  • 28. Solr Main Characteristics: search by facets and filters; term suggestion and spelling correction; geospatial search.
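Slides 26 to 28 list Solr's HTTP/JSON interface, facets and filters. A select query combines these in a single HTTP request using Solr's standard `q`, `fq`, `facet`, `facet.field`, `rows` and `wt` parameters. The sketch below only builds the URL and never contacts a server. The host, the core name `ministry` and the fields `type` and `theme` are hypothetical, and the schema is assumed to define those fields.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base_url, core, query, filters=(), facet_fields=(), rows=10):
    """Build a Solr /select URL: full-text query, filter queries, facets, JSON output."""
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    # Each filter becomes its own fq parameter. Filter queries restrict the
    # result set without affecting relevance scores.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983", "ministry", "énergie solaire",
    filters=["type:document"], facet_fields=["theme"],
)
query = parse_qs(urlsplit(url).query)
print(urlsplit(url).path)                  # /solr/ministry/select
print(query["fq"], query["facet.field"])   # ['type:document'] ['theme']
```

`urlencode` percent-encodes the accented query and the `:` in the filter, so the user's raw input can be passed through safely.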
  • 30. Search Engine Basics (1/2). Synonyms: a search can be extended to synonyms when they are listed in a glossary. For example, a search for "TV" can also find articles that contain only its synonyms. Metadata: a dictionary listing the searchable keywords.
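Slide 30 describes synonym expansion from a glossary. Solr reads such glossaries from a synonyms file in which comma-separated terms on one line are treated as equivalent. The sketch below is a simplified reading of that format for illustration only. It ignores explicit `=>` mappings, and the glossary entries are hypothetical.

```python
def parse_synonyms(lines):
    """Parse comma-separated equivalence lines ("tv, television") into term -> group.
    Blank lines and '#' comments are skipped. Explicit '=>' mappings are not handled
    in this sketch, and a term listed on several lines keeps only its last group."""
    groups = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=>" in line:
            continue
        group = frozenset(t.strip().lower() for t in line.split(",") if t.strip())
        for term in group:
            groups[term] = group
    return groups


def expand_query(terms, groups):
    """Expand each term (in query order) to its synonym group, sorted, without repeats."""
    expanded = []
    for term in terms:
        t = term.lower()
        for candidate in sorted(groups.get(t, {t})):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


glossary = ["# ministry glossary", "tv, television", "car, automobile"]
groups = parse_synonyms(glossary)
print(expand_query(["TV", "news"], groups))  # ['television', 'tv', 'news']
```

Expanding at query time, as here, means the glossary can change without reindexing. Expanding at index time makes queries cheaper but requires a reindex whenever the glossary changes.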
  • 31. Search Engine Basics (2/2). Reserved words, protected words: indexing usually applies stemming, which reduces words to their root. For example, the root "develop" lets a search for "develop" also find items containing "development". However, stemming sometimes has adverse effects and indexes two unrelated words under the same root. Stemming of specific words can be prevented by listing them in the file protwords.txt. Stopwords: stopwords are words that carry little meaning, and a word considered insignificant is ignored. Note that some words are insignificant in some contexts but have meaningful homonyms in others. For example, the French word "été" can refer to the summer season (meaningful) or to the past participle of the verb "être", to be (relatively insignificant). These words are listed in the file stopwords.txt.
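Slide 31 describes stemming, protected words and stopwords. The sketch below shows how these interact in an analysis chain: lowercase, tokenize, drop stopwords, then stem every token that is not protected. The suffix-stripping stemmer is a deliberately crude, hypothetical stand-in for a real stemmer such as Porter or Snowball. The two sets play the roles of stopwords.txt and protwords.txt.

```python
import re

# Hypothetical toy suffix list; real stemmers (Porter, Snowball) are far more careful.
SUFFIXES = ("ment", "ers", "er", "ing", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text, stopwords, protwords):
    """Lowercase, tokenize, drop stopwords, and stem everything except protected words."""
    out = []
    for token in re.findall(r"\w+", text.lower()):
        if token in stopwords:
            continue
        out.append(token if token in protwords else toy_stem(token))
    return out


stopwords = {"the", "of", "and"}
print(analyze("The development of developers", stopwords, set()))  # ['develop', 'develop']
print(analyze("news and new", stopwords, set()))     # ['new', 'new']  (unrelated words merged)
print(analyze("news and new", stopwords, {"news"}))  # ['news', 'new']
```

The second call reproduces the adverse effect described on the slide: "news" and "new" collapse to the same root. Listing "news" as a protected word keeps them apart.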
  • 32. Search Engine Current Limits. Multi-language support (this is where commercial search engines still have more to offer customers), even though Asian languages are now supported (Hindi, Thai, Chinese, …). Elision: elision is a feature of French in which words such as "le" or "la" are contracted when followed by a vowel. Example: "le" + "avion" (aircraft) gives "l'avion". These elisions can be removed using a lexicon. Limits solved over the past 3 years: full-text search interface (language with search engine); subquery support, now OK starting with Solr 4.7 (we are on v6); scalability (this is where Solr has a technical advantage).
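Slide 32 mentions removing French elisions with a lexicon. Solr ships an `ElisionFilterFactory` for this purpose. The Python sketch below only mimics the idea on single tokens. The list of elided forms is a hypothetical, non-exhaustive lexicon, and both straight and typographic apostrophes are handled because French text commonly uses either.

```python
# Hypothetical, non-exhaustive lexicon of French elided forms.
ELIDED = {"l", "d", "j", "m", "n", "s", "t", "c", "qu", "jusqu", "lorsqu", "puisqu", "quoiqu"}
APOSTROPHES = ("'", "\u2019")  # straight and typographic apostrophes


def remove_elision(token):
    """Drop a leading elided article such as l' or qu', so "l'avion" indexes as "avion"."""
    for apostrophe in APOSTROPHES:
        prefix, sep, rest = token.partition(apostrophe)
        # Only strip when the part before the apostrophe is a known elided form.
        if sep and rest and prefix.lower() in ELIDED:
            return rest
    return token


print([remove_elision(t) for t in ["l'avion", "L\u2019aéroport", "qu'il", "aujourd'hui"]])
# ['avion', 'aéroport', 'il', "aujourd'hui"]
```

Checking the prefix against the lexicon, rather than blindly cutting at the apostrophe, keeps words such as "aujourd'hui" intact.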
  • 33. Search Interface Expectations (1/3). Advanced indexing and querying tools. Distributed search capabilities, to avoid a bottleneck on any single server. Generation of document excerpts (snippets) that summarize each search result. Relevance ranking: display extracts from the documents based on the query.
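Slide 33 expects the interface to show document excerpts (snippets) built around the query. Solr provides this through its highlighting component. The sketch below is a simplified, hypothetical snippet builder, not Solr's algorithm. It takes a window of words around the first query-term hit and wraps each hit in `<em>` markers.

```python
def make_snippet(text, query_terms, window=5, pre="<em>", post="</em>"):
    """Return up to `window` words on each side of the first query-term hit, with hits marked."""
    words = text.split()
    terms = {t.lower() for t in query_terms}

    def is_hit(word):
        # Ignore case and trailing punctuation when matching a query term.
        return word.lower().strip(".,;:!?") in terms

    first = next((i for i, w in enumerate(words) if is_hit(w)), None)
    if first is None:
        # No hit: fall back to the beginning of the document.
        return " ".join(words[: 2 * window + 1])
    start = max(0, first - window)
    end = min(len(words), first + window + 1)
    marked = [f"{pre}{w}{post}" if is_hit(w) else w for w in words[start:end]]
    prefix = "… " if start > 0 else ""
    suffix = " …" if end < len(words) else ""
    return prefix + " ".join(marked) + suffix


text = "The ministry publishes reports on air quality and water quality every year"
print(make_snippet(text, ["water"], window=2))
# … quality and <em>water</em> quality every …
```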
  • 34. Search Interface Expectations (2/3). Duplicate document detection, including fuzzy near-duplicates. Rich document parsing and indexing without relying on database indexing. Ranking control: targeted ranking of individual documents. Grouping of search results by type / tag / category (general pages, documents, images).
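Slide 34 lists fuzzy near-duplicate detection. One common textbook technique compares documents by their sets of word shingles (runs of k consecutive words) and flags pairs whose Jaccard similarity exceeds a threshold. The sketch below illustrates that technique. It is not how Solr's own signature-based deduplication works, and the threshold, the value of k and the sample documents are assumptions. The pairwise loop is O(n²), so a large collection would need a technique such as MinHash instead.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles; a text shorter than k words yields one short shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return (id1, id2) pairs whose shingle sets have Jaccard similarity >= threshold."""
    ids = sorted(docs)
    sh = {i: shingles(docs[i], k) for i in ids}
    pairs = []
    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            if jaccard(sh[ids[x]], sh[ids[y]]) >= threshold:
                pairs.append((ids[x], ids[y]))
    return pairs


docs = {
    "a": "the new website will be built with drupal",
    "b": "the new website will be built with drupal 8",
    "c": "nutch crawls social networks",
}
print(near_duplicates(docs))  # [('a', 'b')]
```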
  • 35. Search Interface Expectations (3/3). Multi-criteria support; ranking; natural language support; app support (Android, iPad).
  • 36. Project at Ministry. Initial decision and guidelines from the Ministry: the new website will be built with Drupal CMS 8.2; the website should be powered by a « Google-like search toolbar »; the website infrastructure should connect with multiple other websites; all infrastructure (software) must use open source components.
  • 37. Project at Ministry. http://www.developpement-durable.gouv.fr/
  • 38. Project at Ministry. http://www.developpement-durable.gouv.fr/
  • 39. Project at Ministry - Architecture
  • 40. Project at Ministry - Architecture
  • 41. Project at Ministry - Technical. Project steps: Nutch crawler for various websites (Facebook, LinkedIn, Twitter, YouTube …; internal websites; the previous website). Drupal forms for metadata & indexing (specific forms for different kinds of documents; Drupal CMS process to add new content). Drupal 8 module for Solr: custom search, monitoring, reporting (the existing Drupal Solr module is limited to a single Drupal instance, and it is not possible to use the Solr admin interface).
  • 42. Project at Ministry - Technical. Additional PHP libraries: cURL for Drupal-Solr communication (HTTP GET, HTTP POST & attached files); SSH2 for server administration commands; ZooKeeper for Drupal-ZooKeeper communication; Memcached for Drupal-Memcached communication; Solarium for Drupal-Solr communication (abstraction layer); Google API for YouTube content indexing.
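Slide 42 mentions cURL for sending HTTP POSTs with attached files from Drupal to Solr. One common route for rich documents is Solr Cell, the ExtractingRequestHandler that runs Tika and is usually mapped to `/update/extract`. Its `literal.id` parameter sets the document's unique key. The Python sketch below is an analogue of such a PHP cURL call, not the project's code. It builds the POST request with `urllib` but does not send it. The core name and document id are hypothetical, and the handler is assumed to be enabled in `solrconfig.xml`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url, core, doc_id, payload, content_type):
    """Build (but do not send) a POST of a raw file body to Solr Cell.
    literal.id sets the unique key; commit=true makes the document visible immediately."""
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{base_url.rstrip('/')}/solr/{core}/update/extract?{params}"
    return Request(url, data=payload, headers={"Content-Type": content_type}, method="POST")


req = build_extract_request(
    "http://localhost:8983", "ministry", "doc-42",
    b"<html><body>Rapport annuel</body></html>", "text/html",
)
print(req.get_method(), req.full_url)
# POST http://localhost:8983/solr/ministry/update/extract?literal.id=doc-42&commit=true
```

Sending it would be `urllib.request.urlopen(req)`. For bulk loads, committing after each file is usually avoided in favour of a single commit at the end.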
  • 43. Project at Ministry - Admin Interface. Drupal 8 add-on to set up the global infrastructure (ZooKeeper, Solr).
  • 44. Project at Ministry - Admin Interface. Drupal 8 add-on to monitor the global infrastructure (statistics).
  • 45. Project at Ministry - Validation. Project validation & deployment: no problems with ZooKeeper, Solr or Nutch. Stress tests of the global platform: initial slowdown with 10,000 simultaneous connections. Sub-project: addressing the single point of failure. Solution: the Drupal & MySQL problems were solved with Memcached.
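Slide 45 reports that Memcached resolved the Drupal/MySQL bottleneck under load. The usual pattern is cache-aside: read from the cache first, query the database only on a miss, and store the result with an expiry time. In the sketch below, a plain dictionary with a TTL stands in for a Memcached server, and `load_from_db` is a hypothetical callback. It illustrates the pattern only, not Drupal's own cache layer.

```python
import time


class TTLCache:
    """In-process stand-in for Memcached: key -> (expiry timestamp, value)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # expired: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cached_query(cache, key, load_from_db):
    """Cache-aside read. Note: a loader returning None is never cached in this sketch."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # only cache misses reach the database
        cache.set(key, value)
    return value


calls = []


def load_page(key):
    calls.append(key)  # count database hits
    return f"rendered {key}"


cache = TTLCache(ttl_seconds=60)
for _ in range(3):
    cached_query(cache, "node/1", load_page)
print(len(calls))  # 1
```

Three requests for the same page cost one database query. Under 10,000 concurrent users that difference is what relieves MySQL.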
  • 46. Project at Ministry - Next. Next steps: review of the website content … new Ministry. New content to be indexed: other websites and social content; a new set of documents to be added to the repository.