Building Japanese Full-Text Search System by Solr #openSUSE.Asia Summit2017 10/21
Building Japanese Full-Text Search System
by Solr
Document Search and Application
to Online Shopping Site
Syuta Hashimoto
opensuse-ja

Self Introduction
・Syuta Hashimoto @hashimotosyuta
  I work on web products based on open source,
  e.g. online shopping sites, promotion sites, CMS
・With openSUSE
  ー I have used openSUSE at home for 4 years.
  ー I love Geeko!

Main Topics
1 What is Full-Text Search?
2 What is Solr?
3 Let’s use!
4 What is Index?
5 Structure and Role
6 Solr can search from RDBMS!
7 Facet is easy to count
8 Highlighter is easy to highlight
and more functions.
※Basic knowledge of RDBMS is assumed.

1 What is Full-text Search?
Q: What is full-text search?
A: Searching the full text! (maybe)
   And searching the full text of multiple files!
“Multiple files” is the important point in “full-text search”
and “enterprise search”.
・Point 1
  Full-text search usually comes in two types:
  ・Serial Scan Type
  ・Index Type ← today’s menu

1 What is Full-text Search?
USE CASE
I want to search those files
for the word “openSUSE”!

1 What is Full-text Search?
Full-text Search Type 1 “Serial Scan Type”
# grep -r 'openSUSE' files_A ① ② ③
# soffice "files_B/LibreOffice Writer.odt" → Ctrl + F ④
# soffice "files_B/LibreOffice Calc.ods" → Ctrl + F ⑤
# okular files_B/pdf.pdf → Ctrl + F ⑥
(Figure: target files ① ② ③ ④ ⑤ ⑥)
For example, you search the files one after another in this way.

1 What is Full-text Search?
Full-text Search Type 1 “Serial Scan Type”
# grep -r 'hogehoge'
⇢ The “Serial Scan Type” searches for the word 'hogehoge' in the files
under the current directory.
ー Pros
・easy
ー Cons
・slow
・difficult to search rich-text files (e.g. Word)
・a lot of search noise
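
The serial scan idea can be sketched in Python as follows. This is only an illustration and not how grep is implemented; the function name `serial_scan` and the sample files are assumptions made for this example. Every search re-reads every file, which is why this type is easy to use but slows down as the files grow.

```python
import os
import tempfile


def serial_scan(root, word):
    """Return sorted paths of files under root whose text contains word."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    # Every search reads every file from start to end.
                    if word in f.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped in this sketch.
                continue
    return sorted(hits)


# Hypothetical sample files, created only for the demonstration.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "text1.txt"), "w", encoding="utf-8") as f:
        f.write("openSUSE is a Linux distribution")
    with open(os.path.join(root, "text2.txt"), "w", encoding="utf-8") as f:
        f.write("conference schedule")
    print([os.path.basename(p) for p in serial_scan(root, "openSUSE")])
```

Binary formats such as .odt or .pdf would not match in a sketch like this, which corresponds to the "difficult to search rich-text files" drawback listed above.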

1 What is Full-text Search?
Full-text Search Type 2 “Index Type”
# curl 'http://localhost:8983/solr/techproducts/select?indent=on&q=*:openSUSE&wt=json' ① ← today’s topic
Build an index beforehand.
You can then search everything at once
from the index, which is built to make searching easy.

1 What is Full-text Search?
Full-text Search Type 2 “Index Type”
The “Index Type” builds an index in advance of the words
we will search for, and then searches that index.
ー Pros
・fast
・can search rich-text files (e.g. Word),
  as long as the files can be indexed
・less search noise
ー Cons
・you have to build a search system
・you need to index the files you want to search

2 What is Solr?
About Solr
・An index-type full-text search system
・A subproject of Apache Lucene(™)
  → Apache Lucene is a full-text search library.
    Solr uses this library, and Solr is open source too.
・Because it can be accessed like a web API,
  any kind of client is OK!
・There is a competing product
  called “Elasticsearch”

3 Let’s use!
Build it in one go! (local setup)
1 Install a JVM. Java 1.8 or later is required.
  (It is already installed on Leap 42.3.)
2 Download Solr
  You can download Solr from the official Solr site. The current version is 7.0.1.
  http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1
  The zip file contains everything you need.
3 Extract the zip file
  # unzip solr-7.0.1.zip
  and move into the directory
  # cd solr-7.0.1

3 Let’s use!
Starting, Creating a Core, Indexing
4 # bin/solr start
  ← First, start Solr. (no core, no index yet)
5 # bin/solr create -c mycore
  ← Create a core named “mycore”
6 # bin/post -c mycore /home/hashimoto/doc/*
  ← Index the files into “mycore”
  “bin/post” indexes them automatically
  ・・・ (indexing logs are printed...)
That completes the setup.
※The official Solr site also has a tutorial. (You can try a cluster there.)

3 Let’s use!
Important Words
・Core
  A core is equivalent to an RDBMS schema.
  A core holds the index format, the query settings, and more.
  Roughly speaking, it is the search engine itself.
・Schema definition
  In Solr, the index format is called a schema.
  It is like an RDBMS table.
・Index
  The data produced by indexing the target files
  according to the schema definition.

3 Let’s use!
Solr has an “Admin UI” by default
After starting Solr, access http://localhost:8983/solr/ ...
(Figure: the Admin UI is displayed)

3 Let’s use!
“mycore” is registered.
(Figure: the Admin UI shows that “mycore” is registered properly)

3 Let’s use!
You can search from “Query” in “mycore”
① This is “Query”
② Enter a search word
③ Execute
④ The result is shown here

4 What is Index?
What is an index?
(Figure: the index) This is it.
Its contents map each word in the files
to the names of the files that contain it.

4 What is Index?
The contents of the index
(image of the index)
So when you search for the word “openSUSE”, the index immediately answers
that “text1.txt” and “LibreOffice Writer.odt” contain it.
WORD        FILES THAT CONTAIN THE WORD
openSUSE    text1.txt, LibreOffice Writer.odt
conference  text2.txt, pdf.pdf
・・・      ・・・
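
The word-to-file table above can be sketched as a small inverted index in Python. This is a simplified illustration rather than Solr's or Lucene's actual data structure; the file contents and the names `build_index` and `search` are assumptions made for the example.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for filename, text in documents.items():
        for word in text.lower().split():
            index[word].add(filename)
    return index


def search(index, word):
    """Look the word up in the index; no file is re-read at search time."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical file contents, keyed by file name.
documents = {
    "text1.txt": "openSUSE is fun",
    "text2.txt": "the conference starts today",
    "pdf.pdf": "conference program",
}
index = build_index(documents)
print(search(index, "openSUSE"))    # ['text1.txt']
print(search(index, "conference"))  # ['pdf.pdf', 'text2.txt']
```

The cost of reading the files is paid once, when the index is built, so each search becomes a single lookup.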

4 What is Index?
Index definition = Schema
The definition of an index is called a schema.
A schema defines the following:
・Field
  The equivalent of a column in an RDBMS. Each field is assigned a field type.
  Text is broken into words and then registered.
・Field Type
  The definition of a field: whether it is numeric or a string, and
  whether or not morphological analysis is applied.
・There are also Dynamic Fields and Copy Fields.
  (These are omitted today.)

4 What is Index?
Indexing
Indexing means “registering the content of the search target files
in fields according to the field definitions”.
By the way...
When content is registered in a field, some processing is applied
to make searching easier.
(This processing is defined in the field type.)

4 What is Index?
What kind of processing?
・For example, converting all letters to lowercase.
  → “linux”, “Linux”, and “LINUX” are all converted to “linux” (lowercase).
    If the same conversion is applied at search time, every “linux” is hit.
・In Japanese, dividing the text on the basis of parts of speech.
  「私は東京都で開催されるアジアサミットに行きます。」
  (“I will go to the Asia Summit held in Tokyo.”)
  → 「私-は-東京-都-で-開催-さ-れる-アジア-サミット-に-行き-ます」
  In this case the search word “東京” (Tokyo) hits, but the search word “京都” (Kyoto) does not.
  This reduces search noise when searching many files.
This is a profound technique
called “morphological analysis”.
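
The effect of these two kinds of processing can be sketched in Python. This sketch does not perform morphological analysis itself: the Japanese tokens are copied by hand from the segmentation shown on the slide, and the names `normalize` and `token_match` are assumptions made for the example. In Solr the equivalent processing is configured in the field type.

```python
def normalize(token):
    """Apply the same conversion at index time and at search time."""
    return token.lower()


# Tokens copied from the slide's segmentation of the example sentence.
sentence = "私は東京都で開催されるアジアサミットに行きます。"
tokens = ["私", "は", "東京", "都", "で", "開催", "さ", "れる",
          "アジア", "サミット", "に", "行き", "ます"]


def token_match(query, indexed_tokens):
    """A query hits only when it equals one whole indexed token."""
    return normalize(query) in {normalize(t) for t in indexed_tokens}


# Lowercasing: every spelling of "linux" matches the same indexed token.
english_tokens = ["Linux", "is", "fun"]
print(token_match("LINUX", english_tokens))  # True

# Token matching versus plain substring matching.
print(token_match("東京", tokens))  # True
print(token_match("京都", tokens))  # False
print("京都" in sentence)           # True: substring search produces noise
```

The last line shows the noise that a plain substring search would produce, because “京都” happens to appear inside “東京都”.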

5 Structure and Role
Components Figure
(Figure labels: ① search, ② query, ③ result, ④ result;
 ① registration, ② indexing)
Solr is accessed through a REST API.

6 Solr can search from RDBMS!
Setup is finished! Enjoy a good search life!!
What? My shopping site has its
data in MySQL.
A LIKE search on the item descriptions is too slow....
Oh.....

6 Solr can search from RDBMS!
DataImportHandler
In fact, Solr has a mechanism for
searching data from an RDBMS and other data sources.
From the viewpoint of “full-text search”, the expected use
is searching item descriptions on an online
shopping site.
But Solr also offers facet search and the
highlighter, so it is even more useful.

6 Solr can search from RDBMS!
Components Figure when using an RDBMS
(Figure labels: ① search, ② query, ③ result, ④ result;
 ① registration, ② indexing; ・・・ RDBMS)

6 Solr can search from RDBMS!
Logical structure
Solr treats a field of the schema and a column
of the RDBMS as equivalent and indexes the data.
Schema
  Field name=id
  Field name=name
  Field name=description
RDBMS
  id  name      description
  1   openSUSE  geeko is cute!!
Searching for “geeko” in description
→ The result is the row whose name is “openSUSE”
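
The column-to-field correspondence can be sketched in Python. This is only an illustration of the mapping, not the DataImportHandler itself: an in-memory SQLite database stands in for MySQL, and the table name `item` and the helper `search_description` are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the shop's MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.execute("INSERT INTO item VALUES (1, 'openSUSE', 'geeko is cute!!')")

# Each row becomes one document; each column becomes one field.
cursor = conn.execute("SELECT id, name, description FROM item")
columns = [col[0] for col in cursor.description]
documents = [dict(zip(columns, row)) for row in cursor]


def search_description(docs, word):
    """Return the names of documents whose description contains the word."""
    return [d["name"] for d in docs if word in d["description"].split()]


print(documents)
print(search_description(documents, "geeko"))  # ['openSUSE']
```

Searching the description field returns the whole document, so the name “openSUSE” is available in the result.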

6 Solr can search from RDBMS!
The settings are slightly troublesome
● Install the connector for RDBMS access.
  → Put the JDBC connector in “server/lib”
● Field definition
  → see the next page
● Write the settings in solrconfig.xml (the settings file of the core)
  ・Load the DataImportHandler library
  ・Declare the use of DataImportHandler and its settings file *a
● Settings file for DataImportHandler (the *a file)
  ・RDBMS connection settings
  ・Mapping between the fields and the SQL
This is an overview.
Please see other
documentation for details.

6 Solr can search from RDBMS!
Field Definition
The quick way is to define the schema in the Admin UI
① Select “Schema”
② Choose “Add Field”
③ Fill in each setting, and click “Add Field”

6 Solr can search from RDBMS!
Setup is finished! Let’s import.
As usual, use the REST API.
http://localhost:8983/solr/mycore/dataimport?command=full-import
(“mycore” in the URL is our core.)
Incidentally, the URI “/dataimport” is defined in the
requestHandler setting in solrconfig.xml.
This request is all it takes to finish importing.
You can then search in the Admin UI.
For practical use, you need to design differential imports and
the timing of imports.

7 Facet is easy to count
Facet Search
This is a function that groups the results and counts each group.
For example, to get the count for each type in this data:
id  name      description      type
1   docker    container        virtualization
2   emacs     multiple editor  editor
3   vim       multiple editor  editor
4   chrome    browser          browser
5   firefox   browser          browser
6   sleipnir  browser          browser
"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "type":[
      "virtualization",1,
      "editor",2,
      "browser",3]},
  "facet_ranges":{},
  "facet_intervals":{},
  "facet_heatmaps":{}}

7 Facet is easy to count
Facet Search REST API
Just add the facet search parameters to the query.
http://localhost:8983/solr/mycore/select?facet=on&facet.field=type&indent=on&q=*:*&wt=json
・facet=on
  Enables facet search
・facet.field=type
  Groups and counts by “type”
Of course, facet search can be combined with a normal
search.
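
A client could build this request and read the counts roughly as follows. This is a minimal Python sketch that only constructs the URL and converts the flat `"type"` list from the `facet_counts` response shown earlier; it does not contact a Solr server, and the helper names `facet_url` and `facet_pairs` are assumptions made for the example.

```python
from urllib.parse import urlencode


def facet_url(base, field, query="*:*"):
    """Build a select URL with facet search enabled for one field."""
    params = {"facet": "on", "facet.field": field, "indent": "on",
              "q": query, "wt": "json"}
    # safe="*:" keeps the match-all query readable instead of %2A%3A%2A.
    return base + "/select?" + urlencode(params, safe="*:")


def facet_pairs(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


print(facet_url("http://localhost:8983/solr/mycore", "type"))
# The "type" list from the facet_counts response shown earlier.
print(facet_pairs(["virtualization", 1, "editor", 2, "browser", 3]))
```

Passing a different `query` value, such as a word to search for, is how the facet counts would be combined with a normal search.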

8 Highlighter is easy to highlight
Highlighter
Solr can return highlighted results separately from the normal
results.
For example, let us search for “worldwide” in the
description of this data.
id  name      description
1   openSUSE  The openSUSE project is a worldwide effort that promotes
              the use of Linux everywhere. openSUSE creates one of
              the world's best Linux distributions, working together in an
              open, transparent and friendly manner as part of the
              worldwide Free and Open Source Software community.

8 Highlighter is easy to highlight
Searching with the highlighter...
"highlighting":{
  "1":{
    "description":["The openSUSE project is a <em>worldwide</em> effort that promotes the use of "]}}
The openSUSE project is a worldwide effort that promotes the use of Linux
everywhere. openSUSE creates one of the world's best Linux distributions,
working together in an open, transparent and friendly manner as part of the
worldwide Free and Open Source Software community.
The word “worldwide” is surrounded by <em>
tags, and the text around the word is returned.
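
The idea of wrapping the hit and returning the surrounding text can be sketched in Python. This is not Solr's highlighting algorithm, only a simplified illustration; the function name `highlight` and the `context` width of 40 characters are assumptions made for the example.

```python
import re


def highlight(text, word, context=40):
    """Wrap the first match of word in <em> and keep some text around it."""
    match = re.search(re.escape(word), text)
    if match is None:
        return None
    start = max(0, match.start() - context)
    end = min(len(text), match.end() + context)
    return (text[start:match.start()] + "<em>" + match.group(0) + "</em>"
            + text[match.end():end])


description = ("The openSUSE project is a worldwide effort that promotes "
               "the use of Linux everywhere.")
print(highlight(description, "worldwide"))
```

As in the response above, the snippet is a fragment of the description rather than the whole field.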

8 Highlighter is easy to highlight
Searching with the Highlighter: REST API
As usual, just add query parameters.
http://localhost:8983/solr/mycore/select?hl=on&hl.fl=description&indent=on&q=description:worldwide&wt=json
・hl=on
  Turns the highlighter on
・hl.fl=description
  Specifies the description field for highlighting

8 Highlighter is easy to highlight
Highlighter Settings
・The “searchComponent” section in solrconfig.xml.
・A few things must be set on the field:
  a. set “stored”, which keeps the original data for retrieval, to true.
  b. set the analysis processing on the field type.
The highlighter supports many combinations of settings.
You can use the defaults, but the settings allow fine-grained control.
hl.method
hl.qparser
hl.requireFieldMatch
hl.usePhraseHighlighter
etc...

And more functions
Spatial
Cloud
Recommend

Today’s summary
1 Solr is an index-type full-text search system.
2 The field definition is called the “schema”.
  It decides the structure of an index.
3 Solr can search data from an RDBMS too.
4 Facet search and the highlighter are easy to use, too.
Good Search Life!!
Have a lot of fun...

More Related Content

Viewers also liked

How & Why we have connected Slack & IRC
How & Why we have connected Slack & IRCHow & Why we have connected Slack & IRC
How & Why we have connected Slack & IRC
Youngbin Han
 
Kernel entrance to-geek-
Kernel entrance to-geek-Kernel entrance to-geek-
Kernel entrance to-geek-
mao999
 
Large-scale deploy by AutoYast
Large-scale deploy by AutoYastLarge-scale deploy by AutoYast
Large-scale deploy by AutoYast
Hillwood Yang
 
openSUSE tools on Debian
openSUSE tools on DebianopenSUSE tools on Debian
openSUSE tools on Debian
Hideki Yamane
 
LibreOffice: The Office Suite with Mixing Bowl Culture
LibreOffice: The Office Suite with Mixing Bowl CultureLibreOffice: The Office Suite with Mixing Bowl Culture
LibreOffice: The Office Suite with Mixing Bowl Culture
Naruhiko Ogasawara
 
Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
 Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17 Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
Netwalker lab kapper
 

Viewers also liked (6)

How & Why we have connected Slack & IRC
How & Why we have connected Slack & IRCHow & Why we have connected Slack & IRC
How & Why we have connected Slack & IRC
 
Kernel entrance to-geek-
Kernel entrance to-geek-Kernel entrance to-geek-
Kernel entrance to-geek-
 
Large-scale deploy by AutoYast
Large-scale deploy by AutoYastLarge-scale deploy by AutoYast
Large-scale deploy by AutoYast
 
openSUSE tools on Debian
openSUSE tools on DebianopenSUSE tools on Debian
openSUSE tools on Debian
 
LibreOffice: The Office Suite with Mixing Bowl Culture
LibreOffice: The Office Suite with Mixing Bowl CultureLibreOffice: The Office Suite with Mixing Bowl Culture
LibreOffice: The Office Suite with Mixing Bowl Culture
 
Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
 Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17 Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
Hacking with x86 Windows Tablet and mobile devices on openSUSE #opensuseasia17
 

Similar to Building japanese full text search system by Solr

Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
Shifa Khan
 
Ad507
Ad507Ad507
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
Sarang Shravagi
 
MongoDB.pptx
MongoDB.pptxMongoDB.pptx
MongoDB.pptx
Aayush Chimaniya
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
smile790243
 
Mongodb
MongodbMongodb
Mongodb
ichangbai
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
Alexandre Rafalovitch
 
Polyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jPolyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4j
Corie Pollock
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Lucidworks
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
Habilelabs
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
Alexandre Rafalovitch
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
Optum
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Karen Thompson
 
Boolean guidance
Boolean guidanceBoolean guidance
Boolean guidance
Syed Yaseen Ahmed
 
Boolean Guidance
Boolean GuidanceBoolean Guidance
Boolean Guidance
Syed Yaseen Ahmed
 
Basic iOS Training with SWIFT - Part 1
Basic iOS Training with SWIFT - Part 1Basic iOS Training with SWIFT - Part 1
Basic iOS Training with SWIFT - Part 1
Manoj Ellappan
 
Longwell final ppt
Longwell final pptLongwell final ppt
Longwell final ppt
Kuldeep Singh
 
RedisSearch / CRDT: Kyle Davis, Meir Shpilraien
RedisSearch / CRDT: Kyle Davis, Meir ShpilraienRedisSearch / CRDT: Kyle Davis, Meir Shpilraien
RedisSearch / CRDT: Kyle Davis, Meir Shpilraien
Redis Labs
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Slides bootstrap-4
Slides bootstrap-4Slides bootstrap-4
Slides bootstrap-4
Michael Posso
 

Similar to Building japanese full text search system by Solr (20)

Elasticsearch Basics
Elasticsearch BasicsElasticsearch Basics
Elasticsearch Basics
 
Ad507
Ad507Ad507
Ad507
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
MongoDB.pptx
MongoDB.pptxMongoDB.pptx
MongoDB.pptx
 
Page 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docxPage 18Goal Implement a complete search engine. Milestones.docx
Page 18Goal Implement a complete search engine. Milestones.docx
 
Mongodb
MongodbMongodb
Mongodb
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Polyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4jPolyglot Persistence with MongoDB and Neo4j
Polyglot Persistence with MongoDB and Neo4j
 
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Archite...
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
 
Boolean guidance
Boolean guidanceBoolean guidance
Boolean guidance
 
Boolean Guidance
Boolean GuidanceBoolean Guidance
Boolean Guidance
 
Basic iOS Training with SWIFT - Part 1
Basic iOS Training with SWIFT - Part 1Basic iOS Training with SWIFT - Part 1
Basic iOS Training with SWIFT - Part 1
 
Longwell final ppt
Longwell final pptLongwell final ppt
Longwell final ppt
 
RedisSearch / CRDT: Kyle Davis, Meir Shpilraien
RedisSearch / CRDT: Kyle Davis, Meir ShpilraienRedisSearch / CRDT: Kyle Davis, Meir Shpilraien
RedisSearch / CRDT: Kyle Davis, Meir Shpilraien
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Slides bootstrap-4
Slides bootstrap-4Slides bootstrap-4
Slides bootstrap-4
 

Recently uploaded

Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 

Recently uploaded (20)

Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 

Building japanese full text search system by Solr

  • 1. Building Japanese Full-Text Search System by Solr (#openSUSE.Asia Summit 2017, 10/21): Document Search and Application to an Online Shopping Site. Syuta Hashimoto, opensuse-ja
  • 2. Self Introduction ・Syuta Hashimoto (@hashimotosyuta): I have worked on web products based on open source, e.g. online shopping sites, promotion sites, and CMSs. ・With openSUSE: I have used openSUSE at home for 4 years. I love Geeko!
  • 3. Main Topics: 1 What is Full-Text Search? 2 What is Solr? 3 Let's use it! 4 What is an Index? 5 Structure and Role 6 Solr can search an RDBMS! 7 Facets make counting easy 8 The Highlighter makes highlighting easy, and more functions. ※Basic RDBMS knowledge is assumed.
  • 4. 1 What is Full-Text Search? Q: What is full-text search? A: Searching the full text (maybe), and in particular searching the full text of multiple files. "Multiple files" is the important part of both "full-text search" and "enterprise search". ・Point 1: Full-text search usually comes in two types. ・Serial Scan Type ・Index Type ← today's menu
  • 5. 1 What is Full-Text Search? Use case: I want to search those files for the word "openSUSE"!
  • 6. 1 What is Full-Text Search? Full-text search type 1: "Serial Scan Type". # grep -r 'openSUSE' files_A ①②③ # soffice "files_B/LibreOffice Writer.odt" → Ctrl + F ④ # soffice "files_B/LibreOffice Calc.ods" → Ctrl + F ⑤ # okular files_B/pdf.pdf → Ctrl + F ⑥ For example, the files ① to ⑥ are searched one after another in this way.
  • 7. 1 What is Full-Text Search? Full-text search type 1: "Serial Scan Type". # grep -r 'hogehoge' ⇢ The "Serial Scan Type" searches the files under the current directory for the word 'hogehoge'. ー Pros ・easy ー Cons ・slow ・difficult to search rich-text files (e.g. Word) ・a lot of search noise
  • 8. 1 What is Full-Text Search? Full-text search type 2: "Index Type" ← today's topic. # curl 'http://localhost:8983/solr/techproducts/select?indent=on&q=*:openSUSE&wt=json' ① Build an index beforehand. You can then search everything at once from that index, which was built to make searching easy.
  • 9. 1 What is Full-Text Search? Full-text search type 2: "Index Type". The "Index Type" builds an index of the words to be searched in advance, and then searches that index. ー Pros ・fast ・can search rich-text files (e.g. Word), provided they can be indexed ・less search noise ー Cons ・you have to build a search system ・you need to index the files you want to search
  • 10. 2 What is Solr? About Solr ・An index-type full-text search system ・A subproject of Apache Lucene(™) → Apache Lucene is a full-text search library. Solr uses this library, so Solr is open source too. ・Because it can be accessed like a web API, any kind of client will do! ・There is a competing product called "Elasticsearch".
  • 11. 3 Let's use it! Build it in one go (locally): 1 Install a JVM. The Java version must be 1.8 or later. (It is already installed on Leap 42.3.) 2 Download Solr. You can download Solr from the official Solr site; the current version at the time of this talk is 7.0.1. http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1 The zip file contains everything you need. 3 Extract the zip file (# unzip solr-7.0.1.zip) and move into the directory (# cd solr-7.0.1).
  • 12. 3 Let's use it! Starting, creating a core, indexing: 4 # bin/solr start ← First, start Solr (no core and no index yet). 5 # bin/solr create -c mycore ← Create a core named "mycore". 6 # bin/post -c mycore /home/hashimoto/doc/* ← Index the files into "mycore"; bin/post indexes them automatically. ・・・ (indexing logs are output...) That completes the setup. ※The official Solr site also has a tutorial (in which you can try out a cluster).
  • 13. 3 Let's use it! Important words ・Core: A core is equivalent to an RDBMS schema. A core holds the index format, the query settings, and more. Roughly speaking, it is the search engine itself. ・Schema definition: In Solr, the index format is called a schema. It is like an RDBMS table. ・Index: The data produced by indexing the target files according to the schema definition.
  • 14. 3 Let's use it! Solr has an "Admin UI" by default. After starting Solr, access http://localhost:8983/solr/ and the Admin UI is displayed.
  • 15. 3 Let's use it! The Admin UI shows that "mycore" is registered properly.
  • 16. 3 Let's use it! You can search from "Query" in "mycore": ① this is "Query" ② enter a search word ③ execute ④ the result appears here
  • 17. 4 What is an Index? An index is a mapping from each word in the files to the names of the files that contain it.
  • 18. 4 What is an Index? The contents of an index (conceptual image):
      WORD       | FILES CONTAINING THE WORD
      openSUSE   | text1.txt, LibreOffice Writer.odt
      conference | text2.txt, pdf.pdf
      ・・・     | ・・・
    So when you search for the word "openSUSE", Solr can respond immediately that "text1.txt" and "LibreOffice Writer.odt" contain it.
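The word-to-file table above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Solr or Lucene store an index internally; the file contents are made up for the example, and the words are split on whitespace without any analysis.

```python
from collections import defaultdict

# Hypothetical file contents standing in for the example files on the slide.
files = {
    "text1.txt": "openSUSE is a Linux distribution",
    "LibreOffice Writer.odt": "notes about openSUSE",
    "text2.txt": "conference schedule",
    "pdf.pdf": "conference venue map",
}


def build_index(docs):
    """Map each word to the sorted list of file names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


index = build_index(files)

# A lookup is a single dictionary access; no file is re-read at search time.
print(index["openSUSE"])
print(index["conference"])
```

The cost of reading every file is paid once when the index is built, which is why the index type answers faster than a serial scan.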
  • 19. 4 What is an Index? Index definition = Schema. The definition of an index is called a schema. A schema defines the following: ・Field: the equivalent of a column in an RDBMS. Each field is assigned a field type; text is broken into words and registered. ・Field Type: the definition of a field's type, specifying whether it is numeric or a string and whether morphological analysis is applied. ・There are also Dynamic Fields and Copy Fields (omitted today).
  • 20. 4 What is an Index? Indexing: Indexing means registering the content of the search target files into fields according to the field definitions. By the way, when content is registered into a field, something is done to it to make searching easier. (What is done is defined in the field type.)
  • 21. 4 What is an Index? Doing something? ・For example, converting all letters to lowercase. → "linux", "Linux", and "LINUX" are all converted to "linux" (lowercase). If the same conversion is applied when searching, every variant of "linux" is hit. ・In Japanese, dividing text on the basis of parts of speech. 「私は東京都で開催されるアジアサミットに行きます。」 ("I will go to the Asia Summit held in Tokyo.") →「私-は-東京-都-で-開催-さ-れる-アジア-サミット-に-行き-ます」 In this case the search word "東京" (Tokyo) is hit, but the search word "京都" (Kyoto) is not. This reduces search noise when searching many files. It is a profound technique called "morphological analysis".
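The two conversions on this slide can be illustrated with a small Python sketch. It assumes nothing about Solr's analyzers: lowercasing is done with `str.lower`, and the Japanese tokens are copied by hand from the segmentation shown on the slide, since no morphological analyzer is run here.

```python
def normalize(token):
    """Apply the same conversion at index time and at query time."""
    return token.lower()


# Index time: all spellings collapse to one indexed term.
indexed = {normalize(t) for t in ["linux", "Linux", "LINUX"]}


def matches(query):
    """Query time: normalize the query the same way before looking it up."""
    return normalize(query) in indexed


# Tokens copied from the slide's segmentation (not produced by an analyzer).
sentence = "私は東京都で開催されるアジアサミットに行きます。"
tokens = ["私", "は", "東京", "都", "で", "開催", "さ", "れる",
          "アジア", "サミット", "に", "行き", "ます"]

print(matches("LiNuX"))      # any capitalization finds the indexed term
print("京都" in sentence)     # plain substring search: a noisy hit inside 東京都
print("京都" in tokens)       # token-based search: no hit
print("東京" in tokens)       # token-based search: hit
```

The substring search finds "京都" inside "東京都" even though the sentence is not about Kyoto, which is the kind of search noise that token-based matching avoids.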
  • 22. 5 Structure and Role. Components figure (labels: ① search, ② query, ③ result, ④ result for searching; ① registration, ② indexing for indexing). Solr is accessed through its REST API.
  • 23. 6 Solr can search an RDBMS! Setup is finished! Enjoy the good search life!! ... What? My shopping site keeps its data in MySQL, and LIKE searches on the item descriptions are too slow... Oh...
  • 24. 6 Solr can search an RDBMS! DataImportHandler: In fact, Solr has a mechanism for searching data from an RDBMS and other data sources. From the viewpoint of full-text search, the expected use is searching item descriptions on an online shopping site. But Solr also offers facet search and a highlighter, which makes it even more useful.
  • 25. 6 Solr can search an RDBMS! Components figure when using an RDBMS (labels: ① search, ② query, ③ result, ④ result for searching; ① registration, ② indexing for indexing, with the RDBMS as the data source).
  • 26. 6 Solr can search an RDBMS! Logical structure: Solr treats a field of the schema and a column of the RDBMS as equivalent and indexes the data accordingly.
      RDBMS:  id | name     | description
              1  | openSUSE | geeko is cute!!
      Schema: Field name=id, Field name=name, Field name=description
    Searching for "geeko" in description returns the record whose name is "openSUSE".
  • 27. 6 Solr can search an RDBMS! The settings are slightly troublesome: ● Install the connector for RDBMS access → put the JDBC connector in "server/lib". ● Field definition → see the next slide. ● Write the settings in solrconfig.xml (the core's configuration file): ・load the DataImportHandler library ・declare the use of DataImportHandler and its configuration file (*a) ● Configuration file for DataImportHandler (the file marked *a): ・RDBMS connection settings ・mapping between fields and SQL. This is only an overview; please see other documentation for details.
  • 28. 6 Solr can search an RDBMS! Field definition: to be quick, define the schema in the Admin UI. ① Select "Schema" ② choose "Add Field" ③ fill in each setting and click "Add Field".
  • 29. 6 Solr can search an RDBMS! Setup is finished! Let's import, through the REST API as usual: http://localhost:8983/solr/mycore/dataimport?command=full-import (this is our "mycore"). Incidentally, the URI "/dataimport" is defined in the requestHandler setting in solrconfig.xml. That is all it takes to import; you can then search in the Admin UI. For practical use, you also need to design differential imports and the timing of imports.
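The column-to-field correspondence from the "Logical structure" slide can be sketched in Python. This is not DataImportHandler and does not talk to Solr: an in-memory SQLite database stands in for MySQL, the table name `item` is hypothetical, and the search is a naive word match rather than an index lookup.

```python
import sqlite3

# In-memory SQLite stands in for the shop's MySQL database (an assumption
# made so the example runs without a server); the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER, name TEXT, description TEXT)")
conn.execute("INSERT INTO item VALUES (1, 'openSUSE', 'geeko is cute!!')")

# Each RDBMS column corresponds to a schema field of the same name.
FIELDS = ("id", "name", "description")
rows = conn.execute("SELECT id, name, description FROM item")
docs = [dict(zip(FIELDS, row)) for row in rows]


def search(documents, field, word):
    """Return the documents whose given field contains the word."""
    return [d for d in documents if word in str(d[field]).lower().split()]


hits = search(docs, "description", "geeko")
print([d["name"] for d in hits])
```

Searching the description field for "geeko" returns the row whose name is "openSUSE", matching the example on the slide.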
  • 30. 7 Facets make counting easy. Facet Search: This is a function that counts results after grouping them. For example, to get a count per type in this data:
      id | name     | description     | type
      1  | docker   | container       | virtualization
      2  | emacs    | multiple editor | editor
      3  | vim      | multiple editor | editor
      4  | chrome   | browser         | browser
      5  | firefox  | browser         | browser
      6  | sleipnir | browser         | browser
    The response contains:
      "facet_counts":{
        "facet_queries":{},
        "facet_fields":{
          "type":[
            "virtualization",1,
            "editor",2,
            "browser",3]},
        "facet_ranges":{},
        "facet_intervals":{},
        "facet_heatmaps":{}}
  • 31. 7 Facets make counting easy. Facet Search via the REST API: add the facet search parameters to the query. http://localhost:8983/solr/mycore/select?facet=on&facet.field=type&indent=on&q=*:*&wt=json ・facet=on: enable facet search ・facet.field=type: group and count by "type". Of course, facet search can be combined with a normal search.
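As a rough illustration of what the facet counts mean, the grouping can be reproduced in Python with `collections.Counter` over the sample rows, and the request URL can be assembled with `urllib.parse.urlencode`. This sketch does not contact Solr; the host, core name, and parameters are the ones shown on the slide.

```python
from collections import Counter
from urllib.parse import urlencode

# The sample rows from the slide (description column omitted for brevity).
rows = [
    {"id": 1, "name": "docker", "type": "virtualization"},
    {"id": 2, "name": "emacs", "type": "editor"},
    {"id": 3, "name": "vim", "type": "editor"},
    {"id": 4, "name": "chrome", "type": "browser"},
    {"id": 5, "name": "firefox", "type": "browser"},
    {"id": 6, "name": "sleipnir", "type": "browser"},
]


def facet_counts(documents, field):
    """Group the documents by one field and count each group."""
    return Counter(d[field] for d in documents)


counts = facet_counts(rows, "type")
print(dict(counts))

# Build the facet request shown on the slide; urlencode percent-encodes *:*.
params = {"facet": "on", "facet.field": "type", "indent": "on",
          "q": "*:*", "wt": "json"}
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```

Building the query string from a dictionary avoids hand-escaping characters such as `*` and `:` when the search is combined with other parameters.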
  • 32. 8 The Highlighter makes highlighting easy. Highlighter: Solr can return highlighting results separately from the normal results. For example, search for "worldwide" in the description of this data: id: 1, name: openSUSE, description: The openSUSE project is a worldwide effort that promotes the use of Linux everywhere. openSUSE creates one of the world's best Linux distributions, working together in an open, transparent and friendly manner as part of the worldwide Free and Open Source Software community.
  • 33. 8 The Highlighter makes highlighting easy. Searching with the highlighter returns: "highlighting":{ "1":{ "description":["The openSUSE project is a <em>worldwide</em> effort that promotes the use of "]}} The word "worldwide" is surrounded by <em> tags, and the text around the word is retrieved from the description shown on the previous slide.
  • 34. 8 The Highlighter makes highlighting easy. Search with the Highlighter via the REST API: as usual, add query parameters. http://localhost:8983/solr/mycore/select?hl=on&hl.fl=description&indent=on&q=description:worldwide&wt=json ・hl=on: turn the highlighter on ・hl.fl=description: highlight the description field
  • 35. 8 The Highlighter makes highlighting easy. Highlighter settings ・The "searchComponent" section in solrconfig.xml. ・Several things must be set on the field: a. set "stored", which keeps the original data for retrieval, to true; b. configure analysis in the field type. The highlighter accepts various combinations of settings. You can use the defaults, but the parameters allow fine-grained control: hl.method, hl.qparser, hl.requireFieldMatch, hl.usePhraseHighlighter, etc.
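The effect of highlighting described in this section can be imitated in Python with a regular expression. This is a simplified sketch, not Solr's highlighter: the function name, the fixed 30-character context window, and the whole-word matching rule are all assumptions chosen for the example.

```python
import re

description = (
    "The openSUSE project is a worldwide effort that promotes the use of "
    "Linux everywhere. openSUSE creates one of the world's best Linux "
    "distributions, working together in an open, transparent and friendly "
    "manner as part of the worldwide Free and Open Source Software community."
)


def highlight(text, word, pre="<em>", post="</em>", context=30):
    """Return one snippet per whole-word match, with the match wrapped in tags."""
    pattern = r"\b" + re.escape(word) + r"\b"
    snippets = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        snippets.append(
            text[start:m.start()] + pre + m.group(0) + post + text[m.end():end]
        )
    return snippets


snippets = highlight(description, "worldwide")
for s in snippets:
    print(s)
```

Each snippet carries the matched word inside `<em>` tags plus a little surrounding text, which is the shape of the "highlighting" section in the response shown earlier.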
  • 36. And more functions: Spatial
  • 37. And more functions: Cloud; Recommend (no image)
  • 38. Today's summary: 1 Solr is an index-type full-text search system. 2 The field definition is called a "schema"; it decides the structure of an index. 3 Solr can search an RDBMS too. 4 Facet search and the highlighter are very easy to use. Good Search Life!! Have a lot of fun...