This document summarizes a presentation about building a Japanese full-text search system using Solr. It introduces Solr as an open source enterprise search platform that can index content and enable fast search. It provides step-by-step instructions on setting up Solr locally, including downloading, extracting, starting Solr, creating a core, and indexing documents. It also discusses how Solr can index content from a relational database, and highlights features like faceted search and result highlighting.
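The indexing step in those setup instructions can be sketched as building a request for Solr's JSON update handler. This is a minimal, hedged illustration: the port is Solr's default, but the core name "jp_docs" and the document fields are assumptions for illustration, not taken from the presentation.

```python
import json

# Sketch of indexing a document into a local Solr core via the JSON update
# API. The core name "jp_docs" and the field names are hypothetical.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "jp_docs"

def build_update_request(docs):
    """Return (url, payload) for posting documents to Solr's update handler."""
    url = f"{SOLR_BASE}/{CORE}/update?commit=true"
    payload = json.dumps(docs, ensure_ascii=False)  # keep Japanese text readable
    return url, payload

url, payload = build_update_request(
    [{"id": "1", "title": "日本語全文検索", "body": "Solr で全文検索システムを作る"}]
)
```

In a live setup one would POST `payload` to `url` with a `Content-Type: application/json` header; separating request construction from transport keeps the sketch testable without a running server.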
Grow My Search - A Whole New Approach to Search, by Tsubasa Kato
Grow My Search gives each user a personal search engine with its own crawler. It can query other search engines for the seed URLs from which the crawler starts crawling.
Patent Pending.
This document provides an overview of search and indexing basics for Sitecore. It discusses why search is important, how databases perform search versus indexing, and the fundamentals of Lucene and Solr. Lucene is introduced as a powerful search library, while Solr is described as a popular open source enterprise search platform built on Lucene that is highly scalable and supports features like distributed indexing. The presentation concludes by noting how Sitecore leverages Lucene and Solr for search capabilities.
Google is the most-used search engine and top most-visited website, created in 1997 by Larry Page and Sergey Brin. It functions by accepting search queries as text and ranking pages based on an algorithm that analyzes incoming and outgoing page links. In addition to searching web pages, Google can search images, discussions, and offers features like spell check, definitions, maps, currency conversion, and sports scores to enhance the user experience.
Yahoo BOSS is a service that provides developers access to Yahoo search data and infrastructure to build commercial search applications. It offers a RESTful API, search advertising, and tools. Developers can customize results, mix data from multiple APIs, and authenticate via OAuth. The document provides examples of searches using filters, operators, and output parameters and gives ideas for applications that could be built using BOSS, such as real-time searches or mobile apps.
A popular domain can be a precious treasure, but before you begin your hunt, you'll need a clear purpose in mind.
Work out the what and the why that surround your need for this domain, and then ask these two important questions:
1. Why do I need to find the domain owner?
2. What will I gain afterwards?
Search engines use automated software called spiders or bots to crawl the web, following links from page to page to index websites. The search engine index inventories the words and links on each page. When a user searches, the search engine queries its index, ranks the results, and returns them to the user. Search engine optimization, or SEO, refers to optimizing on-page and off-page factors like keywords, links, content, and social media to achieve high search engine rankings. On-page optimization includes factors like titles, metadata, content, and images, while off-page focuses on links, social media, and blogging. White hat SEO follows best practices, while black hat involves deceptive tactics such as hidden text and keyword stuffing.
Basic SEO mini workshop for copywriters, by Salomon Dayan
The document provides tips on search engine optimization (SEO) best practices for content, links, keywords, social media, and more. It recommends writing content for users rather than search engines, using keyword research tools to identify relevant keywords, optimizing titles, descriptions and images with alt text, and leveraging social media to drive search traffic and links. Tips include using headings, common words, and the inverted pyramid structure for content; internal links with descriptive text; and filling out metadata and properties for images, video and other media.
This document provides an overview of search engines, including what they are, their importance, types of search engines, and how to use them effectively. It defines a search engine as a software system that searches the web using keywords and ranks results by relevance. It explains that search engines are important because they help filter the vast amount of online information to quickly find specific information. The document outlines different types of search engines such as crawler-based engines like Google and Yahoo, directories, hybrid engines, and meta-search engines. It concludes by giving tips for using search operators like +, -, quotation marks, and OR to refine searches.
The document announces a kernel mokumoku-kai (kernel hacking meeting) to take place on October 28th in Matsuyama, Ehime Prefecture. It will be held from 9:20 to 18:00 at Ehime University's Center for Information Technology on the 2nd floor. After the event concludes, an after party is planned where the no-alcohol party may transition to an alcohol party. The event is aimed at those interested in kernels, operating systems, and low-level concepts.
This document discusses using AutoYast to perform large-scale deployments of openSUSE/SLE systems. AutoYast allows fully automated and unattended installation by configuring partitioning, networking, software selection, firewalls, and services. The autoinst.xml file contains the installation details. A PXE boot server with DHCP and TFTP provides network booting capabilities. The document also covers configuring AutoYast in syslinux to retrieve the autoinst.xml file from different sources such as HTTP, NFS, or USB.
LibreOffice: The Office Suite with Mixing Bowl Culture, by Naruhiko Ogasawara
This slide deck is the opening talk of LibreOffice mini-conference 2017 Japan, a sub-event of openSUSE.Asia Summit 2017 Tokyo. It gives an overview of LibreOffice, covering both the product and the project that develops it.
---
This is the opening talk of LibreOffice mini-conference 2017 Japan (held as an event within openSUSE.Asia Summit 2017 Tokyo). It explains an overview of LibreOffice from both the product and the project perspectives.
Hacking with x86 Windows Tablets and mobile devices on openSUSE #opensuseasia17, by Netwalker lab kapper
These are presentation materials from openSUSE Asia Summit 2017.
I have installed openSUSE on many mobile devices.
So let's install and play with openSUSE!
The slides are bilingual, in English and Japanese.
#opensuse #opensuseja
This document provides an overview of Elasticsearch including:
- Elasticsearch is an easily scalable database server, based on Lucene, that exposes a RESTful HTTP/JSON interface.
- Features include being schema-free, real-time, easy to extend with plugins, automatic peer discovery in clusters, failover and replication, and community support.
- Terminology includes index, type, document, and field which make up the data structure inside Elasticsearch. Searches can be performed across multiple indices.
- Elasticsearch works using full-text searching via inverted indexing and analysis. Analysis extracts terms from text through techniques like removing stopwords, lowercase conversion, and stemming.
- Elasticsearch can be accessed in a RESTful manner over HTTP.
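The analysis step described above (lowercasing, stopword removal, stemming) can be sketched minimally. The stopword list and suffix-stripping rules below are toy assumptions for illustration, not the analyzers Elasticsearch actually ships.

```python
# Minimal sketch of an analysis chain: lowercase the text, drop stopwords,
# then apply a (very naive) suffix-stripping stemmer. The stopword list
# and suffix rules are invented for illustration.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is"}
SUFFIXES = ("ing", "ed", "es", "s")

def stem(token: str) -> str:
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text: str) -> list[str]:
    tokens = text.lower().split()
    return [stem(t) for t in tokens if t not in STOPWORDS]
```

Running `analyze("The crawled pages")` yields the index terms rather than the raw words, which is what makes "crawl" match "crawled" at query time.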
This document summarizes a presentation on leveraging object-oriented programming techniques in LotusScript. It introduces object-oriented concepts like classes, objects, and encapsulation. It then walks through building an application to monitor news sites for company mentions using a class to represent each site and a nested class to represent individual news items. The presentation demonstrates encapsulating the news item class within the site class and using inheritance by extending all classes from a base class. It shows how to make the application more robust by adding logging through the base class.
Goal: Implement a complete search engine (Milestones.docx), by smile790243
Goal: Implement a complete search engine. Milestones overview:
Milestone #1: Produce an initial index for the corpus and a basic retrieval component
Milestone #2: Complete search system
PROJECT: SEARCH ENGINE
Corpus: all ICS web pages. We will provide you with the crawled data as a zip file (webpages_raw.zip). This contains the downloaded content of the ICS web pages that were crawled by a previous quarter. You are expected to build your search engine index off of this data.
Main challenges: full HTML parsing, file/DB handling, and handling user input (via the command line, a desktop GUI application, or a web interface).
COMPONENT 1 - INDEX: Create an inverted index for the whole corpus given to you. You can either use a database to store your index (MongoDB, Redis, and memcached are some examples) or store the index in a file. You are free to choose an approach here. The index should store more than just a simple list of documents where the token occurs. At the very least, your index should store the TF-IDF of every term/document pair. Sample index:
Note: This is a simplistic example provided for your understanding. Please do not consider it the expected index format. A good inverted index will store more information than this.
Index structure: token – docId1, tf-idf1 ; docId2, tf-idf2
Example: informatics – doc_1, 5 ; doc_2, 10 ; doc_3, 7
You are encouraged to come up with heuristics that make sense and will help in retrieving relevant search results. For example, words in bold or in headings (h1, h2, h3) could be treated as more important than other words. These are useful metadata that could be added to your inverted index.
Optional (1 point for each metadata item, up to 2 points max): Extra credit will be given for ideas that improve the quality of the retrieval, so you may add more metadata to your index if you think it will help. For example, instead of storing a simple TF-IDF count for every page, you can store more information related to the page (e.g. the positions of the words in the page). To store this information, design your index so that it can store and retrieve all this metadata efficiently. Your index lookup during search should not be horribly slow, so pay attention to the structure of your index.
COMPONENT 2 - SEARCH AND RETRIEVE: Your program should prompt the user for a query. This doesn't need to be a Web interface; it can be a console prompt. At query time, your program will look up your index, perform some calculations (see ranking below), and return the ranked list of pages that are relevant to the query.
COMPONENT 3 - RANKING:
At the very least, your ranking formula should include TF-IDF scoring, but you should feel free to add additional components to this formula if you think they improve the retrieval. Optional (1 point for each parameter, up to 2 points max): Extra credit will be given if your ranking formula includes additional parameters.
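The index structure the assignment describes (token – docId, tf-idf) can be sketched in a few lines. This is a minimal illustration, not the expected solution; the exact TF-IDF variant (raw term frequency, log-scaled IDF) is an assumption.

```python
import math
from collections import Counter, defaultdict

# Sketch of the assignment's index structure: token -> [(doc_id, tf-idf), ...].
def build_inverted_index(corpus: dict[str, str]) -> dict[str, list[tuple[str, float]]]:
    n_docs = len(corpus)
    # Per-document term frequencies.
    term_counts = {doc_id: Counter(text.lower().split()) for doc_id, text in corpus.items()}
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    index = defaultdict(list)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            index[term].append((doc_id, tf * idf))
    return dict(index)

index = build_inverted_index({
    "doc_1": "informatics informatics search",
    "doc_2": "search engine",
})
```

Note that a term occurring in every document ("search" here) gets a weight of zero under this scheme, which is exactly the down-weighting of ubiquitous terms that TF-IDF is meant to provide.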
The document describes the basics of MongoDB, including that it uses databases containing collections which are made up of documents with fields. Collections can be indexed to improve performance of lookups and sorting. When data is retrieved from MongoDB, it is through a cursor which delays execution until needed. While similar concepts exist in relational databases, MongoDB's documents can have unique fields compared to tables with predefined columns.
This document provides an overview of the Apache Solr search engine. It begins with an introduction to full-text search and how it differs from basic SQL queries. It then covers the basic and advanced features of Solr, highlighting facets, language-specific processing, and geographic search. The document reviews how Solr uses Lucene for indexing and search capabilities. It concludes with discussing ways to get started with Solr, including downloading the software and importing sample data for testing.
Polyglot Persistence with MongoDB and Neo4j, by Corie Pollock
Learn how to enhance your application by using Neo4j and MongoDB together. Polyglot persistence is the concept of taking advantage of the strengths of different database technologies to improve functionality and enhance your application. In this webinar we will examine some use cases where it makes sense to use a document database (MongoDB) with a graph database (Neo4j) in a single application. Specifically, we will show how MongoDB can be used to provide search and browsing functionality for a product catalog while using Neo4j to provide personalized product recommendations. Finally we will look at the Neo4j Doc Manager project which facilitates syncing data from MongoDB to Neo4j to make polyglot persistence with MongoDB and Neo4j much easier.
Searching and Querying Knowledge Graphs with Solr/SIREn - A Reference Architecture (Lucidworks)
The document discusses a reference architecture for searching and querying knowledge graphs with Solr/SIREn. It describes challenges in indexing and searching knowledge graphs due to their complex relational structure and diversity of data. The proposed architecture aims to simplify the task by reducing custom code through standardized tools and enabling quick adaptation to changes in data schemas or requirements. Key components include using SPARQL to extract relevant graph subsets and map them to a simplified schema, generating JSON documents from the extracted subgraphs for indexing, and leveraging the SIREn plugin to support structured queries over nested and relational data.
MongoDB is one of the best-known and most-loved NoSQL databases. It offers many features that are easier to work with than those of a conventional RDBMS. These slides cover the basics of MongoDB.
Introduction to Solr, presented at Bangkok meetup in April 2014:
http://www.meetup.com/bkk-web/events/172090992/
Covers high-level use-cases for Solr. Demos include support for Thai language (with GitHub link for source).
Has slides showcasing the Solr ecosystem, as well as a couple of ideas for possible Solr-specific learning projects.
Overview of structured search technology. Using the structure of a document to create better search results for document search and retrieval.
How both search precision and recall are improved when the structure of a document is used.
How a keyword match in a title of a document can be used to boost the search score.
Case studies with the eXist native XML database.
Steps to set up a pilot project.
- The document discusses an internship report on iOS technology. The intern installed Xcode 6.4 and learned Objective-C programming. They built an iOS application using Xcode and gathered requirements from the design team. They also worked on product documentation.
The document discusses Boolean logic and how it can be used for effective searching. It begins by explaining what Boolean logic is, naming its origins with George Boole. It then outlines the main Boolean operators - AND, OR, NOT, NEAR - and modifiers like quotation marks, parentheses, and wildcards. Examples are given to demonstrate how each operator works. The document emphasizes that Boolean logic allows recruiters to search more efficiently by combining terms and excluding unwanted results. It encourages readers to practice with sample searches to learn how Boolean works for optimizing online searches.
The document discusses Boolean logic and operators that can be used to conduct effective searches of resume databases. It begins by explaining that Boolean logic was invented by George Boole and forms the foundation of digital circuits and internet search engines. The key Boolean operators - AND, OR, NOT, NEAR - and modifiers like quotation marks and parentheses are then defined and examples are provided to illustrate how to use them to search for specific skills combinations or exclude unwanted domains. Recommendations note the importance of covering synonyms and using the NOT operator carefully.
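The Boolean operators described in the two summaries above map directly onto set operations over per-term result sets: AND is intersection, OR is union, NOT is difference. The tiny "resume database" below is invented sample data for illustration.

```python
# Boolean search over a toy resume database: each operator becomes a
# Python set operation on the IDs of matching documents.
resumes = {
    1: "java developer with spring experience",
    2: "python developer django and flask",
    3: "java and python polyglot engineer",
}

def matching(term: str) -> set[int]:
    """IDs of resumes containing the term as a whole word."""
    return {rid for rid, text in resumes.items() if term in text.split()}

java_and_python = matching("java") & matching("python")   # AND: intersection
java_or_python = matching("java") | matching("python")    # OR: union
java_not_spring = matching("java") - matching("spring")   # NOT: difference
```

This is also why the summaries warn about NOT: subtracting a broad term can silently discard relevant results along with the unwanted ones.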
The document provides an overview of iOS training for day 1, which includes introductions to iOS, Objective-C and Swift programming languages, Xcode IDE, Cocoa and Cocoa Touch frameworks, Model-View-Controller architecture, and best practices for iOS development such as project structure, constants, minimum iOS version requirements, and coding style conventions.
Longwell is a tool that provides a graphical interface for exploring RDF data in a web browser. It displays types of resources as filters along the top and facets like properties on the right. Users can browse data by selecting types to view associated resources and properties. Queries powering Longwell return type and property frequencies to display, list properties for a selected type, and populate property panels with object values to enable interactive faceted browsing of RDF datasets.
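The type and property frequencies that drive Longwell's faceted filters can be sketched as simple counting over a resource set. The RDF-like resources below are invented sample data, not Longwell's actual query layer.

```python
from collections import Counter

# Facet counts for Longwell-style browsing: how often does each type and
# each property occur across the resources? The data here is invented.
resources = [
    {"type": "Person", "props": {"name", "email"}},
    {"type": "Person", "props": {"name"}},
    {"type": "Project", "props": {"name", "homepage"}},
]

type_counts = Counter(r["type"] for r in resources)
prop_counts = Counter(p for r in resources for p in r["props"])
```

Displaying `type_counts` as clickable filters, then recomputing `prop_counts` over the resources that survive a selected filter, is the essence of interactive faceted browsing.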
RedisSearch / CRDT: Kyle Davis, Meir ShpilraienRedis Labs
This document summarizes a presentation about RediSearch and CRDT.
The presentation covered:
1. An overview of RediSearch and how it can be used for full text search and as a secondary index.
2. A demonstration of RediSearch benchmarking where it indexed a Wikipedia dataset faster than Elasticsearch and returned search results faster.
3. How RediSearch supports a multi-tenant search application with isolated indexes for each tenant, and how it outperformed Elasticsearch in indexing 25 million documents across 50,000 tenants.
4. An explanation of CRDT and how it allows for consensus-free replication between RediSearch instances for an active-active multi-site search engine with
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Lucidworks
The document discusses implementing conceptual search in Solr. It describes how conceptual search aims to improve recall without reducing precision by matching documents based on concepts rather than keywords alone. It explains how Word2Vec can be used to learn related concepts from documents and represent words as vectors, which can then be embedded in Solr through synonym filters and payloads to enable conceptual search queries. This allows retrieving more relevant documents that do not contain the exact search terms but are still conceptually related.
This slide was presented during the Latino Web Developer NYC meetup. Learn the new flexbox grid and components of bootstrap 4. Customize styles using the source Sass files - Michael Posso @micposso
Similar to Building japanese full text search system by Solr (20)
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
1. Building Japanese Full-Text Search System by Solr #openSUSE.Asia Summit2017 10/21
Building Japanese Full-Text Search System by Solr
― Document Search and Application to Online Shopping Sites ―
Syuta Hashimoto
opensuse-ja
2. Self Introduction
・Syuta Hashimoto @hashimotosyuta
I have worked on web products based on open source,
e.g. online shopping sites, promotion sites, and CMSes.
・With openSUSE
ー I have used openSUSE at home for 4 years.
I love Geeko!
3. Main Topics
1 What is Full-Text Search?
2 What is Solr?
3 Let's use!
4 What is Index?
5 Structure and Role
6 Solr can search from RDBMS!
7 Facet is easy to count
8 Highlighter is easy to highlight, and more functions
※Basic RDBMS knowledge is assumed
4. 1 What is Full-text Search?
Q: What is full-text search?
A: Searching the full text! (maybe)
And searching the full text across multiple files!
"Multiple files" is the important point in "full-text search" and "enterprise search".
・Point 1
Full-text search usually comes in two types:
・Serial scan type
・Index type ←Today's menu
5. Use Case
I want to search for the word "openSUSE" across these files!
6. Full-text Search Type 1: "Serial Scan Type"
# grep -r 'openSUSE' files_A ① ② ③
# soffice "files_B/LibreOffice Writer.odt" → Ctrl + F ④
# soffice "files_B/LibreOffice Calc.ods" → Ctrl + F ⑤
# okular files_B/pdf.pdf → Ctrl + F ⑥
For example, this method searches the files one after another.
7. Full-text Search Type 1: "Serial Scan Type"
# grep -r 'hogehoge' .
⇢ The "serial scan type" searches for the word 'hogehoge' in the files under the current directory.
ー Pros
・easy
ー Cons
・slow
・difficult to search rich text (e.g. Word documents)
・a lot of search noise
8. Full-text Search Type 2: "Index Type"
# curl 'http://localhost:8983/solr/techproducts/select?indent=on&q=*:openSUSE&wt=json' ① ←Today's topic
An index is built beforehand, so you can search it at once; the index is made to be easy to search.
9. Full-text Search Type 2: "Index Type"
The "index type" builds an index of the words to be searched in advance, and then searches that index.
ー Pros
・fast
・can search rich text (e.g. Word documents), as long as it can be indexed
・less search noise
ー Cons
・you have to build a search system
・you need to index the files you want to search
10. 2 What is Solr?
About Solr
・An index-type full-text search system
・A sub-project of Apache Lucene(™)
→Apache Lucene is a full-text search library.
Solr uses this library, so Solr is open source too.
・Since it is accessed like a web API, any client will do!
・There is a competing product called "Elasticsearch".
11. 3 Let's use!
Build it at once! (locally)
1 Install a JVM. Java version 1.8 or later is required.
(Already installed on Leap 42.3.)
2 Download Solr
You can download Solr from the official site. The current version is 7.0.1:
http://www.apache.org/dyn/closer.lua/lucene/solr/7.0.1
The zip file contains everything you need.
3 Extract the zip file
# unzip solr-7.0.1.zip
and move into the directory:
# cd solr-7.0.1
12. Starting, creating a core, indexing
4 # bin/solr start ←First, start Solr (no core, no index yet).
5 # bin/solr create -c mycore ←Create a core named "mycore".
6 # bin/post -c mycore /home/hashimoto/doc/* ←Index the files into "mycore".
"bin/post" indexes them automatically.
・・・ (indexing logs are output…)
COMPLETED
※The official Solr site also has a tutorial. (It lets you try a cluster.)
13. Important Words
・CORE
A core is equivalent to an RDBMS schema.
A core holds the index format, query settings, and more.
Roughly speaking, it is the search engine itself.
・Schema definition
In Solr, the index format is called a schema.
It is like an RDBMS table.
・Index
The data produced by indexing the target files
according to a schema definition.
14. Solr has an "Admin UI" by default
After starting, access http://localhost:8983/solr/ …
The Admin UI is displayed.
15. "mycore" is registered
"mycore" is registered properly.
16. You can search from "Query" in "mycore"
①This is "Query"
②Input a search word
③Execute
④The result is here
17. 4 What is Index?
What is an index?
This is: the contents map each word found in the files
to the names of the files that contain it.
18. The contents of the index (image of an index)

WORD        FILES WHICH HAVE THE WORD
openSUSE    text1.txt, LibreOffice Writer.ods
conference  text2.txt, pdf.pdf
・・・        ・・・

So when you search for the word "openSUSE", it responds
immediately that "text1.txt" and "LibreOffice Writer.ods" have it.
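The table above is an inverted index: it maps each word to the files that contain it. A minimal sketch in Python (the file contents below are illustrative assumptions, not the presentation's actual files):

```python
from collections import defaultdict

def build_inverted_index(files):
    """Map each word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in text.split():
            index[word].add(name)
    return index

# Illustrative file contents (assumed for this example).
files = {
    "text1.txt": "openSUSE is a Linux distribution",
    "LibreOffice Writer.ods": "openSUSE slides",
    "text2.txt": "the conference was fun",
    "pdf.pdf": "conference program",
}

index = build_inverted_index(files)
print(sorted(index["openSUSE"]))    # → ['LibreOffice Writer.ods', 'text1.txt']
print(sorted(index["conference"]))  # → ['pdf.pdf', 'text2.txt']
```

Solr's real index also stores positions, term frequencies, and more, but the core idea is this word-to-documents mapping.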
19. Index definition = Schema
The definition is called a schema.
A schema defines the following:
・Field
A column, in RDBMS terms. Each field has a designated field type.
Text is broken into words and registered.
・Field Type
The field definition: whether the field is numeric or string, and
whether or not to apply morphological analysis.
・There are also Dynamic Fields and Copy Fields.
(omitted today)
20. Indexing
Indexing means "registering the content of each search target file
into the fields, according to the field definitions".
By the way…
When registering to a field, something is done
to make searching easier.
(That "something" is defined in the field type.)
21. Doing something?
・For example, converting all letters to lowercase:
→"linux", "Linux", and "LINUX" are all converted to "linux" (lowercase).
When the search applies the same conversion, all variants of "linux" are hit.
・In Japanese, text is divided on the basis of parts of speech:
「私は東京都で開催されるアジアサミットに行きます。」
→「私-は-東京-都-で-開催-さ-れる-アジア-サミット-に-行き-ます」
In this case the search word "東京" hits, but the search word "京都" does not,
reducing search noise when searching across many files.
This is a profound technique called "morphological analysis".
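The lowercase example above can be sketched in a few lines of Python. The toy `analyze` function here is an assumption for illustration; it stands in for Solr's real analyzer chain (a tokenizer plus filters such as the lowercase filter):

```python
def analyze(text):
    """Toy analyzer: split on whitespace and lowercase each token.
    Real Solr analyzers also handle punctuation, stemming, and
    morphological analysis for Japanese."""
    return [token.lower() for token in text.split()]

# The same analysis runs at index time and at query time,
# so "linux", "Linux" and "LINUX" all end up as "linux" and match.
indexed_tokens = analyze("I love LINUX and Linux tools")
query_tokens = analyze("Linux")
print(all(q in indexed_tokens for q in query_tokens))  # → True
```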
22. 5 Structure and Role
Components figure: on the registration side, documents are ①registered and ②indexed;
on the search side, a client ①searches, Solr runs the ②query, produces a ③result,
and the ④result is returned.
Solr is accessed via a REST API.
23. 6 Solr can search from RDBMS!
Setup is finished! Enjoy a good search life!!
What? My shopping site has its data in MySQL.
Searching something like the item descriptions is too slow….
Oh…..
24. DataImportHandler
In fact, Solr has a mechanism that can index data
from an RDBMS and other data sources.
From a "full-text search" viewpoint, the expected use is
searching item descriptions on an online shopping site.
But Solr also offers facet search and a highlighter,
so it is even more useful than that.
25. Components figure when using RDBMS
The flow is the same as before, except that the registration side reads from
the RDBMS: ①registration from the RDBMS, ②indexing;
then ①search, ②query, ③result, ④result.
26. Logical structure
Solr treats a field of the schema and a column of the RDBMS
as equivalent, and indexes it accordingly.

RDBMS:
id  name      description
1   openSUSE  geeko is cute!!

Schema:
Field name=id
Field name=name
Field name=description

Searching "geeko" in the description field returns the row whose name is "openSUSE".
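The row-to-document mapping above can be sketched as follows. The `row` dict and the `matches` helper are hypothetical stand-ins for illustration; a real client would send the document to Solr and query it over the REST API:

```python
# A hypothetical RDBMS row, mirroring the slide's example table.
row = {"id": 1, "name": "openSUSE", "description": "geeko is cute!!"}

# Each column is treated as equivalent to a schema field;
# the resulting document is what gets indexed.
document = {field: row[field] for field in ("id", "name", "description")}

def matches(doc, field, word):
    """Toy check: does the field's text contain the word?"""
    return word in str(doc[field]).split()

print(matches(document, "description", "geeko"))  # → True
print(document["name"])                           # → openSUSE
```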
27. Settings are slightly troublesome
● Put a connector for RDBMS access in place.
→Put the JDBC connector in "server/lib".
● Field definition
→see the next page
● Write settings in solrconfig.xml (the core's setting file):
・load the DataImportHandler library
・declare the use of DataImportHandler and its setting file *a
● Setting file for DataImportHandler (*a's file):
・RDBMS connection settings
・correspondence between the fields and the SQL
This is an overview;
please see other documentation for details.
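As a rough sketch, the DataImportHandler setting file (*a above) might look like the following; the driver, URL, credentials, and table/column names are all illustrative assumptions for a MySQL shopping-site table:

```xml
<dataConfig>
  <!-- RDBMS connection settings (values are placeholders) -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/shop"
              user="solr" password="secret"/>
  <document>
    <!-- Correspondence between the SQL and the schema fields -->
    <entity name="item"
            query="SELECT id, name, description FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```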
28. Field Definition
Define the schema in the Admin UI to be quick:
①Select "Schema"
②Choose "Add Field"
③Set each setting, and click "Add Field"
29. Setup is finished! Let's import.
As usual, via the REST API:
http://localhost:8983/solr/mycore/dataimport?command=full-import
(our "mycore")
Incidentally, the URI "/dataimport" is defined by a
requestHandler setting in solrconfig.xml.
That is all it takes to import.
You can search in the Admin UI.
For practical use, you also need to design delta imports and the timing of imports.
30. 7 Facet is easy to count
Facet Search
This is a function that counts after grouping.
For example, to get a count per type in this case:

id  name      description      type
1   docker    container        virtualization
2   emacs     multiple editor  editor
3   vim       multiple editor  editor
4   chrome    browser          browser
5   firefox   browser          browser
6   sleipnir  browser          browser

"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
    "type":[
      "virtualization",1,
      "editor",2,
      "browser",3]},
  "facet_ranges":{},
  "facet_intervals":{},
  "facet_heatmaps":{}}
31. Facet Search REST API
Just add the facet query fields to the search:
http://localhost:8983/solr/mycore/select?facet=on&facet.field=type&indent=on&q=*:*&wt=json
・facet=on
Enables facet search
・facet.field=type
Group and count by "type"
Of course, facet search can be combined with a normal search.
32. 8 Highlighter is easy to highlight
Highlighter
Solr can return highlighted results separately from the normal results.
For example, searching for "worldwide" in the description of this data:

id  name      description
1   openSUSE  The openSUSE project is a worldwide effort that promotes
              the use of Linux everywhere. openSUSE creates one of
              the world's best Linux distributions, working together in an
              open, transparent and friendly manner as part of the
              worldwide Free and Open Source Software community.
33. Searching with the highlighter…
"highlighting":{
  "1":{
    "description":["The openSUSE project is a <em>worldwide</em> effort
that promotes the use of "]}}

The word "worldwide" is surrounded by <em>
tags, and the text around the word is retrieved.
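The highlighter's behavior can be imitated with a short Python sketch. The `highlight` helper here is hypothetical, not Solr's implementation; Solr's highlighter works on the analyzed index and has many more options:

```python
import re

def highlight(text, word, radius=40, tag="em"):
    """Wrap the first match of `word` in <em> tags and return
    a snippet of the text around it, like Solr's highlighting output."""
    match = re.search(re.escape(word), text, re.IGNORECASE)
    if match is None:
        return None
    start = max(match.start() - radius, 0)
    end = min(match.end() + radius, len(text))
    snippet = text[start:end]
    return re.sub(re.escape(word), f"<{tag}>{word}</{tag}>",
                  snippet, count=1, flags=re.IGNORECASE)

description = ("The openSUSE project is a worldwide effort that promotes "
               "the use of Linux everywhere.")
print(highlight(description, "worldwide"))
```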
34. Search with the Highlighter REST API
As usual, just add query parameters:
http://localhost:8983/solr/mycore/select?hl=on&hl.fl=description&indent=on&q=description:worldwide&wt=json
・hl=on
Turns the highlighter on
・hl.fl=description
Assigns the description field for highlighting
35. Settings for the Highlighter
・The "searchComponent" section in solrconfig.xml.
・Several things must be set on the field:
a. set "stored" to true, so that the retrieved data is kept.
b. set the analysis settings on the field type.
The highlighter supports several combinations of settings.
You can use the defaults, but the settings allow careful control:
hl.method
hl.qparser
hl.requireFieldMatch
hl.usePhraseHighlighter
etc.
36. And more functions
Spatial
37. And more functions
Cloud
Recommend
38. Today's Summary
1 Solr is an index-type full-text search system.
2 The field definition is called a "schema".
It decides the structure of the index.
3 Solr can search from an RDBMS too.
4 Facet search and the highlighter are very easy to use.
Good Search Life!!
Have a lot of fun...