The document describes a presentation given at KohaCon12 about adding browse functionality to Koha using Solr. It details the motivation for adding browse, the design of documents in the Solr index, how the index is loaded and synchronized with Koha, and how browse lists and results are queried. The goal is to provide a way to browse alphabetical lists of headings extracted from authority and bibliographic records in Koha.
RefWorks is a web-based citation management tool that allows users to import references from online databases, organize them into folders, insert citations into Word documents, and automatically generate bibliographies. The tutorial covers how to create a RefWorks account, import references from databases, organize references into folders, insert citations into a Word document using Write-N-Cite, and generate a bibliography from the citations. Users are instructed to contact the RefWorks administrator for login details and access online tutorials for more in-depth training.
Apache Solr is an open-source search platform that allows users to index and search large volumes of textual data. It works by indexing documents that are added to its cores, or indexes, and then allows users to query those indexes to retrieve relevant results. Some key uses of Solr include powering search engines, enabling geospatial search, and performing analytics on large datasets. The document outlines how to install and run Solr, configure cores, and introduces some basic Solr concepts like schemas, documents, fields, analyzers, and queries. It also briefly discusses some alternatives to Solr like Elasticsearch and Algolia.
1. The document provides information about various research resources for childhood education available at CCNY Libraries, including databases, journals, and other materials.
2. It describes how to access and search key databases like EBSCOhost, ERIC, JSTOR, and subject-specific databases, as well as how to find and request print journals and books.
3. Instructions are given for exporting citations to RefWorks, creating bibliographies, and accessing full text through interlibrary loans when articles are not available directly.
Solr is an open source enterprise search platform that provides powerful full-text search, hit highlighting, faceted search, database integration, and document handling capabilities. It uses Apache Lucene under the hood for indexing and search, and provides REST-like APIs, a web admin interface, and SolrJ for indexing and querying. Solr allows adding, deleting, and updating documents in its index via HTTP requests and can index documents in various formats including XML, CSV, and rich documents using Apache Tika. It provides distributed search capabilities and can be configured for high availability and scalability.
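The HTTP update interface described above can be sketched in a few lines. This is a minimal illustration, not a full client: the `/solr/<core>/update` endpoint and the JSON add/delete command shapes are standard Solr, but the core name `books` and the sample document are assumptions for the example.

```python
import json
from urllib.parse import urlencode

# Hypothetical core name for illustration; any Solr core works the same way.
SOLR_UPDATE_URL = "http://localhost:8983/solr/books/update"

def build_add_payload(docs):
    """JSON body that adds (or replaces) documents in the index."""
    return json.dumps(docs)

def build_delete_payload(doc_id):
    """JSON body that deletes a document by its unique id."""
    return json.dumps({"delete": {"id": doc_id}})

def build_commit_url(base_url):
    """Append commit=true so the changes become visible to searches."""
    return base_url + "?" + urlencode({"commit": "true"})

docs = [{"id": "1", "title": "Solr in Action"}]
add_body = build_add_payload(docs)
delete_body = build_delete_payload("1")
commit_url = build_commit_url(SOLR_UPDATE_URL)
```

In practice these bodies would be POSTed with `Content-Type: application/json`; building them as plain strings first makes the wire format easy to inspect.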
This document provides a step-by-step guide to using RefWorks, including:
1. Accessing RefWorks through the library website or directly at www.refworks.com and registering for an individual account.
2. Searching databases from the library website, selecting references to export, and exporting them directly into RefWorks.
3. Organizing references in RefWorks by moving them between folders or creating new folders.
4. Using the Write-N-Cite plugin in Microsoft Word to insert citations from RefWorks into a paper and generate a bibliography.
Apache Solr serves search requests for enterprises and some of the largest companies in the world. Built on top of the Apache Lucene library, Solr makes integrating indexing and search into your applications straightforward. Solr provides faceted navigation, spell checking, highlighting, clustering, grouping, and other search features. Solr also scales query volume with replication and collection size with distributed capabilities. Solr can index rich documents such as PDF, Word, HTML, and other file types.
Come learn how you can get your content into Solr and integrate it into your applications!
Solr™ is the popular, blazing-fast open source enterprise search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Solr powers the search and navigation features of many of the world's largest internet sites, including AOL, Yahoo, Buy.com, CNET, CitySearch, Netflix, Zappos, StubHub, Digg, E*Trade, Disney, Apple, NASA, and MTV.
RefWorks is a research organization and citation tool that allows users to: save references from databases like Primo and Avery directly into RefWorks, organize references into folders, create bibliographies in various citation styles, and share references with other users. Key features include RefShare for sharing folders, RefGrab-it for adding website references, and Write-n-Cite for inserting citations into papers as you write. Help is available through online tutorials or by emailing the RefWorks team.
Apache Solr is the popular, blazing fast open source enterprise search platform; it uses
Lucene as its core search engine. Solr’s major features include powerful full-text search, hit
highlighting, faceted search, dynamic clustering, database integration, and complex queries.
Solr is highly scalable, providing distributed search and index replication, and it powers the
search and navigation features of many of the world's largest internet sites.
This document outlines a library instruction session on using RefWorks for COM 501. It discusses setting up a library PIN, creating RefWorks accounts and folders, searching for books and exporting citations to RefWorks, searching article databases and exporting citations, creating bibliographies in RefWorks, and search strategies. Attendees are asked to email their RefWorks bibliography and feedback on the session to the instructor.
This document provides an introduction and overview of RefWorks, a citation management tool. It discusses why RefWorks is useful, including its accessibility, privacy features, support for various databases, and ability to organize references and generate bibliographies. It then covers creating a RefWorks account, adding references from databases, text files or manually, organizing references into folders, and using features like Write-N-Cite to insert citations into Word documents and RefGrab-It to import web pages. Advanced features and getting additional help with citations are also mentioned.
Tips for fixing OCLC Knowledge Base broken links (Jeff Siemon)
How to add subtitles to prevent incorrect matches in Discovery for books with similar titles (many are HathiTrust short titles).
How to add a primary=override OCN to Knowledge Base titles without a primary OCN.
How to change (override) a print book's primary OCLC number with the best eBook OCN.
How to add new Open Access collections to your catalog.
How to add free or open access journals not in any KB collection to the "Other Free Journals" collection (ID: freeAccess.misc).
Apache Solr is an open-source enterprise search platform that provides fast, scalable, and reliable full-text search functionality. It powers the search capabilities of many large websites and applications. Some key features of Solr include fast indexing and search, faceted search, autocomplete, geospatial search, and integration with various databases and applications.
Mahara is an open source e-portfolio application created by the New Zealand government that allows users to create and maintain a digital portfolio of their learning and work. It provides social networking features to allow users to interact. Mahara can be installed on a server and requires additional software like Apache and PostgreSQL to run. It offers a demo site for trial use that resets daily, requiring exported portfolios to retain work. The main Mahara features include profile creation, journaling, file storage, resume building, collections, sharing, exporting, groups, and messaging capabilities.
Salesforce Admin's guide: the data loader from the command line (Cyrille Coeurjoly)
Hacks, Habits and Helpful Hints: the Salesforce Admin's reference guide. This short guide explains how to use the Salesforce Data Loader from the command line; no more clicks, no more errors.
This document discusses building distributed search applications using Apache Solr. It provides an agenda that covers topics such as Solr architecture, schema configuration, indexing data, querying, SolrCloud, and performance factors. It also references a demo app that will be used for hands-on examples during the presentation.
This document proposes the development of a data loader tool with the following key capabilities:
- Load data from a text file into backend databases like MS Access, MySQL, Oracle or FoxPro.
- Import and export tables between different backend databases.
- Encrypt text files and decrypt encrypted text files.
The data loader tool would streamline data loading and transferring processes currently done using multiple individual tools. It aims to provide an easy to use interface to perform these functions.
Managing Electronic Collections in Alma presented at the 2016 GaCOMO in Athens as part of the Pre-Conference sponsored by TSIG and the Cataloging Functional Group of GIL.
This document provides an introduction to enterprise search and its key components. It discusses how search engines work by building indexes on text and answering queries using those indexes. The two main components are indexing, which structures data for easy searching, and search, which returns results based on user queries against the index. It introduces common file formats that can be indexed like text, HTML, PDFs. Lucene and Solr are introduced as open source search libraries, with Solr building on Lucene and adding features like indexing, querying via HTTP, and admin interfaces. The document demonstrates adding, deleting, and searching for documents in Solr.
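The two components described above — indexing, which structures the data, and search, which answers queries against it — can be illustrated with a toy in-memory index. This is a didactic sketch only; real Solr stores its index on disk through Lucene and does far more (scoring, analysis, caching). All names here are invented for the example.

```python
from collections import defaultdict

class TinyIndex:
    """A toy search index supporting add, delete, and search,
    mirroring the operations the document demonstrates in Solr."""

    def __init__(self):
        self.docs = {}                    # doc_id -> full text
        self.postings = defaultdict(set)  # token -> ids of docs containing it

    def add(self, doc_id, text):
        self.delete(doc_id)               # replace if it already exists
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is not None:
            for token in text.lower().split():
                self.postings[token].discard(doc_id)

    def search(self, query):
        """Return ids of documents containing every query token."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = self.postings[tokens[0]].copy()
        for token in tokens[1:]:
            result &= self.postings[token]
        return result

idx = TinyIndex()
idx.add("d1", "Solr builds on Lucene")
idx.add("d2", "Lucene is a search library")
```

The key design point is that indexing pays the parsing cost once, up front, so every later query is a cheap lookup in the postings table rather than a scan of the documents.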
This document discusses input/output (I/O) streams in C++. It covers declaring input and output stream variables to connect to files, using stream member functions like open() to connect streams to files, and extraction and insertion operators to read from and write to streams. It also describes formatting output using stream manipulators and functions like setw(), setprecision(), and flags like fixed and showpoint.
Mendeley is a desktop and web program for managing and sharing research papers, discovering research data, and collaborating online. It combines Mendeley Desktop, a PDF and reference management application (available for Windows, OS X and Linux), and Mendeley for Android and iOS, with Mendeley Web, an online social network for researchers.
Mendeley requires the user to store all basic citation data on its servers—storing copies of documents is at the user's discretion. Upon registration, Mendeley provides the user with 2 GB of free web storage space, which is upgradeable at a cost.
This document provides an overview of how to use RefWorks, a citation management tool. It describes how RefWorks allows users to create personal databases of references without special software, import references from databases with a click of a button, organize and search references, and automatically generate citations and bibliographies in Word documents. It then provides step-by-step instructions on signing up for a RefWorks account, importing references from databases and websites, organizing references into folders, and using the Write-N-Cite plugin to insert citations into Word papers.
This document provides instructions for installing the Koha library management system using a live DVD on a PC. It outlines the steps to boot from the DVD, select installation options like language and time zone, create a system user account, and complete the installation process. Upon restarting, it notes where to find information on accessing the Koha interface and login credentials. It also includes information on rebuilding indexes, backing up the database, and helpful support links.
This document summarizes Chapter 6 on I/O streams as an introduction to objects and classes in C++. It discusses streams and basic file I/O, including declaring input and output stream variables to connect to files, reading from and writing to files using extraction and insertion operators, and using member functions like open() and fail() for file operations. The key topics are objects and classes in C++ and how streams are used to perform input and output with files.
Using Youth Development Approach to Foster Global Learning through Media & Te... (pasesetter230)
This document provides an overview and agenda for a workshop on using a youth development approach to foster global learning through media and technology. The workshop will discuss how afterschool programs can build youth's global competence and outcomes like improved life skills and relationships. It will provide examples of programs like World Savvy that integrate global learning, media literacy, and civic engagement. It will also overview resources from organizations like PASE, TeachUNICEF, and UNICEF that support global citizenship education and connecting classrooms internationally.
Zettastrom is a design and digital agency that provides strategy, design, and development services. They help brands with UX strategy, content strategy, digital marketing, web and mobile applications, and digital product creation. Some of the clients they have worked with include Yves Saint Laurent, Shu Uemura, Sony, and Matrix. Case studies describe campaigns they designed and executed that increased engagement, sales, and growth across key metrics for each brand.
Publication cover management in a library system (text) (Stefano Bargioni)
Book covers can be stored in a Library Management System. This work, presented at the 33rd ADLUG meeting in Piazza Armerina, October 2014, discusses the pros and cons, and how to collect book covers during cataloguing or circulation operations.
UN Aviation Group Promotes Environmental Sustainability (Dave Pflieger)
The International Civil Aviation Organization is a specialized, data-driven agency of the United Nations that is focused on safety, efficiency, and sustainability in world-wide aviation. The agency works closely with UN member nations, as well as aviation industry stakeholders, to promote worldwide agreement on standards and practices that will foster safe air travel for the public and sustainable environmental practices for the industry.
- India has experienced rapid growth in telecommunications over the last 10 years, adding over 750 million phones. However, revenue growth has been slower at 4% as tariffs are the lowest in the world.
- Rural areas will be the main driver of future growth as tele-density and broadband penetration are still low compared to urban areas. The government aims to increase rural tele-density and roll out broadband infrastructure nationwide.
- Telcos are focusing on upgrading networks to support 3G and investing in fiber infrastructure to prepare for 4G/LTE. However, financial constraints have led many operators to reduce capex. Consolidation in the industry is expected with the number of operators decreasing from 14 currently.
HOW TO USE APACHE SOLR TO THE FULLEST: A TECHNICAL EXPLORATION OF SEARCH INDEXING
A search tool improves a website's user experience by making it easier and faster for users to find what they're looking for. It matters most on large sites, e-commerce stores, and dynamically updated websites (news sites, blogs).
Apache Solr is one of the most popular search engines used by websites of all sizes. It is a Java-based open-source search engine that lets you search content such as articles, products, customer reviews, and more. In this article, we will examine Apache Solr in more detail.
What makes Apache Solr so popular?
Full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features (non-relational storage), and rich document handling all make Apache Solr quick and versatile. It can index a variety of document formats, including PDF, MS Office, and Open Office, and can make new content searchable almost instantly.
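Faceted search, one of the features listed above, groups matching results by the values of a field and reports a count for each value. The sketch below computes the same kind of numbers Solr returns for a `facet.field` request, using plain Python; the `format` field and the sample result set are invented for illustration.

```python
from collections import Counter

def facet_counts(docs, field):
    """Count how many matching documents carry each value of `field`,
    mimicking the counts Solr returns for a facet.field request."""
    return Counter(doc[field] for doc in docs if field in doc)

# Hypothetical result set for illustration.
results = [
    {"id": "1", "format": "PDF"},
    {"id": "2", "format": "PDF"},
    {"id": "3", "format": "MS Office"},
]
counts = facet_counts(results, "format")
```

A UI would render these counts as clickable filters ("PDF (2)", "MS Office (1)"), each narrowing the result set to one facet value.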
Some useful information regarding Apache Solr
Solr was originally created by CNET Networks, Inc. as a search engine for its websites and publications. It was later open-sourced and became an Apache top-level project. It supports a variety of programming languages, including Ruby, PHP, Java, and Python, and offers APIs for these languages.
Solr has built-in support for geographic search, enabling location-based content searches, which is particularly beneficial for websites such as tourism and real estate portals. APIs and plugins add sophisticated search capabilities such as spell checking, autocomplete, and custom search. Under the hood, Solr uses Lucene for searching and indexing.
What is Apache Lucene?
Lucene is an open-source Java search library that makes it simple to incorporate search or information retrieval into an application. It uses a robust search algorithm and is adaptable, strong, and accurate.
Although Lucene is best recognized for its full-text search capabilities, it can also be used to classify documents, analyze data, and retrieve information. Along with English, it supports a wide variety of additional languages, including German, French, Spanish, Chinese, and Japanese.
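The geographic search mentioned above is usually expressed in Solr with the `geofilt` filter, which takes a location field (`sfield`), a center point (`pt`, as "lat,lon"), and a radius in kilometers (`d`). The sketch below only builds the query string, so it needs no running server; the query term "hotels" and the field name `location` are assumptions for the example.

```python
from urllib.parse import urlencode

def geo_query(q, lat, lon, radius_km, location_field="location"):
    """Build the query string for a Solr radius search using the
    standard geofilt filter (pt = center point, d = distance in km)."""
    params = {
        "q": q,
        "fq": "{!geofilt sfield=%s}" % location_field,
        "pt": "%s,%s" % (lat, lon),
        "d": str(radius_km),
    }
    return urlencode(params)

# "Hotels within 10 km of lower Manhattan" (coordinates are illustrative).
qs = geo_query("hotels", 40.7128, -74.0060, 10)
```

Appending this string to `/solr/<core>/select?` would restrict the full-text match on "hotels" to documents whose location falls inside the circle.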
What is indexing?
Indexing is the first step for all search engines. Indexing is the conversion of original data into a highly efficient cross-reference lookup that speeds up search. Search engines do not index data directly: the text is first broken into tokens (atomic components). Searching then consists of consulting the search index and retrieving the documents that match the query.
Benefits of indexing
• Quick and accurate information retrieval (the engine collects, parses, and stores data ahead of time)
• Without an index, the search engine would need extra time to scan each document for every query.
During indexing, the document is first analyzed and divided into tokens.
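The analysis step just described — splitting text into tokens and normalizing them — can be sketched as a miniature analyzer. This is a rough, simplified stand-in for what a Solr field's analyzer chain (tokenizer plus filters) does before terms reach the index; the stop-word list is an arbitrary sample, not Solr's actual default.

```python
import re

# A tiny illustrative stop-word list; real analyzers use configurable lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}

def analyze(text):
    """Split text into tokens, lowercase them, and drop stop words,
    roughly mimicking an analyzer chain's tokenizer + filter stages."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = analyze("The Quick and Accurate Retrieval of Information")
```

Only the surviving tokens are written to the index, which is why a search for "the" in most engines matches nothing: the term was discarded before indexing.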
This document provides a tutorial for using Apache Solr to index and search plain text files, HTML files, and remote HTML files. It introduces Solr Cell, a new module in Solr 1.4 that can access many file formats. The tutorial walks through installing the example files, indexing text files with cURL and Solr Cell, indexing all files in a directory with a shell script, indexing HTML files and extracting metadata, and indexing remote files by downloading them. The goal is to demonstrate indexing common file types that users are likely to have access to.
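The Solr Cell workflow the tutorial describes goes through the `/update/extract` request handler, with `literal.*` parameters supplying metadata fields such as the unique id. The helper below only assembles the URL you would then hand to cURL with the file attached; the server address, core name `docs`, and file name are assumptions for the example.

```python
from urllib.parse import urlencode

def extract_url(base, core, doc_id, commit=True):
    """URL for posting a rich file (PDF, HTML, ...) to Solr Cell's
    /update/extract handler; literal.id sets the document's unique key."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return "%s/solr/%s/update/extract?%s" % (base, core, urlencode(params))

url = extract_url("http://localhost:8983", "docs", "tutorial.html")
# The resulting URL is what you would pass to cURL along with the file:
#   curl "<url>" -F "myfile=@tutorial.html"
```

Using the file name as the id (as the Solr tutorials often do) keeps re-indexing idempotent: posting the same file again replaces the earlier document instead of duplicating it.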
Apache Solr is the popular, blazing fast open source enterprise search platform; it uses
Lucene as its core search engine. Solr’s major features include powerful full-text search, hit
highlighting, faceted search, dynamic clustering, database integration, and complex queries.
Solr is highly scalable, providing distributed search and index replication, and it powers the
search and navigation features of many of the world's largest internet sites.
Quick intro to RDA for my staff: includes a basic overview of how RDA differs from AACR2 and how it relates to MARC, FRBR, and the Semantic Web. Includes examples. By Robin Fay for UGA Libraries/DBM, georgiawebgurl@gmail.com
The document provides an introduction to Apache Solr, an open source enterprise search platform. It outlines the objectives of the training, which are to understand the need for enterprise search, how indexing and searching works in Lucene and Solr, Solr features like faceting and highlighting, and job opportunities for Solr developers. The training will cover topics such as Solr architecture, indexing, querying, analysis, and configuration using solrconfig.xml.
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site (nyccamp)
The Apache Solr Search Integration module provides integration with the (free, open-source) Apache Solr server. This great combination of Drupal with a powerful and flexible search server will make your site irresistible to visitors by providing advanced search features like faceted filtering and by delivering the most relevant search results from your site. The module has been re-written for Drupal 7 to integrate with Facet API, and those changes have been backported to a new Drupal 6 branch. Thus, you can use this module for all your projects, as well as set up a shared search index that allows you to search across different Drupal 6 and Drupal 7 sites. This talk will focus on explaining configuration options in the admin UI to help you quickly and confidently configure the facets, pages, related content blocks, and other features for your site. Highlights may include:
- What are the key Solr concepts you need to understand to get the most out of Solr integration?
- How is the module admin UI organized?
- How do I configure facets, sorts, and content recommendation blocks?
- How can I use additional modules to index file attachments?
This documentation covers the emerging search technology Apache Solr.
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
This document provides an overview of cataloguing for library and information professionals. It defines cataloguing and its purpose of facilitating access and discovery. Key terms are introduced, and the differences between cataloguing in public, academic, and special libraries are explored. The general process of cataloguing a resource is outlined, including using standards like AACR2, subject headings, and classification systems. MARC format and creating bibliographic records is also summarized. Additional resources for learning more about cataloguing are provided.
A very basic overview of RDA, updated. This presentation is appropriate for all library staff including those outside of cataloging, library science students, and others.
Elasticsearch is a distributed, RESTful, free and open source search engine based on Apache Lucene. It allows for fast full text searches across large volumes of data. Documents are indexed in Elasticsearch to build an inverted index that allows for fast keyword searches. The index maps words or numbers to their locations in documents for fast retrieval. Elasticsearch uses Apache Lucene to create and manage the inverted index.
Elasticsearch offers several advantages over Apache Solr including being more easily distributed, replicated, and supporting real-time indexing. It allows for easy sharding and replication of indexes across multiple nodes. However, Elasticsearch lacks some features found in Solr such as spell checking, date math, and facet pagination. The document provides an overview of the similarities and differences between Elasticsearch and Solr for choosing between the two search servers.
Catalog enrichment: importing Dewey Decimal Classification from external sour... (Stefano Bargioni)
Usually, important catalogs are accessed for copy-cataloguing whole records. It is possible to retrieve "atomic" information too, using unique keys like ISBN.
Library at Pontificia Università della S. Croce developed a tool that allows Dewey retrieval and insertion into bibliographic records, in bulk mode as well as in single record mode, i.e. during cataloguing.
During the bulk process, Dewey classification was added to about 20,000 records, retrieving it from OCLC, Library of Congress and some national libraries, up to 7 external sources.
The single record mode was integrated into the Koha ILS, to make it easier to assign Dewey classification during cataloguing.
This document discusses adding browse functionality to the Koha integrated library system using Apache Solr. Key points include:
- The PUSC Library wants to add browse to Koha to help users navigate subjects, authors, and related headings as their Aleph and Amicus systems previously supported this.
- Solr is proposed as the engine to power browse due to its flexibility, performance, and potential future integration into Koha to replace the current search tool.
- A process is outlined for loading authority and bibliographic records into a Solr database, synchronizing it with Koha, and querying it to power browse lists within the Koha OPAC.
- Statistics, security, licensing, and portability of the tool are also discussed.
The cataloguing module in Koha allows librarians to add new bibliographic records through data entry or copy cataloguing. It provides frameworks to catalog different materials like books, e-resources, serials, and periodicals. The book framework includes fields like title, author, ISBN, and subject headings to catalog books. Librarians can save records, add barcodes/accession numbers, and create duplicate or multiple copies of items. The module allows viewing catalogued items in normal or MARC format.
The document discusses Apache Solr, an open source search platform. It provides an overview of Solr, including its history and architecture. It also discusses how to set up a basic two shard Solr cluster with replicas and how Solr's schema works in a distributed environment. Lastly, it covers how to integrate Solr with other projects like Lucene, Zookeeper, Nutch, Mahout, Hadoop and ManifoldCF.
Drupal & Summon: Keeping Article Discovery in the Library (Ken Varnum)
How building a Drupal module to bring Summon's article discovery system into our web site increased article searching, decreased direct database use, and maintained context for the library's patrons.
Catalog Enrichment for RDA - Adding relationship designators (in Koha) [text] (Stefano Bargioni)
Relationship designators are used to specify the relationship between a resource and a person, family, or corporate body associated with that resource. This presentation shows how they were added to the catalog of the library of the Pontificia Università della Santa Croce, to new records and, mostly automatically, to legacy records. The Name Cloud, a way to navigate the catalog through related authors, is also shown.
Talk given at the conference "METODI SCELTE STRUMENTI: IL NUOVO CATALOGO DELLA RETE URBS" ("Methods, Choices, Tools: the new catalog of the URBS network"), 11 June 2015. Video at https://www.youtube.com/watch?v=gK3_6NKJMzM
Publication cover management in a library system (slides) (Stefano Bargioni)
Book covers can be stored in a Library Management System. This work, presented at the 33rd ADLUG meeting in Piazza Armerina, October 2014, discusses pros and cons, and how to collect book covers during cataloguing or circulation operations.
Adding browse to Koha using Solr
KohaCon12 – Edinburgh, June 5th, 2012
Adding browse to Koha using Solr
Stefano Bargioni
Pontifical University Santa Croce – Rome
Slide 1
It is very exciting for me to take part in the Koha Conference for the first time. Thanks a lot to the
Community for everything I have learnt during these days.
Slide The PUSC Library
Basic data about my library are summarized in this slide. We are very young, since my university was
founded only 26 years ago. It was inspired by Saint Josemaría Escrivá, founder of Opus Dei.
Twenty years ago we participated in the foundation of a consortium, URBE, the Roman Union of
Ecclesiastical Libraries.
Slide Why we need browse at PUSC?
The idea of alphabetically sorted lists of headings (authors, titles, series, subjects, and so on) is
implemented in some LMSs as another kind of search. We do not think it is a "must", given the
power of simple and advanced searches. However, our users and the nature of our data led us
to add it to Koha.
Since adopting Koha, our catalog has experienced a strong increase in quality: we added full authority
records (previously we had only cross-references), and we started introducing subject headings. This is why we
are interested in browsing headings coming from authority records as well as bibliographic records.
Slide How do you say?
Ancient authors, Popes, institutions, and other kinds of authors, partly because of the cataloguing rules
adopted by the library, can create the need to help users and cataloguers choose the
correct form for searching the catalog.
In the Virtual International Authority File, Dante Alighieri, author of the famous Divine Comedy,
has hundreds of variant forms. Which is the chosen form in your library?
Slide Grouping
Clustering and counting headings is another reason to use browse: it is interesting for managing and
searching series, for looking at your catalog through the Dewey classification, and so on.
Slide Browse Functionalities
What might you ask of a browse tool? Basically, to navigate alphabetically sorted lists. So you
will need to extract headings from your catalog, build a sort form, and add information such as, first of
all, a usage count.
Slide Browse requirements
We tried to write a utility meeting the following requirements. The most important, perhaps, is the ability
to include in the same list headings coming from different tags, whether from authority or bibliographic
records.
If I am not mistaken, its implementation is independent of the MARC flavour.
Slide The engine
We tried using Zebra, but it is very difficult for me to configure.
We considered MySQL, but SQL DBMSs do not perform well when asked to extract a
small subset of sorted records from a very large set of headings.
Solr was our choice as the search engine, due to its ability to work with facets. And its future
integration into Koha could be a win-win for browse.
Slide The Solr document (1)
Solr uses the document as its metaphor. Every heading we are interested in including in a list will
become a Solr document.
In the Solr schema we defined some fields, which we will discuss over the next few slides.
The most important field is the ID. Since we can have identical sort forms within the same list, we
cannot use the sort form as the ID. For example, we need to distinguish the title The Bible from the title
Bible, even though their sort form is the same, because the non-filing characters strip out the initial
article.
The ID is of course the means Solr uses to delete or replace a document. It will be discussed in
detail later.
Every document belongs to a list; it comes from an authority or a bibliographic record, from a tag, and
from an occurrence of that tag. It also has a type: it can be a main heading, a see-from, a see-also, and
so on.
There is no need to store in the Solr document information about the subfields used to extract it.
Often every subfield is extracted, but in other cases we only need some of them. The
configuration file reflects this.
Slide The Solr document (2)
Here is an example of Solr document for the main author Dante Alighieri. Please note its ID.
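The document on the slide is not reproduced in this transcript. Based on the fields just described, a document for Dante might look roughly like the following; the field names, the ID separator, and the record number are illustrative guesses, not the actual schema:

```json
{
  "id": "authors_a_12345_100_0",
  "list": "authors",
  "source": "a",
  "tag": "100",
  "occurrence": 0,
  "type": "acc",
  "heading": "Dante Alighieri, 1265-1321",
  "sort": "dante alighieri 1265 1321"
}
```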
Slide The Solr document (3)
And this is an example of a Solr document for a title. Titles, as opposed to uniform titles, do not come from
authority records. They always have type 'acc', that is, 'main'. Also note the ID.
Slide The Solr document (4)
The ID has a composite structure: we build it by concatenating the list name, "a" for authority or
"b" for bibliographic, the authid or biblionumber, the tag, and the zero-based occurrence number.
We believe this is a unique identifier. If it is not, only the last heading entered into Solr with a given ID will
survive, leading to a silent error.
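As an illustration, the ID construction just described might be implemented along these lines; the underscore separator is an assumption, since the transcript only specifies the components:

```python
def build_id(list_name, source, record_id, tag, occurrence):
    """Concatenate the components described in the slide: list name,
    'a' (authority) or 'b' (bibliographic), the authid or biblionumber,
    the tag, and the zero-based occurrence number."""
    assert source in ("a", "b")
    return "_".join([list_name, source, str(record_id), tag, str(occurrence)])

# Headings with identical sort forms still receive distinct IDs as long
# as they come from different records, tags, or tag occurrences.
print(build_id("titles", "b", 678, "245", 0))  # titles_b_678_245_0
```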
Slide The Solr document (5)
This screen shows the algorithm we use to build the sort form.
Maybe there is a better way to generate sort forms, taking into account that Koha is used in many
languages and that a single catalog can contain more than one script. Is International Components
for Unicode, aka ICU, the solution? I'm not so experienced... sorry.
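The actual algorithm is only shown on the slide. A minimal sketch of the general idea, handling the non-filing characters mentioned earlier, could look like this; note that it does not address the multi-script and ICU concerns raised above:

```python
import unicodedata

def sort_form(heading, nonfiling=0):
    """Build a sort key: skip non-filing characters (e.g. a leading
    article), strip diacritics, lowercase, and collapse punctuation."""
    s = heading[nonfiling:]                        # e.g. drop "The "
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = "".join(c if c.isalnum() else " " for c in s.lower())
    return " ".join(s.split())

# "The Bible" with 4 non-filing characters sorts like "Bible":
print(sort_form("The Bible", 4))  # bible
print(sort_form("Bible"))         # bible
```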
Slide Architecture
The architecture is simple: a Solr database is updated with new or modified Koha records.
At the same time, users access the Solr database through the web via a Perl CGI.
Slide Loading & Synchronizing (1)
An important component of browse is the loader. We wrote it in Perl, with the ability to run as the
initial bulk loader as well as the updater.
It reads the Koha SQL tables and adds or updates Solr documents.
Our experience with Solr suggested issuing commit and optimize commands on a regular basis,
to avoid memory consumption and ensure the fastest load. These parameters can vary depending on
the server running Solr.
Slide Loading & Synchronizing (2)
The configuration of the loader can be a large file. I chose XML, though I know the Koha developer
Community prefers YAML. Sorry.
It contains two main sections: one gathers tags coming from authority records, the other gathers
tags coming from bibliographic records.
Here are two examples. On the left side, MARC21 authority tag 400 is sent to the list of authors,
with type 'see'. Every subfield is copied. The suffix ensures that the heading ends with the
specified string.
The example on the right side refers to MARC21 bibliographic tag 245, i.e. a title. The
skip_indicator gives the number of the indicator that contains the skip-in-filing value.
More preferences are available for each tag, like required_subfields and omit_subfields. They allow
tags to be processed with a higher level of detail.
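The configuration file itself is not shown in the transcript. In spirit, the two entries just described might look something like this; the element and attribute names are invented for illustration only:

```xml
<!-- Illustrative sketch, not the actual loader configuration -->
<browse_config>
  <authority>
    <!-- MARC21 authority tag 400: send to the authors list with type
         "see"; every subfield is copied, and a suffix (value not shown
         in the transcript) is appended to the heading -->
    <tag number="400" list="authors" type="see"/>
  </authority>
  <bibliographic>
    <!-- MARC21 bibliographic tag 245 (title): the second indicator
         holds the number of non-filing characters to skip -->
    <tag number="245" list="titles" skip_indicator="2"
         required_subfields="ab" omit_subfields="c"/>
  </bibliographic>
</browse_config>
```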
Slide Loading & Synchronizing (3)
The Solr database also contains some special documents, whose type is "system". Two timestamps record the
start and end of the update process, while each list has a counter to monitor its usage.
Four MySQL tables are involved. One of them, deleted_auth_header, is new: whenever an authority
record is deleted, a slightly modified C4::AuthoritiesMarc.pm logs the event in this table.
The synchronizing process runs as a cron job. We chose to run it once a minute. A lock file ensures
that only one instance runs at a time.
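The single-instance guard can be implemented in many ways. A sketch of the lock-file idea in Python (the production loader is Perl, and the lock path here is hypothetical):

```python
import fcntl
import sys

def acquire_lock(path="/tmp/koha_browse_sync.lock"):
    """Return a locked file handle, or None if another instance holds the lock."""
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fh  # keep the handle open for the lifetime of the process
    except BlockingIOError:
        fh.close()
        return None

lock = acquire_lock()
if lock is None:
    sys.exit("another synchronization run is in progress")
print("lock acquired")
```

Because the kernel releases the flock when the process exits, a crashed run never leaves a stale lock behind, unlike a plain "does the file exist" check.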
Slide Querying (1)
To access the lists, we created a new page in Koha, with a link near the "Advanced Search". The
screenshot shows the public lists, the "starting from" text field, and the number of results per page
available.
This page is generated by a Perl CGI script.
Slide Querying (2)
When listing 5 authors starting from Alighieri, we obtain this result. Each heading can be clicked to
access the related documents, whose count is shown in the third column. See-also and used-for
headings, if any, are listed in the fourth column.
The red link, available only for authors, starts a search on the rich VIAF catalog. Thanks to its
completeness, we very often obtain a successful result. Of course, more links could be added, for
instance to the Wikipedia Biography Portal.
The usage count is computed on the fly; it is not stored in the Solr database. For headings coming from
authorities, this ensures that clicking the author name shows the exact number of bibliographic
records even if the synchronization is not running.
Slide Querying (3)
When listing titles, the result page contains titles from many tags, including series titles, even though we
also have a list containing only series titles. To set series titles apart, we added a special gray label.
The usage count for headings that come from bibliographic records is computed with Solr facets. In
fact, there will be, for instance, seventeen Solr documents (see the last line) sharing the same sort form
in the titles list.
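A facet query of the sort described can be composed as a simple HTTP GET. This sketch only builds the URL; the host, core name, and field names are illustrative, not the actual schema:

```python
from urllib.parse import urlencode

def facet_url(sort, base="http://localhost:8983/solr/browse/select"):
    """Build a Solr query that counts documents per display heading
    sharing the given sort form in the titles list."""
    params = {
        "q": f'list:titles AND sort:"{sort}"',
        "rows": 0,                 # we only need the facet counts
        "facet": "true",
        "facet.field": "heading",  # one bucket per display form
        "wt": "json",
    }
    return base + "?" + urlencode(params)

print(facet_url("bible"))
```

The `rows=0` parameter skips the document bodies entirely, so the response carries only the facet buckets, which is why the on-the-fly count stays cheap.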
Slide Statistics
A special button for statistics is available. It shows fresh counts for each list, as well as the search
counts (not shown here). It is a good way to monitor the Solr browse database.
Slide Security
Solr interaction is driven by HTTP requests. In a standard installation, anybody could access the
documents, which is very dangerous.
There are many ways to solve this issue. We chose to manage security by setting a Jetty username and
password. Jetty is the application server included in the standard Solr distribution.
Slide License and portability
This implementation of browse is open sourced under the same license as Koha.
However, it is not published yet. It requires more work to become a standard Koha tool, since its
author is not a Koha developer but an abecedarian. I know that Claire Hernandez of BibLibre
has a lot of experience with Solr. I would be happy to share the source code with her.
Slide Grazie
Thank you very much to the Koha Community, now in Scotland!