Bringing Solr to Drupal 1 A General and a Library-Specific Use Case. Király Péter, eXtensible Catalog
Two ways of using Solr in Drupal 2 General solution: Apache Solr Search Integration and related modules: Stats, Autocomplete, Multisite, Ajax, Biblio, Attachments, Übercart (e-commerce integration), Views, Multilingual, Geospatial and many others. Fits general needs, uses predefined fields. Library-specific solution: eXtensible Catalog modules. Fits library needs, uses dynamic fields.
Part 1 – the general solution 3 General solution: Apache Solr Search Integration and related modules: Stats, Autocomplete, Multisite, Ajax, Biblio, Attachments, Übercart (e-commerce integration), Views, Multilingual, Geospatial and many others. Fits general needs, uses predefined fields. Library-specific solution: eXtensible Catalog modules. Fits library needs, uses dynamic fields. This part of the presentation is based on the work and previous presentations of Robert Douglass.
Drupal architecture 4 Relational database for storage; Solr index for search
Purposes 5 Creating a general infrastructure that is usable in every Drupal installation. A core module, plus additional modules covering specific Solr functionalities (statistics, autocomplete etc.). Replacing the original (and still default) MySQL-based search feature.
6 sort Facet 1 Facet 2
7 List of modules page sort module search API version categories
Advanced search: issues 8
Whitehouse.gov on Drupal & Solr 9 sort facets
Boosting by Drupal specific properties 10
Boosting and ignoring by document type 11
Boosting by fields/ HTML tags 12
More like this implementation 13
Solr in Views integration 14 Views is a very popular module that helps create interactive DB queries and result pages. Now it can handle Apache Solr as a data source.
Part of the Views admin page 15 You can specify fields, sorting, filters, layout, arguments, behaviours and more
Using Tika: file search 16
Indexing/searching multiple sites 17
Search in comments 18
CCK date searching 19 Content Construction Kit: popular module to create document and field types. CCK date is a special field type handling dates.
statistics 20 impressive numbers – that’s why we love Solr…
Statistics of facet usage 21
autocomplete 22
Future plans 23 Crawling with Nutch; Geospatial search; eDismax (Solr 1.5); Drupal 7 API changes; Improving documentation
People behind these modules 24 Robert Douglass (DE) http://drupal.org/user/5449 Alejandro Garza (MX) http://drupal.org/user/153120 Peter Wolanin (US) http://drupal.org/user/49851 James McKinney (CA) http://drupal.org/user/472460 Scott Reynolds (US) http://drupal.org/user/60009 Mike O'Connor (US) http://drupal.org/user/104525 Markus Kalkbrenner (DE) http://drupal.org/user/124705 and others…
Links 25 apachesolr http://drupal.org/project/apachesolr (this is the best starting point); content recommendation patch http://drupal.org/node/372767; views integration http://drupal.org/project/apachesolr_views, http://acquia.com/node/911667; file search http://drupal.org/project/apachesolr_attachments, http://acquia.com/node/1129446; date facet for CCK field http://drupal.org/node/558160; statistics http://drupal.org/project/apachesolr_stats; multisite http://drupal.org/project/apachesolr_multisitesearch; autocomplete http://drupal.org/project/apachesolr_autocomplete
Part 2 – Library-specific solution 26 General solution: Apache Solr Search Integration and related modules: Stats, Autocomplete, Multisite, Ajax, Biblio, Attachments, Übercart (e-commerce integration), Views, Multilingual, Geospatial and many others. Fits general needs, uses predefined fields. Library-specific solution: eXtensible Catalog modules. Fits library needs, uses dynamic fields.
About eXtensible Catalog 27 a project creating an open source, next-generation library ‘discovery interface’ and an FRBR-based metadata platform; started in 2007; driven by new theories of library science, cultural anthropology and the practice of web 2.0 and library 2.0; Universities of Rochester, Notre Dame, Cornell, North Carolina at Charlotte, Rochester Institute of Technology and CARLI consortium
Architecture 28 Drupal CMS; MARC Normalization; DC Normalization; XC Drupal Toolkit; Transformation; Aggregation; XC Metadata Services Toolkit; circulation data; XC NCIP Toolkit; XC OAI Toolkit; Integrated Library System; Repository
Purposes of XC Drupal Toolkit 29 integrate library data into a popular content management system; customizable functionalities; customizable interface(s); internationalization, localization; 5000+ custom modules, 20+ library-specific modules; wide range of mashup options; all features are available through user interfaces
Search results 30 bibliographical data; cover images; highlighted terms; facets; availability information
Customized interface (Kyushu University) 31
Similar documents 32
XML attribute handling 33 <subject type="OCLC">History</subject> could be indexed as: (1) subject="History" and subject_OCLC="History"; (2) subject_OCLC="History"; (3) subject="History" and subject_type="OCLC"; (4) none
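The indexing options on this slide can be sketched in Python. This is only an illustration under assumptions, not the XC Drupal Toolkit's code (which is written in PHP): the function name `index_fields` and the strategy labels `both`, `qualified`, `separate` and `none` are hypothetical names introduced here.

```python
def index_fields(element, attribute, attr_value, text, strategy):
    """Return the field/value pairs to index for one XML element.

    The strategy labels are hypothetical, one per option on the slide:
      "both"      -> subject and subject_OCLC
      "qualified" -> subject_OCLC only
      "separate"  -> subject and subject_type
      "none"      -> the element is not indexed
    """
    if strategy == "both":
        return {element: text, f"{element}_{attr_value}": text}
    if strategy == "qualified":
        return {f"{element}_{attr_value}": text}
    if strategy == "separate":
        return {element: text, f"{element}_{attribute}": attr_value}
    if strategy == "none":
        return {}
    raise ValueError(f"unknown strategy: {strategy}")


# Input corresponds to: <subject type="OCLC">History</subject>
for strategy in ("both", "qualified", "separate", "none"):
    print(strategy, index_fields("subject", "type", "OCLC", "History", strategy))
```

Because the generated field names depend on the attribute values found in the data, this kind of mapping suits Solr's dynamic fields rather than a fixed list of predefined fields.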
Mapping schema fields to Solr types 34
Set up a facet 35 Aggregating values of different fields into one facet; specify Solr type; custom PHP code to modify field values (conditions)
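A rough Python sketch of the idea on this slide: values from several source fields are collected into one facet, and an optional callback modifies each value first (the role the slide gives to custom PHP code). The function name `build_facet_values`, the field names `creator` and `contributor`, and the sample record are assumptions made for illustration only.

```python
def build_facet_values(record, source_fields, modify=None):
    """Collect the values of several fields into one de-duplicated facet list.

    `modify` is an optional callback (field, value) -> value, standing in
    for the custom code that adjusts field values before indexing.
    """
    values = []
    for field in source_fields:
        for value in record.get(field, []):
            if modify is not None:
                value = modify(field, value)
            # Skip empty values and values already collected from another field.
            if value and value not in values:
                values.append(value)
    return values


# Hypothetical record: two source fields feed a single "name" facet.
record = {
    "creator": ["Smith, John"],
    "contributor": ["Doe, Jane ", "Smith, John"],
}
names = build_facet_values(
    record,
    ["creator", "contributor"],
    modify=lambda field, value: value.strip(),
)
print(names)  # ['Smith, John', 'Doe, Jane']
```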
Custom PHP code for displaying title 36
Getting records into Drupal: OAI harvesting 37 List of scheduled harvests Harvest is running
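For readers unfamiliar with OAI harvesting, the requests a harvester sends can be sketched as follows. This is a minimal illustration of the OAI-PMH `ListRecords` request, not the toolkit's own harvester; the base URL `http://example.org/oai`, the token value and the function name `list_records_url` are assumptions. `oai_dc` is the metadata format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, resumption_token=None,
                     set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: a follow-up
    request carries only the verb and the token.
    """
    params = {"verb": "ListRecords"}
    if resumption_token:
        params["resumptionToken"] = resumption_token
    else:
        if not metadata_prefix:
            raise ValueError("metadata_prefix is required for the first request")
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# First request of a harvest, then a follow-up using a token returned by the server.
first = list_records_url("http://example.org/oai", metadata_prefix="oai_dc")
follow_up = list_records_url("http://example.org/oai", resumption_token="abc")
print(first)
print(follow_up)
```

A scheduled harvest would repeat the follow-up request until the response no longer contains a resumption token.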
Data flow between components 38 OAI-PMH provider; Drupal batch; delete/insert documents; creating nodes; MySQL; Solr
Creating a ‘more like this’ parameter set 39 Saving parameters for ‘More like this’ functionality
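As an illustration of what such a saved parameter set might contain, here is a small Python sketch that assembles Solr's standard MoreLikeThis request parameters (`mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf`, `mlt.mindf`). The function name `mlt_params`, the field names `title` and `subject`, the default values and the query `id:42` are hypothetical; how the XC Drupal Toolkit actually stores these settings is not shown in the slides.

```python
from urllib.parse import urlencode


def mlt_params(fields, count=5, min_term_freq=1, min_doc_freq=1):
    """Return a reusable set of Solr MoreLikeThis request parameters."""
    return {
        "mlt": "true",               # switch the MoreLikeThis component on
        "mlt.fl": ",".join(fields),  # fields used to judge similarity
        "mlt.count": count,          # number of similar documents to return
        "mlt.mintf": min_term_freq,  # minimum term frequency in the source document
        "mlt.mindf": min_doc_freq,   # minimum number of documents containing the term
    }


# Save one parameter set, then attach it to any query.
saved = mlt_params(["title", "subject"])
query_string = urlencode({"q": "id:42", **saved})
print(query_string)
```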
Creating highlighter 40 wrapper around the highlighter’s parameters
Setting up field and date facet properties 41 Date facet properties Field facet properties
Putting facets together: ‘facet group’ 42 General properties List of facets, and their type
Reordering facets 43 Just drag and drop. You haven’t saved changes!
Using facet term list in search form 44 dropdown filled with language facet terms; dropdown definition
Adding widgets to UI: navigation bar 45 definition of navigation bar navigation bar in action
Links 46 Project page http://eXtensibleCatalog.org XC Drupal Toolkit http://drupal.org/project/xc Metadata Services Toolkit http://code.google.com/p/xcmetadataservicestoolkit OAI Toolkit http://code.google.com/p/xcoaitoolkit NCIP Toolkit http://code.google.com/p/xcnciptoolkit Developers: Mlen-Too Wesley (GH) http://drupal.org/user/318924 Király Péter (H) http://drupal.org/user/352587, http://twitter.com/kiru

More Related Content

What's hot

Splitgraph: Docker for Data
Splitgraph: Docker for DataSplitgraph: Docker for Data
Splitgraph: Docker for DataSplitgraph
 
Data access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalData access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalGasperi Jerome
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Chengjen Lee
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebookAshwini Mathur
 
Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Mark Tabladillo
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataFelix Ostrowski
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityThamme Gowda
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkThamme Gowda
 

What's hot (11)

Lighty
LightyLighty
Lighty
 
Splitgraph: Docker for Data
Splitgraph: Docker for DataSplitgraph: Docker for Data
Splitgraph: Docker for Data
 
Data access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery PortalData access and data extraction services within the Land Imagery Portal
Data access and data extraction services within the Land Imagery Portal
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109
 
Qlikview online training
Qlikview online trainingQlikview online training
Qlikview online training
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebook
 
Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106
 
Regal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic DataRegal - a Repository for Electronic Documents and Bibliographic Data
Regal - a Repository for Electronic Documents and Bibliographic Data
 
4
44
4
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
 

Viewers also liked

Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Péter Király
 
The eXtensible Catalog Drupal Toolkit
The eXtensible Catalog Drupal ToolkitThe eXtensible Catalog Drupal Toolkit
The eXtensible Catalog Drupal ToolkitPéter Király
 
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your SiteDrupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Sitenyccamp
 
Drupal solr
Drupal solrDrupal solr
Drupal solrHen Chen
 
內容管理系統 - Drupal入門
內容管理系統 - Drupal入門內容管理系統 - Drupal入門
內容管理系統 - Drupal入門Hen Chen
 
Metadata Quality Assurance Framework at QQML2016 conference - full version
Metadata Quality Assurance Framework at QQML2016 conference - full versionMetadata Quality Assurance Framework at QQML2016 conference - full version
Metadata Quality Assurance Framework at QQML2016 conference - full versionPéter Király
 
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Péter Király
 

Viewers also liked (7)

Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)
 
The eXtensible Catalog Drupal Toolkit
The eXtensible Catalog Drupal ToolkitThe eXtensible Catalog Drupal Toolkit
The eXtensible Catalog Drupal Toolkit
 
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your SiteDrupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
Drupal and Apache Solr Search Go Together Like Pizza and Beer for Your Site
 
Drupal solr
Drupal solrDrupal solr
Drupal solr
 
內容管理系統 - Drupal入門
內容管理系統 - Drupal入門內容管理系統 - Drupal入門
內容管理系統 - Drupal入門
 
Metadata Quality Assurance Framework at QQML2016 conference - full version
Metadata Quality Assurance Framework at QQML2016 conference - full versionMetadata Quality Assurance Framework at QQML2016 conference - full version
Metadata Quality Assurance Framework at QQML2016 conference - full version
 
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
 

Similar to Solr in Drupal

Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...jaxLondonConference
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill MapR Technologies
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and SparkLucidworks
 
Publishing and Serving Machine Learning Models with DLHub
Publishing and Serving Machine Learning Models with DLHubPublishing and Serving Machine Learning Models with DLHub
Publishing and Serving Machine Learning Models with DLHubGlobus
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksLucidworks
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapDr. Mirko Kämpf
 
Things Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & DrupalThings Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & Drupallucenerevolution
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataPaco Nathan
 
Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication networkAyoubSohiabMohammad
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
Easy In, Easy Out: Customizing Your Open Source Publishing Software
Easy In, Easy Out: Customizing Your Open Source Publishing SoftwareEasy In, Easy Out: Customizing Your Open Source Publishing Software
Easy In, Easy Out: Customizing Your Open Source Publishing SoftwareNina McHale
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Polyglot metadata for Hadoop
Polyglot metadata for HadoopPolyglot metadata for Hadoop
Polyglot metadata for HadoopJim Dowling
 
OpenLayers for Drupal: The 10,000 Foot View
OpenLayers for Drupal: The 10,000 Foot ViewOpenLayers for Drupal: The 10,000 Foot View
OpenLayers for Drupal: The 10,000 Foot ViewRobert Bates
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science PlatformQAware GmbH
 

Similar to Solr in Drupal (20)

Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 
Publishing and Serving Machine Learning Models with DLHub
Publishing and Serving Machine Learning Models with DLHubPublishing and Serving Machine Learning Models with DLHub
Publishing and Serving Machine Learning Models with DLHub
 
Syllabus.pdf
Syllabus.pdfSyllabus.pdf
Syllabus.pdf
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road mapEtosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road map
 
Things Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & DrupalThings Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & Drupal
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication network
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Easy In, Easy Out: Customizing Your Open Source Publishing Software
Easy In, Easy Out: Customizing Your Open Source Publishing SoftwareEasy In, Easy Out: Customizing Your Open Source Publishing Software
Easy In, Easy Out: Customizing Your Open Source Publishing Software
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Polyglot metadata for Hadoop
Polyglot metadata for HadoopPolyglot metadata for Hadoop
Polyglot metadata for Hadoop
 
OpenLayers for Drupal: The 10,000 Foot View
OpenLayers for Drupal: The 10,000 Foot ViewOpenLayers for Drupal: The 10,000 Foot View
OpenLayers for Drupal: The 10,000 Foot View
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 

More from Péter Király

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Péter Király
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Péter Király
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Péter Király
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Péter Király
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)Péter Király
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Péter Király
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Péter Király
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Péter Király
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Péter Király
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Péter Király
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Péter Király
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)Péter Király
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)Péter Király
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Péter Király
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Péter Király
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Péter Király
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Péter Király
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Péter Király
 

More from Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)
 
Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 

Solr in Drupal

  • 1. Bringing Solr to Drupal: A General and a Library-Specific Use Case. Király Péter, eXtensible Catalog
  • 2. Two ways of using Solr in Drupal. General solution: Apache Solr Search Integration and related modules (Stats, Autocomplete, Multisite, Ajax, Biblio, Attachments, Übercart (e-commerce integration), Views, Multilingual, Geospatial and many others); fits general needs, uses predefined fields. Library-specific solution: eXtensible Catalog modules; fits library needs, uses dynamic fields.
  • 3. Part 1 – the general solution. This part of the presentation is based on the work and previous presentations of Robert Douglass.
  • 4. Drupal architecture: relational database for storage, Solr index for search
  • 5. Purposes: create a general infrastructure usable in every Drupal installation; a core module plus additional modules covering specific Solr functionality (statistics, autocomplete, etc.); replace the original (and still default) MySQL-based search feature
  • 6. sort, Facet 1, Facet 2
  • 7. List of modules page: sort, module search, API version, categories
  • 9. Whitehouse.gov on Drupal & Solr: sort, facets
  • 10. Boosting by Drupal-specific properties
  • 11. Boosting and ignoring by document type
  • 12. Boosting by fields/HTML tags
  • 13. ‘More like this’ implementation
  • 14. Solr in Views integration: Views is a very popular module that helps create interactive DB queries and result pages. It can now use Apache Solr as a data source.
  • 15. Part of the Views admin page: you can specify fields, sorting, filters, layout, arguments, behaviours and more
  • 16. Using Tika: file search
  • 19. CCK date searching. Content Construction Kit (CCK): a popular module for creating document and field types. CCK date is a special field type for handling dates.
  • 20. Statistics: impressive numbers – that’s why we love Solr…
  • 23. Future plans: crawling with Nutch; geospatial search; eDismax (Solr 1.5); Drupal 7 API changes; improving documentation
  • 24. People behind these modules: Robert Douglass (DE) http://drupal.org/user/5449; Alejandro Garza (MX) http://drupal.org/user/153120; Peter Wolanin (US) http://drupal.org/user/49851; James McKinney (CA) http://drupal.org/user/472460; Scott Reynolds (US) http://drupal.org/user/60009; Mike O'Connor (US) http://drupal.org/user/104525; Markus Kalkbrenner (DE) http://drupal.org/user/124705; and others…
  • 25. Links: apachesolr http://drupal.org/project/apachesolr (this is the best starting point); content recommendation patch http://drupal.org/node/372767; views integration http://drupal.org/project/apachesolr_views, http://acquia.com/node/911667; file search http://drupal.org/project/apachesolr_attachments, http://acquia.com/node/1129446; date facet for CCK field http://drupal.org/node/558160; statistics http://drupal.org/project/apachesolr_stats; multisite http://drupal.org/project/apachesolr_multisitesearch; autocomplete http://drupal.org/project/apachesolr_autocomplete
  • 26. Part 2 – Library-specific solution
  • 27. About eXtensible Catalog: a project creating an open source, next-generation library ‘discovery interface’ and an FRBR-based metadata platform; started in 2007; driven by new theories of library science, cultural anthropology and the practice of web 2.0 and library 2.0; Universities of Rochester, Notre Dame, Cornell, North Carolina at Charlotte, Rochester Institute of Technology and the CARLI consortium
  • 28. Architecture (diagram labels): Drupal CMS; MARC Normalization; DC Normalization; XC Drupal Toolkit; Transformation; Aggregation; XC Metadata Services Toolkit; circulation data; XC NCIP Toolkit; XC OAI Toolkit; Integrated Library System; Repository
  • 29. Purposes of XC Drupal Toolkit: integrate library data into a popular content management system; customizable functionalities; customizable interface(s); internationalization, localization; 5000+ custom modules, 20+ library-specific modules; wide range of mashup options; all features are available through user interfaces
  • 30. Search results: bibliographical data, cover images, highlighted terms, facets, availability information
  • 33. XML attribute handling: <subject type="OCLC">History</subject> could be indexed as… (a) subject="History" and subject_OCLC="History"; (b) subject_OCLC="History"; (c) subject="History" and subject_type="OCLC"; (d) none
  • 34. Mapping schema fields to Solr types
  • 35. Setting up a facet: aggregating values of different fields into one facet; specify Solr type; custom PHP code to modify field values (conditions)
  • 36. Custom PHP code for displaying title
  • 37. Getting records into Drupal: OAI harvesting. List of scheduled harvests; harvest is running
  • 38. Data flow between components: OAI-PMH provider; Drupal batch; delete/insert documents; creating nodes; MySQL; Solr
  • 39. Creating a ‘more like this’ parameter set: saving parameters for ‘More like this’ functionality
  • 40. Creating highlighter: wrapper around the highlighter’s parameters
  • 41. Setting up field and date facet properties: date facet properties; field facet properties
  • 42. Putting facets together: ‘facet group’. General properties; list of facets and their type
  • 43. Reordering facets: just drag and drop. “You haven’t saved changes!”
  • 44. Using facet term list in search form: dropdown filled with language facet terms; dropdown definition
  • 45. Adding widgets to UI: navigation bar. Definition of navigation bar; navigation bar in action
  • 46. Links: Project page http://eXtensibleCatalog.org; XC Drupal Toolkit http://drupal.org/project/xc; Metadata Services Toolkit http://code.google.com/p/xcmetadataservicestoolkit; OAI Toolkit http://code.google.com/p/xcoaitoolkit; NCIP Toolkit http://code.google.com/p/xcnciptoolkit. Developers: Mlen-Too Wesley (GH) http://drupal.org/user/318924; Király Péter (H) http://drupal.org/user/352587, http://twitter.com/kiru
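
Slide 33 lists several ways an XML element carrying an attribute could be flattened into Solr fields. The options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `solr_fields`, the strategy labels, and the reading of "none" as "index nothing" are hypothetical and do not come from the XC Drupal Toolkit.

```python
import xml.etree.ElementTree as ET


def solr_fields(xml_fragment, strategy):
    """Map one XML element with a `type` attribute to Solr field/value pairs.

    Strategy labels are hypothetical names for the options on slide 33.
    """
    element = ET.fromstring(xml_fragment)
    name = element.tag          # e.g. "subject"
    value = element.text        # e.g. "History"
    attr = element.get("type")  # e.g. "OCLC"

    if strategy == "both":
        # Plain field plus an attribute-specific field.
        return {name: value, f"{name}_{attr}": value}
    if strategy == "suffixed":
        # Only the attribute-specific field.
        return {f"{name}_{attr}": value}
    if strategy == "separate":
        # Value and attribute kept in two independent fields.
        return {name: value, f"{name}_type": attr}
    if strategy == "none":
        # Assumption: "none" is taken to mean nothing is indexed.
        return {}
    raise ValueError(f"unknown strategy: {strategy}")


record = '<subject type="OCLC">History</subject>'
for strategy in ("both", "suffixed", "separate", "none"):
    print(strategy, solr_fields(record, strategy))
```

Field names such as `subject_OCLC` cannot all be known in advance, which is presumably why the library-specific solution relies on Solr dynamic fields rather than predefined ones.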

Editor's Notes

  1. We can see lots of search parameters: status, priority, component etc.
  2. To modify relevancy values, you can map field boosting values to different Drupal features: whether a node is promoted to the front page, or is sticky, the number of comments, recently commented nodes etc.
  3. You can boost or diminish the ranking of individual content types, or exclude content types from being indexed altogether.
  4. Dries Buytaert is the creator of Drupal, now the head of Acquia.com
  5. Here you can see how many searches from a music site are being filtered by genre or instrumentation.
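
The boosting described in notes 2 and 3 can be pictured as extra boost queries sent to Solr along with the user's search. The sketch below builds such a parameter list for Solr's DisMax parser using the standard `q`, `defType` and `bq` parameters. The function name `build_boost_params` and the index field names `sticky`, `promote` and `type` are assumptions for illustration; the real module's field names and defaults may differ.

```python
from urllib.parse import urlencode


def build_boost_params(query, sticky_boost=0.0, promote_boost=0.0, type_boosts=None):
    """Return Solr request parameters as a list of (name, value) pairs.

    Field names (sticky, promote, type) are hypothetical placeholders.
    """
    params = [("q", query), ("defType", "dismax")]
    if sticky_boost > 0:
        # Rank sticky nodes higher.
        params.append(("bq", f"sticky:true^{sticky_boost}"))
    if promote_boost > 0:
        # Rank nodes promoted to the front page higher.
        params.append(("bq", f"promote:true^{promote_boost}"))
    for node_type, boost in sorted((type_boosts or {}).items()):
        # Boost (factor > 1) or diminish (factor < 1) a content type.
        params.append(("bq", f"type:{node_type}^{boost}"))
    return params


params = build_boost_params("solr", sticky_boost=2.0, type_boosts={"page": 0.5})
print(urlencode(params))
```

Excluding a content type from the index altogether, as note 3 mentions, is an indexing-time decision and is not covered by these query-time parameters.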