The document discusses integrating Solr into Hippo for search capabilities. It outlines the problems with the current search architecture, the objectives of improving search by integrating Solr, and how ContentBeans can be indexed using annotations so that they appear in Solr search results. Key points include fixing the current problems and making search customizable, scalable, and document-oriented through Solr integration and ContentBean indexing.
Solr 4.0 dramatically improves scalability, performance, and flexibility. An overhauled Lucene underneath sports near real-time (NRT) capabilities allowing indexed documents to be rapidly visible and searchable. Lucene’s improvements also include pluggable scoring, much faster fuzzy and wildcard querying, and vastly improved memory usage. These Lucene improvements automatically make Solr much better, and Solr magnifies these advances with “SolrCloud.” SolrCloud enables highly available and fault tolerant clusters for large scale distributed indexing and searching. There are many other changes that will be surveyed as well. This talk will cover these improvements in detail, comparing and contrasting to previous versions of Solr.
This document summarizes a presentation about rapid prototyping with Solr. It discusses getting documents indexed into Solr quickly, adjusting Solr's schema to better match needs, and showcasing data in a flexible search UI. It outlines how to leverage faceting, highlighting, spellchecking and debugging in rapid prototyping. Finally, it discusses next steps in developing a search application and taking it to production.
Solr Recipes provides quick and easy steps for common use cases with Apache Solr. Bite-sized recipes will be presented for data ingestion, textual analysis, client integration, and each of Solr’s features including faceting, more-like-this, spell checking/suggest, and others.
Here is one way to build a custom search component that automatically selects facets based on the results:
1. Create a class that extends SearchComponent and implements the prepare and process methods.
2. In prepare, analyze the query and use Lucene's term vectors or other analysis to determine which fields are likely to provide useful facets. Add these fields to the response builder.
3. In process, after the normal query processing, generate facet counts for the fields added in prepare. Add the facet counts to the response.
4. Register the component in solrconfig.xml and configure it to run after the query and facet components.
Now facets will be automatically selected without needing to specify them in the request.
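The field-selection heuristic in step 2 can be sketched outside of Solr. The following standalone Python sketch (the function names are illustrative, not part of the Solr API) scores candidate fields by how well their values partition a result set, which is one reasonable way to decide which fields would make useful facets:

```python
def facet_usefulness(docs, field):
    """Score a field by how well its values partition the documents.

    A useful facet field has more than one value (it discriminates)
    but far fewer distinct values than documents (it groups).
    """
    values = [doc[field] for doc in docs if field in doc]
    if not values:
        return 0.0
    distinct = len(set(values))
    if distinct <= 1 or distinct == len(values):
        return 0.0  # constant fields and unique IDs make poor facets
    # Fraction of documents covered, damped by the number of buckets.
    return len(values) / len(docs) / distinct

def select_facet_fields(docs, candidates, top_n=2):
    """Pick the top_n candidate fields with the best usefulness score."""
    scored = [(facet_usefulness(docs, f), f) for f in candidates]
    scored.sort(reverse=True)
    return [f for score, f in scored[:top_n] if score > 0]

docs = [
    {"id": "1", "format": "book", "language": "en"},
    {"id": "2", "format": "book", "language": "de"},
    {"id": "3", "format": "dvd", "language": "en"},
    {"id": "4", "format": "book", "language": "en"},
]
print(select_facet_fields(docs, ["id", "format", "language"]))
```

In a real component, the scores would come from Lucene term statistics rather than from iterating over stored documents, but the shape of the decision is the same.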
Lucene powers the search capabilities of practically all library discovery platforms, by way of Solr, etc. The Lucene project evolves rapidly, and it's a full-time job to keep up with the ever improving features and scalability. This talk will distill and showcase the most relevant(!) advancements to date.
Based on user feedback, I discuss the most requested features for PostgreSQL, their implementation status, difficulties, blockers, and future plans. Items include replication, materialized views, parallel queries, in-place upgrade.
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
code4lib 2011 preconference, presented by Erik Hatcher of Lucid Imagination.
Abstract: The library world is fired up about Solr. Practically every next-gen catalog is using it (via Blacklight, VuFind, or other technologies). Solr has continued improving in some dramatic ways, including geospatial support, field collapsing/grouping, extended dismax query parsing, pivot/grid/matrix/tree faceting, autosuggest, and more. This session will cover all of these new features, showcasing live examples of them all, including anything new that is implemented prior to the conference.
Presented by Andrzej Bialecki, LucidWorks
This session presents a set of Solr components for easy management of "sidecar indexes" - indexes that extend the main index with additional stored and / or indexed fields. Conceptually this can be viewed as an extension of the ExternalFileField or as a static join between documents from two collections. This functionality is useful in applications that require very different update regimes for the two parts of the index (e.g. main catalogue items combined with clickthroughs).
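Conceptually, the static join described above can be sketched with plain dictionaries. This is not the component's actual code, just an illustration of merging sidecar fields into main-index documents by a shared key:

```python
def join_sidecar(main_docs, sidecar_docs, key="id"):
    """Merge fields from sidecar documents into main documents by key.

    Main-index fields win on conflict, mirroring the idea that the
    sidecar only *extends* the primary index.
    """
    sidecar_by_key = {doc[key]: doc for doc in sidecar_docs}
    joined = []
    for doc in main_docs:
        extra = sidecar_by_key.get(doc[key], {})
        joined.append({**extra, **doc})  # main doc overrides sidecar
    return joined

catalogue = [{"id": "b1", "title": "Lucene in Action"}]
clicks = [{"id": "b1", "clickthroughs": 42}]
print(join_sidecar(catalogue, clicks))
```

The point of the sidecar approach is that the two halves can be updated on different schedules: the clickthrough counts can be reindexed continuously without touching the catalogue documents.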
Apache Solr is an open-source enterprise search platform built on Apache Lucene. It started as an in-house project at CNET for adding search functionality to their website and was donated to the Apache Software Foundation in 2006. Key features of Solr include faceted search, filtering, hit highlighting, dynamic clustering, database integration, and replication to support scalability.
This document compares Apache Solr 4.0 and ElasticSearch 0.19. It outlines their main features for searching, indexing, similarities, and provides references for further information. Key differences include ElasticSearch being better for real-time search applications while Solr is more mature. ElasticSearch also supports push queries and has a schema-free structure while Solr has more advanced search features like result grouping.
Building Intelligent Search Applications with Apache Solr and PHP5, by israelekpo
ZendCon 2010 - Building Intelligent Search Applications with Apache Solr and PHP5. This is a presentation on how to create intelligent web-based search applications using PHP 5 and the out-of-the-box features available in Solr 1.4.1. After we finish the illustration of adding, updating, and removing data from the Solr index, we will discuss how to add features such as auto-completion, hit highlighting, faceted navigation, and spelling suggestions.
Battle of the giants: Apache Solr vs ElasticSearch, by Rafał Kuć
Elasticsearch and Apache Solr are both distributed search engines that provide full text search capabilities and real-time analytics on large volumes of data. The document compares their architectures, data models, query languages, and other features. Key differences include Elasticsearch having a more dynamic schema while Solr relies more on predefined schemas, and Elasticsearch natively supports features like nested objects and parent/child relationships that require additional configuration in Solr.
The document describes a presentation about rapidly prototyping with Solr. It will demonstrate ingesting documents into Solr, adjusting Solr's schema, and showcasing data in a flexible search UI. The presentation will cover faceting, highlighting, spellchecking, and debugging. Time will also be spent outlining next steps to develop and take the search application to production.
Solr is the popular, blazing-fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Solr powers the search and navigation features of many of the world's largest internet sites, such as AOL, Yahoo, Buy.com, CNET, CitySearch, Netflix, Zappos, StubHub, Digg, E*Trade, Disney, Apple, NASA, and MTV.
Elasticsearch - Devoxx France 2012 - English version, by David Pilato
This document provides an overview of the Elasticsearch search engine. It discusses that Elasticsearch is designed for the cloud and NoSQL generation. It is based on Apache Lucene and hides complexity with RESTful and JSON interfaces. Key points are that Elasticsearch is easy to get started with, scales horizontally by adding nodes, and is powerful with Lucene and parallel processing. The document also covers storing data as documents in types and indexes, and interacting with Elasticsearch via its REST API.
The document provides an overview and agenda for an Apache Solr crash course. It discusses topics such as information retrieval, inverted indexes, metrics for evaluating IR systems, Apache Lucene, the Lucene and Solr APIs, indexing, searching, querying, filtering, faceting, highlighting, spellchecking, geospatial search, and Solr architectures including single core, multi-core, replication, and sharding. It also provides tips on performance tuning, using plugins, and developing a Solr-based search engine.
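The inverted index mentioned in the agenda is the core data structure behind both Lucene and Solr. As a rough, simplified sketch (real indexes also store positions, norms, and compressed postings lists):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, *terms):
    """AND query: intersect the postings lists of all query terms."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "apache solr search", 2: "apache lucene library", 3: "solr crash course"}
index = build_inverted_index(docs)
print(search(index, "apache", "solr"))  # documents containing both terms
```

Looking up terms instead of scanning documents is what makes queries fast: the work is proportional to the length of the postings lists, not to the size of the collection.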
An overview of ORDS for building RESTful Web Services and your Oracle Database with BEER examples!
Thanks and credit to the POUG organization for making this possible.
SE2016 - Java EE revisits design patterns 2016, by Alex Theedom
Design patterns are not only cool but represent the collective wisdom of many developers. Since the publication of Design Patterns: Elements of Reusable Object-Oriented Software by the GoF, many new concepts have extended the coverage of these design patterns, and now Java EE provides out-of-the-box implementations of many of the best-known patterns. This talk will show how, by taking advantage of Java EE features such as CDI and the smart use of annotations, traditional design patterns can be implemented in a much cleaner and quicker way. Among the design patterns discussed will be Singleton, Façade, Observer, Factory, Dependency Injection, Decorator, and more.
Apache Solr is a popular, open source enterprise search platform built on the Java-based search engine library Apache Lucene. It powers the search and navigation features of many of the world's largest companies, such as Netflix, Instagram, LinkedIn, Twitter, and eBay.
This document provides an introduction to search in Drupal 7 and discusses how to optimize search using Apache Solr. It begins with an overview of why search is important and the limitations of Drupal core search. It then covers how to install and configure Apache Solr for Drupal, including indexing the site, configuring search pages and blocks, and customizing Solr for better results. The document also discusses how to improve the user experience through facets, location-based searching, and customizing result displays. Exercises are provided to help implement many of the discussed optimizations.
This document provides an overview of Lucene and Solr. It introduces Erik Hatcher, who is a committer to Lucene and Solr projects and co-founder of Lucid Imagination, a company that provides commercial support for Lucene and Solr. It then provides brief descriptions of Lucene, its inverted index structure, segments and merging, and scoring. Finally, it discusses Solr architecture and some extension points for customizing Lucene and Solr functionality.
Solr Flair demonstrates the powerful user interfaces and interactions that can be built with Apache Solr. It shows examples leveraging features like suggest, instant search, spell checking, faceting, filtering, grouping, and clustering. These examples are presented with full code, configuration, and UI elements. A variety of technologies are used to build the UIs, including Solr, Lucene, jQuery, Velocity templating, and others. The presentation concludes by showing some live systems that have been built using these techniques.
Apache Solr is an enterprise search engine. It facilitates indexing of large numbers of documents of any size and provides very robust search techniques. This presentation provides a brief introduction to it.
Presentation at FOSSETCON 2015
http://www.fossetcon.org/2015/sessions/getting-started-solr-open-source-search-platform-0
Solr is a very popular open source search engine which builds upon the capabilities of Lucene. It's the perfect tool to index loads of text and make it easily searchable. And it's very fast!
Powerful features such as facets, typeahead, and "did you mean" help your users to quickly navigate through a very large dataset and find what they're looking for.
A REST-style JSON interface makes it language-agnostic, you can even work with it straight from the command line using curl!
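For example, a select query is just an HTTP GET with URL parameters. The snippet below only builds the request URL (the host and core name are made up for illustration); actually sending it requires a running Solr instance:

```python
from urllib.parse import urlencode

def solr_select_url(base_url, query, rows=10, facet_fields=()):
    """Build a Solr /select URL asking for JSON output and optional facets."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"

url = solr_select_url("http://localhost:8983/solr/mycore", "title:lucene",
                      facet_fields=["format"])
print(url)
```

The same URL works verbatim with curl, which is what makes the interface language-agnostic.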
A flexible plugin mechanism lets you augment your searches with complementary tools such as rich document parsing, text analysis, or your own custom code.
In this session, learn the basics of making your content searchable with Solr.
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
This talk describes how you can practically apply some of Lucene 4's new features (such as flexible indexing, scoring improvements, column-stride fields) to improve your search application.
The talk will give a brief description of these new features and some example use cases you can try yourself in and around the new features now available in Lucene 4. We'll cover examples of how you can configure Solr to:
Set up the schema to use the Pulsing or Memory codec for a primary key field
Avoid a separate spellcheck index by controlling character-level swaps from the query processor
Sort with a different locale
Use per-field similarity configurations, such as a non-vector-space algorithm
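As a sketch of the first and last items, here is an illustrative Solr 4 schema.xml fragment. The field and type names are made up, and the exact postings format and similarity factory names should be checked against your Solr 4.x version:

```xml
<!-- Primary key stored with the Pulsing postings format (illustrative). -->
<fieldType name="string_pulsing" class="solr.StrField"
           postingsFormat="Pulsing40"/>
<field name="id" type="string_pulsing" indexed="true" stored="true"
       required="true"/>

<!-- Per-field similarity: BM25 instead of the default vector-space model. -->
<fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory"/>
</fieldType>
```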
A year and a half ago we rolled out a new integrated full-text search engine for our intranet based on Apache Solr. The search engine integrates various data sources such as file systems, wikis, internal websites and web applications, shared calendars, our corporate database, CRM system, email archive, task management, and defect tracking. This talk is an experience report about some of the good things, the bad things, and the surprising things we have encountered over two years of developing with, operating, and using an intranet search engine based on Apache Solr.
After setting the scene, we will discuss some interesting requirements that we have for our search engine and how we solved them with Apache Solr (or at least tried to solve). Using these concrete examples, we will discuss some interesting features and limitations of Apache Solr.
In the second part of the talk, we will tell a couple of "war stories" and walk through some interesting, annoying and surprising problems that we faced, how we analyzed the issues, identified the cause of the problems and eventually solved them.
The talk is aimed at software developers and architects with some basic knowledge of Apache Solr, the Apache Lucene project family, or similar full-text search engines. It is not an introduction to Apache Solr, and we will dive right into the interesting and juicy bits.
Presented by Ingo Renner, Software Engineer, Infield Design
TYPO3 is an Open Source Content Management System that is very popular in Europe, especially in the German market, and gaining traction in the U.S., too.
TYPO3 is a good example of how to integrate Solr with a CMS. The challenges we faced are typical of any CMS integration. We came up with solutions and ideas to these challenges and our hope is that they might be of help for other CMS integrations as well.
That includes content indexing, file indexing, keeping track of content changes, handling multi-language sites, search and faceting, access restrictions, result presentation, and how to keep all these things flexible and reusable for many different sites.
For all these things we used a couple of additional Apache projects, and we would like to show how we use them and how we contributed back to them while building our Solr integration.
Hippo GetTogether April 2012: faceted navigation, a tale of daemons, by Hippo
Wouter Danes gave a presentation about using daemons to derive faceted navigation properties for documents in Hippo CMS. The standard derived data engine cannot populate facets from multiple related documents, so they used a daemon module running in the repository to dynamically derive the facet properties. Some lessons learned were that daemon modules need to be thread-safe and can cause issues with publishing if properties are derived incorrectly. Improving the derived data engine or waiting for SOLR integration were suggested to make faceting of related documents easier in Hippo CMS.
Introducing Apricot, The Eclipse Content Management Platform, by Nuxeo
This talk delivered by Florent Guillaume, Director of R&D at Nuxeo, will provide the audience with a global understanding of what Apricot is and also provide a general overview of what a Content Repository is from a functional standpoint: exploring all the services it offers, identifying the main standards and technologies integrated within a framework of this caliber, such as the Content Management Interoperability Standard (CMIS), and understanding the main technical challenges to be resolved, in particular high scalability and high performance.
The document discusses a content repository, which is a generic API for content storage that provides CRUD functionality as well as versioning, transactions, and search capabilities. It describes how a content repository enforces simplicity, encourages standardization, and improves scalability. Examples of content repository implementations are provided, including Apache Jackrabbit and eXo Platform. Key features of content repositories are explored such as the content model, repository structure with workspaces and nodes/properties, and node type definitions.
This document summarizes a workshop on automatic export in Hippo CMS. It discusses how automatic export can export configuration changes from a running CMS to a developer's local project to simplify updating content modules. It provides an overview of how automatic export works and how to configure it, including enabling/disabling it, setting exclusion patterns and filters, and exporting to multiple modules. It concludes with some tips for using automatic export effectively.
This week, web content management analyst Janus Boye is hosting a group from Denmark to tour and learn about the latest trends and technologies in the online/digital world. His “Digital Priorities for 2012” tour stopped here in Amsterdam, where we were very proud to present Hippo CMS.
In this presentation the following issues are addressed:
- The Dutch Government’s approach to a new e-government initiative and how they utilized Hippo CMS to go from multiple CMSs and websites to one completely powered by Hippo.
- Our forward-looking technology trends into social and mobile content delivery, as well as Hippo’s focus on Context Aware Content Management.
- A sneak preview of some of the upcoming, new features of Hippo CMS
This presentation is about the history of Hippo and includes the lessons learned.
Hippo was started in 1999 by three university friends with a hunch that this “internet” thing would take off. We’ve grown since then, and taken part in and shaped a few IT milestones along the way.
As we turn 15, we’re proud to say we’re using cutting edge technology to help major clients reach their audiences on every channel, and keeping them ready for whatever channel comes next.
The document introduces JCR 2.0 and its top 10 new features compared to JCR 1.0. Key highlights include:
1) Query extensions with the Abstract Query Model and the Java Query Object Model, which replace XPath queries.
2) Support for access control lists and policies.
3) Integration with records management systems through retention policies and legal holds.
4) Simplified linear versioning model.
5) Support for lifecycle management and expressing transitions between states.
6) A standardized way to register new and modified node types.
7) New properties and node types were added.
8) Standardized creation and removal of workspaces.
9
The document discusses the Java Content Repository (JCR) and the Sling framework, which builds on JCR to enable scriptable web applications. It provides an overview of using Sling to develop a simple blog application with only 46 lines of code through JavaScript scripts and RESTful interfaces. The document also demonstrates more advanced features of Sling like content observation and generation of image thumbnails through an OSGi bundle.
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah... (Lucidworks)
The document discusses building a large scale SEO/SEM application using Apache Solr. It describes some of the key challenges faced in indexing and searching over 40 billion records in the application's database each month. It discusses techniques used to optimize the data import process, create a distributed index across multiple tables, address out of memory errors, and improve search performance through partitioning, index optimization, and external caching.
Building a Large Scale SEO/SEM Application with Apache Solr (Rahul Jain)
Slides from my talk on "Building a Large Scale SEO/SEM Application with Apache Solr" in Lucene/Solr Revolution 2014 where I talk how we handle Indexing/Search of 40 billion records (documents)/month in Apache Solr with 4.6 TB compressed index data.
Abstract: We are working on building a SEO/SEM application where an end user search for a "keyword" or a "domain" and gets all the insights about these including Search engine ranking, CPC/CPM, search volume, No. of Ads, competitors details etc. in a couple of seconds. To have this intelligence, we get huge web data from various sources and after intensive processing it is 40 billion records/month in MySQL database with 4.6 TB compressed index data in Apache Solr.
Due to large volume, we faced several challenges while improving indexing performance, search latency and scaling the overall system. In this session, I will talk about our several design approaches to import data faster from MySQL, tricks & techniques to improve the indexing performance, Distributed Search, DocValues(life saver), Redis and the overall system architecture.
This document provides an agenda and overview for a one-day Lucene boot camp tutorial. The schedule includes sessions on introducing Lucene, indexing, analysis, searching, and performance. It also covers topics like indexing in Lucene, analyzing text, querying, sorting results, and optimizing search performance. The document seeks to help attendees understand Lucene's core capabilities through real examples, code, and data. It encourages attendees to ask questions.
Lucene, Solr and Java 9 - opportunities and challenges (Charlie Hull)
Apache Lucene and Solr needed to be updated to work with Java 9's new module system. This introduced challenges around strong encapsulation and reflective access. The talk discussed changes like compact strings and performance improvements from intrinsics and the G1 garbage collector. It also recommended using multi-release JARs to include Java 9 specific implementations of utils classes for compatibility. Migrating to Java 9 could improve security and performance in some cases for Elasticsearch users.
Solr at Zvents: 6 years later & still going strong (lucenerevolution)
Presented by Amit Nithianandan, Lead Engineer Search/Analytics New Platforms, Zvents/Stubhub
Zvents has been a user of Apache Solr since 2007 when it was very early. Since then, the team has made extensive use of the various features and most recently completed an overhaul of the search engine to Solr 4.0. We'll touch on a variety of development/operational topics including how we manage the build lifecycle of the search application using Maven, release the deployment package using Capistrano and monitor using NewRelic as well as the extensive use of virtual machines to simplify node management. Also, we’ll talk about application level details such as our unique federated search product, and the integration of technologies such as Hypertable, RabbitMQ, and EHCache to power more real-time ranking and filtering based on traffic statistics and ticket inventory.
Backing Data Silo Attack: Alfresco sharding, SOLR for non-flat objects (ITD Systems)
The document discusses a scheme for sharding Alfresco repositories to address scalability and storage limitations. Key points of the scheme include:
- Repositories are sharded across multiple independent servers, each storing a part of the content.
- A level 7 switch balances requests across repositories and provides a single API entry point.
- An external SOLR cloud indexes all repositories in a single index to allow federated queries.
- The scheme is benchmarked to scale to 15,000 concurrent users on commodity hardware. Additional considerations for production include auto-discovery, configuration management, and safety checks.
Solr and ElasticSearch demo and speaker, Feb 2014 (nkabra)
The document provides an overview of distributed database architecture and search technologies. It discusses Solr and ElasticSearch, including their history, key features, use cases, and migration process. A presentation is given covering basics, current usage, highlights, and taking questions. Examples are provided of companies using ElasticSearch for applications like resume recommendations, integration, and searching large collections of documents.
2. About me: Ard Schrijvers
1. Working at Hippo since 2001
2. Email: a.schrijvers@onehippo.com / ard@apache.org
3. Worked primarily on: HST, Hippo Repository / Jackrabbit, Lucene, Cocoon, Slide
4. Apache committer of Jackrabbit and Cocoon
9. Outline
1. The current search (HST / repo) architecture
2. The current problems / shortcomings / mismatches
3. What we are trying to improve, the objectives
4. Solr integration to rescue
5. A very fast demo
6. Wrap up
7. Questions
11. Current search architecture
So an HstQuery is translated to an XPath query, which is delegated to the repository. The repository returns a JCR NodeIterator, which the HST binds back to HippoBeans.
18. Current search architecture
Reasons:
1. When Jackrabbit 1 started, Lucene was at version 1.4
2. The first JSR-170 spec imposed a very harsh constraint: a save must result in directly updated search results
3. Support for XPath / SQL was needed; however, Lucene likes flattened data, while JCR with XPath / SQL is all about hierarchical data
4. JCR Nodes != Documents
28. Current problems / shortcomings / mismatches
1. JCR Nodes are indexed instead of Documents (#nodes >> #documents)
2. A search result only returns Nodes (Rows): what if you want something else, like auto-completion?
3. Customization is very hard and very limited
4. Support for very complex XPath / SQL queries, at the price of CPU, memory and complexity
5. A single index for an entire workspace
6. Only JCR Nodes and properties are indexed: no 'derived' field indexes
7. To index external sources, the sources need to be stored in the repository
8. Range queries (and others) easily blow up
9. Getting the number of hits is complex
29. Current problems / shortcomings / mismatches
Extra problem: JCR Nodes != Documents
For example: a news document contains a link to an author document; the news document should be findable through the author's name.
31. Objectives
1. Fix all the 9+ problems / shortcomings / mismatches from the previous slides
2. Easy to use and customize
3. Satisfied customers
4. Satisfied partners
5. Scalable searches: CPU, memory and large document numbers
6. Document oriented
7. Integration with HST ContentBeans (HippoBeans)
8. Index external sources
9. Control the SIZE of the index yourself
10. Don't invent but integrate (with out-of-the-box features supported by a large community)
33. Objective: Fix all the 9 problems / shortcomings / mismatches from previous slides
Easy: Solr integration to the rescue
44. Objective: Easy to use and customize
You decide 'from where', 'what', 'how' and 'when' to index:
1. from where: which sources (JCR, web pages, databases, NoSQL stores, Nuxeo, Alfresco, anything)
2. what: which parts of a document (not a JCR node) or external source
3. how:
1. which analyzer
2. index on document level, property level or both
3. whether to store the text
4. when: when do you want to index
50. Objective: Easy to use and customize
But of course, with out-of-the-box support and tooling ready to be used by YOU:
1. Default Hippo repository indexer & observer
2. ContentBean (HippoBean) annotations for indexing
3. Binding search results to ContentBeans
4. Deployment support
5. Clustering support
55. Objective: Satisfied customers
If they are not satisfied enough, you can:
1. Easily customize it (aka tune it for as long as you like)
2. Hire anyone with Solr experience: all our partners have Solr experience
56. Objective: Satisfied customers
Still not satisfied?
Let them overpay for a Google Search Appliance, Autonomy or any of the other software that is not worth paying for
64. Objective: Satisfied partners
1. Our partners frequently have good knowledge of Solr
2. Our partners depend less on the current search limitations
3. Our partners can pitch with their Solr knowledge
4. Our partners can sell more Hippo implementations
5. Our partners will earn more on Hippo and have happier developers
6. Hippo will earn more through HES, which will satisfy partners again, because Hippo can spend more on AR&D ==> more features
68. Objective: Scalable searches
1. Using Solr to do the searches
2. Not the complex JCR hierarchical searches
3. Document oriented instead of JCR Nodes (#docs << #nodes)
79. Objective: Integration with ContentBeans (HippoBeans)
Annotate your getters with @IndexField or @IndexField(name="foo"), and account for them in the Solr schema.xml:
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="summary" type="text_general" indexed="true" stored="true"/>
80. Objective: Integration with ContentBeans (HippoBeans)
An example:
@Node(jcrType="demosite:textdocument")
public class TextBean extends BaseDocument {
    @IndexField
    public String getTitle() {
        return getProperty("demosite:title");
    }
    @IndexField(name="samenvatting")
    public String getSummary() {
        return getProperty("demosite:summary");
    }
}
81. Objective: Integration with ContentBeans (HippoBeans)
Another example:
@Node(jcrType="demosite:textdocument")
public class TextBean extends BaseDocument {
    @IndexField
    public String getTitle() {
        return getProperty("demosite:title");
    }
    @IndexField
    public String getSummary() {
        return getProperty("demosite:summary");
    }
    @IndexField
    public String getAuthor() {
        return getLinkedBean("demosite:author", Author.class).getAuthor();
    }
}
82. Objective: Integration with ContentBeans (HippoBeans)
Another example:
@Node(jcrType="demosite:textdocument")
public class TextBean extends BaseDocument {
    @IndexField
    public String getTitle() {
        return getProperty("demosite:title");
    }
    @IndexField
    public String getSummary() {
        return getProperty("demosite:summary");
    }
    @ReIndexOnChange
    @IndexField
    public Author getAuthor() {
        return getLinkedBean("demosite:author", Author.class);
    }
}
83. Objective: Integration with ContentBeans (HippoBeans)
Another example: Setters
@Node(jcrType="demosite:textdocument")
public class TextBean extends BaseDocument {
    private String title;
    private String summary;
    @IndexField
    public String getTitle() {
        return title == null ? getProperty("demosite:title") : title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    @IndexField
    public String getSummary() {
        return summary == null ? getProperty("demosite:summary") : summary;
    }
    public void setSummary(String summary) {
        this.summary = summary;
    }
}
Bonus: What can we achieve with the Setters?
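One possible answer to the bonus question (a guess at the intended pattern, not a documented HST feature): because the getters fall back to the JCR property only when the field is null, a search result can populate the bean via the setters, for example with highlighted snippets coming back from Solr, and the view keeps using the same getters. A minimal, self-contained sketch of that fallback pattern (the class and property map below are stand-ins, not real HST classes):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a content bean whose getters fall back to the stored
// (JCR) value unless a setter has overridden the field, e.g. with a
// highlighted snippet taken from a Solr response.
class HighlightableBean {
    // Simulates the underlying JCR properties of the bean.
    private final Map<String, String> jcrProperties = new HashMap<>();
    private String title;

    HighlightableBean(String storedTitle) {
        jcrProperties.put("demosite:title", storedTitle);
    }

    public String getTitle() {
        // Prefer the value set from the search result, if any.
        return title == null ? jcrProperties.get("demosite:title") : title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public static void main(String[] args) {
        HighlightableBean bean = new HighlightableBean("Solr integration");
        System.out.println(bean.getTitle()); // the stored (JCR) value
        bean.setTitle("<em>Solr</em> integration"); // e.g. a highlight snippet
        System.out.println(bean.getTitle()); // the overridden value
    }
}
```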
84. Objective: Integration with ContentBeans (HippoBeans)
That's all you need to do.
And the HST binds some extra indexing fields, like:
1. The path
2. The canonicalUUID
3. The name
4. The localized name
5. The depth
6. The class hierarchy (including interfaces)
88. Objective: Index external sources
You can:
1. Push them directly to Solr
2. Push them to an HST JAX-RS resource that binds to a ContentBean and commits to Solr
3. Crawl from the HST, bind to ContentBeans and commit them to Solr
89. Objective: Index external sources
A ContentBean does *not* need a JCR Node!
The ContentBean interface:
public interface ContentBean {
    @IndexField(name="id")
    String getPath();
    void setPath(String path);
}
90. Objective: Index external sources
An example: GoGreenProductBean in the Testsuite
public class GoGreenProductBean implements ContentBean {
    private String path;
    private String title;
    private String summary;
    private String description;
    public String getPath() { return path; }
    public void setPath(final String path) { this.path = path; }
    @IndexField
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    @IndexField
    public String getSummary() { return summary; }
    public void setSummary(String summary) { this.summary = summary; }
    @IndexField
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
}
91. Objective: Index external sources
And add the GoGreenProductBean to Solr:
List<GoGreenProductBean> gogreenBeans = new ArrayList<GoGreenProductBean>();
// FILL THE gogreenBeans LIST
// NOW ADD TO INDEX
HippoSolrManager solrManager =
    HstServices.getComponentManager().getComponent(
        HippoSolrManager.class.getName(), SOLR_MODULE_NAME);
try {
    solrManager.getSolrServer().addBeans(gogreenBeans);
    UpdateResponse commit = solrManager.getSolrServer().commit();
} catch (IOException e) {
    e.printStackTrace();
} catch (SolrServerException e) {
    e.printStackTrace();
}
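The "FILL THE gogreenBeans LIST" step above is left open on the slide. A minimal sketch of what it could look like, using a trimmed-down, hypothetical stand-in for GoGreenProductBean (the real bean lives in the Testsuite; the paths and titles here are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down stand-in for the Testsuite's GoGreenProductBean, just
// enough to show how the list could be filled before handing it to
// solrManager.getSolrServer().addBeans(...).
class GoGreenProductFiller {
    static class Product {
        private String path;
        private String title;
        void setPath(String path) { this.path = path; }
        void setTitle(String title) { this.title = title; }
        String getPath() { return path; }
        String getTitle() { return title; }
    }

    static List<Product> fill() {
        List<Product> beans = new ArrayList<>();
        Product solarPanel = new Product();
        solarPanel.setPath("products/solar-panel"); // becomes the Solr "id" field
        solarPanel.setTitle("Solar panel");
        beans.add(solarPanel);
        Product windmill = new Product();
        windmill.setPath("products/windmill");
        windmill.setTitle("Windmill");
        beans.add(windmill);
        return beans;
    }

    public static void main(String[] args) {
        System.out.println(fill().size()); // 2
    }
}
```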
93. Objective: Control the SIZE of the index yourself
JCR / Jackrabbit / the Hippo Repository has a generic one-size-fits-all index (or one-size-fits-none index), which grows very large easily and can hardly be customized.
94. Objective: Control the SIZE of the index yourself
However, search is domain specific.
Thus: just index what is needed for the customer.
97. Objective: Don't invent but integrate
For example:
HippoSolrManager solrManager = ...
String query = ...
HippoQuery hippoQuery = solrManager.createQuery(query);
hippoQuery.setLimit(pageSize);
hippoQuery.setOffset((page - 1) * pageSize);
// hippoQuery.getSolrQuery() is the SolrQuery object
// include scoring
hippoQuery.getSolrQuery().setIncludeScore(true);
hippoQuery.getSolrQuery().setHighlight(true);
hippoQuery.getSolrQuery().setHighlightFragsize(200);
hippoQuery.getSolrQuery().addHighlightField("title");
hippoQuery.getSolrQuery().addHighlightField("summary");
hippoQuery.getSolrQuery().addHighlightField("htmlContent");
HippoQueryResult result = hippoQuery.execute(true);
102. A very fast demo
Setup: ~75,000 long Wikipedia docs in the repository
............... doing the demo .................
111. Wrap up
I think that with the Solr integration:
1. Developers will be happier
2. Customers will be happier
3. Partners will be happier
4. Hippo will be happier
And finally, last and least:
5. Infra will be happier, because the servers stop sweating
113. Questions?
Check out the example at:
http://svn.onehippo.org/repos/hippo/hippo-cms7/testsuite/trunk