Elgg Search Scalability & Solr Integration
Matt Beckett
Matt Beckett
● Elgg Core Team Member
● Lead Dev – Arck Interactive
● Scuba Diver
Outline
● Bundled Elgg Search
● Scalability issues
● Birth of the Elgg Solr Plugin
● What is Solr?
● Elgg-Solr integration
● Customization
● Case Study
Elgg Search
● Bundled core plugin
● Provides customizable UI
● Search logic is hookable
● Works out of the box
Elgg Search Scalability
● Large sites run into slow search times
● Can affect performance of all areas of site
● Combination of MySQL and Elgg's data normalization
● Elgg Community, 2014: searches timing out
What is Solr?
● Java-based search engine
● Single purpose and built for speed
● Flat XML document structure
● File content searching
● Flexible setup options (same/different server, load balancing)
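To make the "flat XML document" point concrete, here is a minimal PHP sketch that builds a Solr XML update message (`<add><doc><field name="…">…</field></doc></add>`) from a flat array. The function name `solr_add_xml` and the sample fields are hypothetical. The sketch only illustrates the document shape Solr indexes; it is not how the Elgg Solr plugin actually sends data.

```php
<?php
// Hypothetical helper (not part of the plugin): build a Solr XML update
// message from a flat field => value map. A PHP array value becomes
// repeated <field> elements, i.e. a multi-valued field.
function solr_add_xml(array $fields): string
{
    $xml = '<add><doc>';
    foreach ($fields as $name => $value) {
        foreach ((array) $value as $item) {
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars((string) $name, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
                htmlspecialchars((string) $item, ENT_XML1, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

$xml = solr_add_xml([
    'id'    => '123',
    'title' => 'Dive trip & BBQ',
    'tags'  => ['scuba', 'travel'],
]);
```

There is no nesting: every value hangs directly off the single `<doc>`, and special characters such as `&` are escaped.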
Solr Plugin Design
● Generic for use in any Elgg project
● Utilize existing:
- Pagehandlers
- Views
- Hooks
Indexing
● Mirroring an ElggEntity in Solr
● Hookable custom field management
● Flatten data structure
● Match Solr entity with ElggEntity by GUID
● Event-based synchronization
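Here is a minimal sketch of the "flatten data structure" and "match by GUID" ideas. It assumes an entity has already been exported to a nested PHP array. The helper `flatten_fields`, the `parent_child` naming rule and the use of the GUID as Solr's `id` field are illustrative assumptions, not the plugin's actual code.

```php
<?php
// Hypothetical helper: flatten a nested entity export into Solr's flat
// field => value shape. Nested maps become "parent_child" field names;
// lists (e.g. tags) are kept as multi-valued fields.
function flatten_fields(array $data, string $prefix = ''): array
{
    $flat = [];
    foreach ($data as $key => $value) {
        $name = $prefix === '' ? (string) $key : $prefix . '_' . $key;
        $isMap = is_array($value) && array_values($value) !== $value;
        if ($isMap) {
            $flat += flatten_fields($value, $name); // recurse into the map
        } else {
            $flat[$name] = $value;
        }
    }
    return $flat;
}

$entity = [
    'guid'     => 42,
    'type'     => 'object',
    'subtype'  => 'event',
    'title'    => 'Reef dive',
    'tags'     => ['scuba', 'reef'],
    'metadata' => ['location' => 'Cozumel', 'start_time' => 1400000000],
];

// Assumption: the GUID is stored as Solr's unique "id", so each search hit
// can be matched back to its ElggEntity.
$doc = ['id' => $entity['guid']] + flatten_fields($entity);
```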
How it works
[Diagram: an ElggEntity or Annotation create/update in the Elgg DB fires events; the affected GUID is cached, and at shutdown the cached GUIDs are written to the Solr Index]
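The diagram suggests that create/update events only record GUIDs and that the Solr writes are deferred to shutdown. Below is a hypothetical sketch of that pattern in plain PHP. The `IndexQueue` class and the indexer callback are assumptions, and so is the idea that an annotation queues its parent entity's GUID. In a real request, `flush()` would be registered with PHP's `register_shutdown_function()` rather than called directly.

```php
<?php
// Hypothetical sketch of deferred, event-driven indexing: event handlers
// only record GUIDs; the Solr update happens once, in a single batch.
final class IndexQueue
{
    /** @var array<int, true> pending GUIDs, deduplicated by array key */
    private array $pending = [];
    private $indexer;

    public function __construct(callable $indexer)
    {
        $this->indexer = $indexer; // receives an array of GUIDs
    }

    // Called from create/update event handlers.
    public function enqueue(int $guid): void
    {
        $this->pending[$guid] = true;
    }

    // Meant to run once at the end of the request.
    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }
        $guids = array_keys($this->pending);
        $this->pending = [];
        ($this->indexer)($guids);
    }
}

$indexed = [];
$queue = new IndexQueue(function (array $guids) use (&$indexed): void {
    $indexed = array_merge($indexed, $guids); // stand-in for a Solr update
});

$queue->enqueue(42); // entity 42 updated
$queue->enqueue(42); // assumed: an annotation queues its parent entity's GUID
$queue->enqueue(7);
$queue->flush();     // called directly here only to show the result
```

Deduplicating by GUID means an entity touched several times in one request is reindexed only once.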
Searching
● Pagehandler & hook calling handled by core plugin
● Default hooks unregistered
● Hook parameters interpreted into Solr Query notation
● All default parameters handled automagically
Search Hook Parameters
$params['select'] = [
    'start'  => (int) $offset,   // offset of the first result
    'rows'   => (int) $limit,    // number of results to return
    'fields' => (array) $fields, // field names to match against
];
$params['sorts'] = [
    'score'        => 'desc',
    'time_created' => 'desc',
];
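Here is one plausible translation of these two parameters into standard Solr request parameters (`start`, `rows`, `sort`), written as a hypothetical `select_to_solr()` helper. `fields` is deliberately left out because the slide does not say which Solr parameter it maps to.

```php
<?php
// Hypothetical sketch: turn the 'select' and 'sorts' hook parameters into
// Solr request parameters. Solr's sort syntax is "field dir,field dir".
function select_to_solr(array $select, array $sorts): array
{
    $sortParts = [];
    foreach ($sorts as $field => $direction) {
        $sortParts[] = $field . ' ' . $direction; // e.g. "score desc"
    }
    return [
        'start' => (int) $select['start'],
        'rows'  => (int) $select['rows'],
        'sort'  => implode(',', $sortParts),
    ];
}

$solr = select_to_solr(
    ['start' => 20, 'rows' => 10, 'fields' => ['title', 'description']],
    ['score' => 'desc', 'time_created' => 'desc']
);
```

Sorting by `score` first and `time_created` second ranks by relevance and breaks ties with the newest content.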
Search Hook Parameters
$params['qf'] = 'title^1.5 description^1 location^1';
$params['hlfields'] = [
'title',
'description'
];
$params['fragsize'] = 200;
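Here is a hedged sketch of how these map onto Solr. `qf` (query fields with boosts) is a parameter of the DisMax and eDisMax query parsers, and highlighting is driven by `hl`, `hl.fl` and `hl.fragsize`. The choice of `edismax` and the helper name `relevance_to_solr()` are assumptions.

```php
<?php
// Hypothetical sketch: map the boost and highlighting hook parameters onto
// Solr's eDisMax and highlighting request parameters.
function relevance_to_solr(array $params): array
{
    return [
        'defType'     => 'edismax',     // assumed parser; qf needs (e)dismax
        'qf'          => $params['qf'], // "title^1.5" boosts title matches
        'hl'          => 'true',        // turn highlighting on
        'hl.fl'       => implode(',', $params['hlfields']),
        'hl.fragsize' => (int) $params['fragsize'], // snippet size in chars
    ];
}

$solr = relevance_to_solr([
    'qf'       => 'title^1.5 description^1 location^1',
    'hlfields' => ['title', 'description'],
    'fragsize' => 200,
]);
```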
Search Hook Parameters
$params['fq'] = [
    'type'    => 'type:object',
    'subtype' => 'subtype:blog',
];
// e.g. filter on a custom field (here, users with a profile picture):
$params['fq'] = [
    'profile_pic' => 'profile_pic:true',
];
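Solr applies each filter query as a separate, repeated `fq` parameter. Filter queries restrict the result set without affecting relevance scores. The hypothetical helper below builds that query string. The array keys merely label filters (presumably so another hook can replace or unset one) and are not sent to Solr. PHP's `http_build_query()` would instead emit `fq[type]=…`, which Solr does not treat as `fq`.

```php
<?php
// Hypothetical sketch: emit one "fq" parameter per filter. Keys are only
// labels for the PHP side; values are URL-encoded Solr filter queries.
function fq_query_string(array $fq): string
{
    $parts = [];
    foreach ($fq as $filter) {
        $parts[] = 'fq=' . rawurlencode($filter); // ":" becomes "%3A"
    }
    return implode('&', $parts);
}

$qs = fq_query_string([
    'type'    => 'type:object',
    'subtype' => 'subtype:blog',
]);
```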
How it works
[Diagram: User Query → Search Pagehandler → Hook → Solarium → Solr Index; matches are tied back to the Elgg DB and returned through a Hook as Results]
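Here is the return leg of the diagram, sketched under stated assumptions. Solr's JSON response lists matching documents under `response.docs` and snippets under `highlighting`, keyed by the unique key. Assuming that key is the GUID (as the indexing slide suggests), a hypothetical `parse_solr_results()` can recover GUIDs in relevance order. In Elgg, each GUID would then be loaded from the DB, e.g. with `get_entity()`.

```php
<?php
// Hypothetical sketch: extract GUIDs (in Solr's relevance order) and a
// highlight snippet per hit from a Solr JSON response.
function parse_solr_results(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $results = [];
    foreach ($data['response']['docs'] as $doc) {
        $hl = $data['highlighting'][$doc['id']] ?? []; // may be absent
        $results[] = [
            'guid'    => (int) $doc['id'], // assumed: Solr id == Elgg GUID
            'snippet' => $hl['description'][0] ?? $hl['title'][0] ?? null,
        ];
    }
    return [
        'count'   => (int) $data['response']['numFound'],
        'results' => $results,
    ];
}

$json = '{"response":{"numFound":2,"start":0,"docs":[{"id":"42"},{"id":"7"}]},'
      . '"highlighting":{"42":{"title":["<em>Reef</em> dive"]},"7":{}}}';
$parsed = parse_solr_results($json);
```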
Code time, finally
$event = new ElggObject();
$event->subtype = 'event';
$event->access_id = ACCESS_PUBLIC;
$event->title = $title;
$event->description = $description;
$event->location = $location;
$event->start_time = time(); // starting now
$event->end_time = strtotime('+3 days'); // ending in 3 days
$event->save(); // saving fires Elgg's create event, which event-based sync relies on
Helper Plugin
Dynamic fields
_i : integer
_is : array of integers
_s : string (title)
_ss : array of strings (tags, etc)
_t : general text (description)
_txt : array of texts
_b : boolean
_bs : array of booleans
_f : float
_fs : array of floats
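Custom metadata, such as the event's `location` and `start_time`, might be mapped onto these suffixes. Here is a hypothetical `dynamic_field_name()` that picks a suffix from the PHP type of the value. Whether the helper plugin chooses suffixes this way is an assumption. Plain strings default to `_s` (exact-match string), whereas long free text would belong in `_t`.

```php
<?php
// Hypothetical sketch: derive a Solr dynamic field name from a metadata
// name and the PHP type of its value, using the suffixes listed above.
function dynamic_field_name(string $name, $value): string
{
    $isList = is_array($value);
    // Inspect the first element of a list; empty lists fall back to string.
    $sample = $isList ? ($value === [] ? '' : reset($value)) : $value;
    if (is_bool($sample)) {
        $suffix = 'b';
    } elseif (is_int($sample)) {
        $suffix = 'i';
    } elseif (is_float($sample)) {
        $suffix = 'f';
    } else {
        $suffix = 's'; // exact-match string; free text would use _t
    }
    // Multi-valued variants append "s": _bs, _is, _fs, _ss.
    return $name . '_' . $suffix . ($isList ? 's' : '');
}

$doc = [];
$metadata = [
    'location'   => 'Cozumel',
    'start_time' => 1400000000,
    'tags'       => ['scuba', 'reef'],
    'featured'   => true,
];
foreach ($metadata as $name => $value) {
    $doc[dynamic_field_name($name, $value)] = $value;
}
```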
Case Study: EN MIRG
● Executive Networks
● Member Information Report Generator
● Staff-facilitated communication
● Multiple reports with varying conditions
Solr to the rescue!
Conclusions
MySQL: 41.89 seconds
Solr: 0.29 seconds
Solr === Fast
(144x faster in this case)
Todos
● https://github.com/arckinteractive/elgg_solr
● Code cleanup
● Multi-threaded reindex
● Index auto-correction
● Other ideas?


Editor's Notes

  • #2 Community search moved to Solr Jun 17, 2014
  • #3 Hello and welcome. My name is Matt Beckett, you may know me from such places as the internet and underwater. I have been involved with Elgg since April 2011 and quickly became a very productive plugin writer for various clients, notably Athabasca University. I have been a member of the Elgg core team since October 2013. I'm also the lead developer at Arck Interactive, one of the top Elgg dev outfits. Sorry for the shameless plug, but every time I say “Arck Interactive” Paul gives me a raise ;)
  • #4 Before we dive into the code, let's back up a bit and take a look at the history of search in Elgg. I came to Elgg when it was at version 1.7.8, and search was a bundled core plugin; it has been since 1.7.0. According to the code attribution, it was a collaborative effort between Curverider and The MITRE Corporation (oh, also, whenever I say “The MITRE Corporation” they send me a contract offer worth more than Paul's last raise – so this should be a profitable trip!). The core plugin brought some important features to search capability – a standardized hook-based framework and a nice way to customize results display with simple view overrides. The plugin is mostly unchanged from that point to now.
  • #5 Bundled with Elgg – this is something people expect with a social framework: the ability to search, and there it is, supported in core. Works as advertised – you type in a query, and you get results matching that query. No magic involved and not much unexpected. No setup/config – it comes enabled by default and there's nothing else to it. No APIs, external services or technical debt.
  • #7 MyISAM – a big part of the “standard” Elgg performance improvements is converting database tables to InnoDB for row-level locking. Benchmarks have consistently shown it to be the faster storage engine overall, but until MySQL 5.6 InnoDB did not support full-text search. Not scalable with DB size – we saw this on the Elgg community: back in 2014 we had to switch over to Google search because searches were timing out after more than 30 seconds. Tag search – we can register multiple names for tag metadata, and each one makes the query heavier. SLOW – filtering results by arbitrary metadata.
  • #9 So that's core search in Elgg. So what is Solr?
  • #10 First and foremost, Solr is a Java-based search engine. It's a single-purpose application built for speed of searching XML documents. XML documents have arbitrary fields, so they can be fit to model your data. It has a file parser that allows for indexing the content of a wide array of file types. Being Java-based, it's OS-independent and can be deployed on the same web server as other applications such as Elgg, or load balanced across multiple servers.
  • #12 So why did I choose Solr? I didn't. The Solr plugin was originally started by Billy Gunn as a solution for one of our clients with a large database that was experiencing major performance issues with search. Solr is, however, an official FOSS project of the Apache Software Foundation and is used by many big players, which means it's well tested, well maintained, and well supported. Those are all good qualities to look for when pulling a new service into a project. Billy did the original implementation for the client; I then took over and made some improvements, eventually rewriting it and making it generic enough for general release as an open source Elgg plugin.