CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013 | Search All the Things

Search all the things

This outlines a 24-hour hackathon project at Acquia that combined generated API documentation and docs from GitHub-hosted resources into a single indexable interface managed by Solr and Drupal.

Transcript of "Search all the things"

1. CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013
Search All the Things

2. Introduction
Kevin Bridges
• Senior Software Engineer, Cloud Systems at Acquia
• Avid technologist who believes Drupal is a component of larger systems.
• http://drupal.org/user/27802 - aka cyberswat
• https://twitter.com/cyberswat

3. The Problem
Large organizations have lots of data that can be in multiple formats. Different teams can use different tools and services, making a cohesive interface difficult.
• Hosted data with services like GitHub
• Internal APIs
• Wikis
• Documents and text files
This data can span multiple languages and formats. How can we combine all of these sources into a single interface that is easy to use while maintaining context?

4. Engineering Week Hackathon
We had 24 hours to solve the problem.
• Build a Drupal 7 site
• Integrate with LDAP over SSL for secure access
• Serve generated API docs, such as RDoc output
• Index generated docs and GitHub docs for searching
• Enable an effective faceted search

5. The Team
We needed a few specialists to pull this off: 3 Drupal developers, 1 Drupal themer, and 2 operations hackers.
• Kevin Bridges (@cyberswat) - Drupal & DevOps
• Peter Wolanin (@pwolanin) - Drupal & Solr
• Peter Jackson (@faoiseamh) - Drupal & DevOps
• Richard Burford (@psynaptic) - Drupal Themer
• Amin Astaneh (@aastaneh) - Operations
• Chris Rutter (@ChrisRut) - Operations

6. Drupal Modules
We used 6 contributed modules to accelerate our development efforts. We needed to create 1 custom module, which currently lives in a Drupal sandbox.
Contributed Modules
• Acquia Connector - Contains the Acquia Search module, which provides integration between a Drupal site and Acquia's hosted search service
• Apache Solr - Integrates Drupal with the Apache Solr search platform
• Apache Solr Attachments - Allows searching within file attachments using Solr

7. Drupal Modules
Contributed Modules (continued)
• Apache Solr Multisite Search - Search across multiple sites with Solr
• Facet API - Abstract facet API that can be used by various search backends
• LDAP - Provides integration with LDAP services
Custom Modules
• API docs search - Search API docs with Solr

8. Custom StreamWrappers
Drupal’s StreamWrappers allow us to keep local copies of the data we need to index while maintaining control over how the data is displayed to the end user.
generated
• Store generated content for indexing and viewing.
• Allow the files to be viewed from the search results in the context of the Drupal site.
• Store raw HTML for display from search results.
github
• Store GitHub content for pre-processing and indexing.
• Modify external links to this content so they reference the document as it lives on GitHub, for additional context.
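The split between "where the indexed copy lives" and "where a search result should point" can be sketched outside Drupal. The Python below is a minimal sketch, not the apidocs_search code (which is PHP): the directories, the site URL, the owner/repo/path layout of github URIs, and the branch name master are all hypothetical. In a Drupal 7 stream wrapper class, the same two answers would come from methods such as getDirectoryPath() and getExternalUrl().

```python
from pathlib import PurePosixPath

# Hypothetical settings: the slides do not give the real directories or site URL.
LOCAL_ROOTS = {
    "generated": "/var/www/files/generated",  # rendered docs such as RDoc output
    "github": "/var/www/files/github",  # raw docs copied from GitHub repositories
}
SITE_BASE = "https://docs.example.com/apidocs"
GITHUB_BASE = "https://github.com"


def split_uri(uri: str) -> tuple[str, str]:
    """Split 'scheme://target' into its scheme and target path."""
    scheme, sep, target = uri.partition("://")
    if not sep or scheme not in LOCAL_ROOTS:
        raise ValueError(f"unsupported URI: {uri!r}")
    return scheme, target


def local_path(uri: str) -> str:
    """Location of the local copy that gets indexed."""
    scheme, target = split_uri(uri)
    return str(PurePosixPath(LOCAL_ROOTS[scheme]) / target)


def external_url(uri: str) -> str:
    """Location a search result should link to."""
    scheme, target = split_uri(uri)
    if scheme == "generated":
        # Serve generated HTML inside the Drupal site to keep the site's context.
        return f"{SITE_BASE}/{target}"
    # Assumed github:// layout: owner/repo/path; assumed branch name "master".
    owner, repo, path = target.split("/", 2)
    return f"{GITHUB_BASE}/{owner}/{repo}/blob/master/{path}"
```

For example, external_url("github://acme/widgets/docs/README.md") points the result at https://github.com/acme/widgets/blob/master/docs/README.md, while local_path() for the same URI names the file that gets indexed.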

9. Jenkins
Jenkins runs a cron job that gathers all of the data we want indexed and pushes it into the main git repository as rendered content for the site. Once content is in git, it is pulled onto the server for our StreamWrappers to work with.
• Checks out the allthethings repo that runs the main Drupal install.
• Loops over each of the git repositories we are interested in indexing.
• Scans our standard documentation types and locations for changes and commits them to allthethings.
• Runs RDoc to generate Ruby docs and commits the documentation to allthethings if it has changed.
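The "scan for changes" part of that job can be sketched in Python. This is a hypothetical helper, not the actual Jenkins job: the documentation patterns are assumptions, since the slides do not list the standard documentation types and locations, and the git checkout, RDoc run, and commit are left to the surrounding job script.

```python
import filecmp
import shutil
from pathlib import Path

# Hypothetical documentation locations; the real job's list is not given.
DOC_PATTERNS = ["README.md", "docs/**/*.md"]


def sync_docs(repo_dir: Path, dest_dir: Path) -> list[str]:
    """Copy new or changed doc files from one repo into the allthethings checkout.

    Returns the relative paths that changed, i.e. what the job would commit.
    """
    changed = []
    for pattern in DOC_PATTERNS:
        for src in sorted(repo_dir.glob(pattern)):
            if not src.is_file():
                continue
            rel = src.relative_to(repo_dir)
            dst = dest_dir / rel
            # Compare file contents, not just size and mtime.
            if dst.exists() and filecmp.cmp(src, dst, shallow=False):
                continue  # unchanged: nothing to commit
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            changed.append(rel.as_posix())
    return changed
```

A job would call sync_docs once per repository and run git add and git commit in the allthethings checkout only when the returned list is non-empty, which mirrors the slide's "if it has changed" behavior.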

10. Scanning Content for Indexing
Before we can index content in Solr, we need to identify what should be indexed. Once identified, the file is tracked in MySQL so that it can be processed efficiently.
• Cron is used to pull down changes Jenkins may have pushed.
• Each of the StreamWrapper file directories is scanned for valid content.
• A hash of the content is generated, along with the timestamps, to help target what should be indexed.
• Each database record includes uri, hash, timestamp, type, mimetype, and status.
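The scan-and-track step might look like the Python sketch below. It uses sqlite3 as a stand-in for MySQL and borrows the apidocs_search_files table name and the columns listed on the slide, but several details are assumptions: SHA-1 over the file contents as the hash, the URI scheme as the type, and a hypothetical 'pending' status meaning "needs indexing".

```python
import hashlib
import mimetypes
import sqlite3
from pathlib import Path

# Columns follow the slide; types and the 'pending' status are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS apidocs_search_files (
    uri TEXT PRIMARY KEY,
    hash TEXT NOT NULL,
    timestamp INTEGER NOT NULL,
    type TEXT NOT NULL,
    mimetype TEXT,
    status TEXT NOT NULL
)
"""


def scan(db: sqlite3.Connection, scheme: str, root: Path) -> int:
    """Record new or changed files under root; return how many need (re)indexing."""
    db.execute(SCHEMA)
    queued = 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        uri = f"{scheme}://{path.relative_to(root).as_posix()}"
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        mtime = int(path.stat().st_mtime)
        row = db.execute(
            "SELECT hash FROM apidocs_search_files WHERE uri = ?", (uri,)
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # content unchanged since the last scan
        mimetype, _ = mimetypes.guess_type(path.name)
        # New or changed file: (re)queue it for indexing.
        db.execute(
            "INSERT OR REPLACE INTO apidocs_search_files VALUES (?, ?, ?, ?, ?, 'pending')",
            (uri, digest, mtime, scheme, mimetype),
        )
        queued += 1
    db.commit()
    return queued
```

Comparing stored hashes means an unchanged file is skipped even if a fresh git pull rewrote its timestamp, so only real content changes reach Solr.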

11. Passing Content to Solr
For each of the scanned documents, we need to build a Solr document to be used in search results.
• Evaluate the content and render it using the github-markup gem if necessary.
• Evaluate the content for HTML tags to assist with surfacing content in searches.
• Identify a good title for the document by searching for title and h1 tags.
• Send the completed document to Solr for indexing.
• Update the scanned document's status to indicate it has been indexed.
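Title detection and document assembly could be sketched with Python's standard html.parser. This is an illustration under assumptions, not the module's code: it prefers <title> over <h1> and falls back to the file name, and apart from entity_type and ss_apisource (both named on the FacetAPI slide) the field names and values are hypothetical. Sending the result to Solr, for example as JSON to the core's /update handler, is left out.

```python
from html.parser import HTMLParser


class DocParser(HTMLParser):
    """Capture the first <title> and first <h1>, plus text outside title/script/style."""

    SKIP = {"title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.text_parts = []
        self._capturing = None  # "title" or "h1" while inside the first such tag
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and self._capturing is None and not getattr(self, tag):
            self._capturing = tag
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._capturing:
            setattr(self, self._capturing, getattr(self, self._capturing) + data)
        if not self._skip_depth:
            self.text_parts.append(data)


def build_solr_doc(uri, html, api_source):
    """Build a hypothetical Solr document for one scanned file."""
    parser = DocParser()
    parser.feed(html)
    parser.close()
    # Assumed preference order: <title>, then <h1>, then the file name.
    title = parser.title.strip() or parser.h1.strip() or uri.rsplit("/", 1)[-1]
    return {
        "id": uri,  # hypothetical id scheme
        "label": title,
        "content": " ".join(" ".join(parser.text_parts).split()),
        "entity_type": "apidocs_file",  # hypothetical value
        "ss_apisource": api_source,
    }
```

Text inside <h1> stays in the content field as well as serving as a title candidate, so headings still contribute to matching.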

12. Create Facets with FacetAPI
FacetAPI is used to create custom facets. We wanted facets that allow filtering by API Source and by Content Type.
• While generating the Solr document, populate the ss_apisource attribute.
• FacetAPI provides a block for each content type. This corresponds to the entity_type attribute in our Solr document.
• Implement hook_facetapi_facet_info to provide the definition of the facet.
• Use apidocs_search_map_source to map different sources to labels.
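The facet logic can be illustrated with a small Python sketch. Solr itself computes facet counts server-side when a query includes facet=true&facet.field=ss_apisource; the sketch reproduces that counting locally and adds a label lookup in the spirit of apidocs_search_map_source. The source values and labels are hypothetical.

```python
from collections import Counter

# Hypothetical ss_apisource values and display labels.
SOURCE_LABELS = {
    "rdoc": "Ruby API (RDoc)",
    "github": "GitHub docs",
}


def map_source(source):
    """Stand-in for apidocs_search_map_source: stored value -> display label."""
    return SOURCE_LABELS.get(source, source)


def facet_counts(docs, field):
    """Count documents per field value, most frequent first, ties alphabetical."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def api_source_facet(docs):
    """Facet entries for the API Source block, with readable labels."""
    return [(map_source(value), n) for value, n in facet_counts(docs, "ss_apisource")]
```

Keeping raw identifiers in ss_apisource and mapping them to labels only at display time means a label can change without reindexing.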

13. Drush Integration
It’s always a good idea to start with Drush while building advanced tools. This makes development, troubleshooting, and maintenance easier.
• apidocs-clean - Removes file references from the database for files that no longer exist in the filesystem.
• apidocs-index - Indexes files referenced in {apidocs_search_files}.
• apidocs-scan - Scans existing documentation to record references in the database.
• apidocs-markup - Renders a GitHub Flavored Markdown file as HTML.
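As an example of how small these maintenance commands can be, here is a Python sketch of what apidocs-clean does. It is not the Drush command itself: it assumes an apidocs_search_files table with a uri column holding scheme://path values, plus a hypothetical mapping from each scheme to its local directory. Rows with an unknown scheme are kept rather than deleted, a deliberately conservative choice.

```python
import sqlite3
from pathlib import Path


def clean(db: sqlite3.Connection, roots: dict[str, Path]) -> list[str]:
    """Delete rows whose file no longer exists on disk; return the removed URIs."""
    stale = []
    rows = db.execute("SELECT uri FROM apidocs_search_files ORDER BY uri").fetchall()
    for (uri,) in rows:
        scheme, _, target = uri.partition("://")
        root = roots.get(scheme)
        # Only delete when we know where the scheme lives and the file is gone.
        if root is not None and not (root / target).is_file():
            stale.append(uri)
    db.executemany("DELETE FROM apidocs_search_files WHERE uri = ?", [(u,) for u in stale])
    db.commit()
    return stale
```

Returning the removed URIs gives the command something useful to print, which helps with the troubleshooting the slide mentions.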

14. Custom apidocs_search Module
The bulk of our customizations were in the apidocs_search module. This module is available in a sandbox on drupal.org for your inspection.
• apidocs_search.index.inc - Manages Solr indexing
• apidocs_search.install - Manages the apidocs_search_files schema
• apidocs_search_markup.rb - Uses the github-markup gem to render GitHub Flavored Markdown
• apidocs_search_streamwrappers.inc - Provides the generated documentation and github stream wrappers
• apidocs_search.module - Provides the callbacks and functions that tie it all together

15. Resources and Links
Developers
• cyberswat - http://drupal.org/user/27802
• pwolanin - http://drupal.org/user/49851
• faoiseamh - http://drupal.org/user/1999750
• psynaptic - http://drupal.org/user/93429
• aastaneh - http://drupal.org/user/2318122
• ChrisRut - http://drupal.org/user/597820
More Reading
• https://www.acquia.com/blog/finding-all-things-engineering-hackathon
• http://www.slideshare.net/cyberswat/drupalcon-sydney

16. Resources and Links
Contrib Modules
• http://drupal.org/project/acquia_connector
• http://drupal.org/project/apachesolr
• http://drupal.org/project/apachesolr_attachments
• http://drupal.org/project/apachesolr_multisitesearch
• http://drupal.org/project/facetapi
• http://drupal.org/project/ctools
• http://drupal.org/project/ldap
Custom Modules
• http://drupal.org/sandbox/pwolanin/1801674

17. Acquia is Hiring in Australia (and elsewhere)
https://www.acquia.com/careers

18. Search All the Things: We Need Your Feedback
http://sydney2013.drupal.org/node/348