
Integrating Google Search Appliance with Mura CMS


An overview of integrating Google Search Appliance with Mura CMS. Presented at MuraCon 2012 by Ajay Sathuluri.

Published in: Technology

  1. Integrating Google Search Appliance with Mura CMS
     Ajay Sathuluri, @sathuluri
  2. About Me
     ∗ Ajay Sathuluri
     ∗ Sr. Architect at ICF International
     ∗ Using ColdFusion since ’98
     ∗ Server Tuning, Administration, Load Testing
     ∗ I like spending time with my kids and wife.
  3. What are we covering?
     ∗ Google Search Appliance
       ∗ Configuring a Crawl
       ∗ Control Access to Content
       ∗ Configuring Database Crawl
       ∗ Collections / Front Ends
       ∗ Crawl Diagnostics
     ∗ Configuring GSA with Mura CMS Plugin (FW/1)
     ∗ Search
     ∗ Search Results
  4. Google Search Appliance - Home
  5. Configuring a Crawl
     ∗ Before starting a crawl, you must configure the crawl path so that it includes only the information you want to make available in search results.
     ∗ Use the Crawl and Index > Crawl URLs page in the Admin Console to enter URLs.
     ∗ URLs are case-sensitive.
     ∗ Configure your network to disallow search appliance connectivity outside of your intranet.
  6. Google Search Appliance – Crawl URL
  7. Configuring a Crawl
     ∗ Demo
  8. Control Access to Content
     ∗ robots.txt
     ∗ meta tag
     ∗ no-crawl Directories
  9. Control Access to Content (2): robots.txt
     ∗ The Google Search Appliance always obeys the rules in robots.txt, and it is not possible to override this feature.
     ∗ A robots.txt file is not mandatory.
     ∗ It is located in the Web server's root directory.
     ∗ For the search appliance to be able to access the robots.txt file, the file must be public.
     ∗ It includes one or more Disallow: or Allow: rules, for example:
       User-agent: gsa-crawler
       Disallow: /personal_records/
       Disallow: /admin/
       Allow: /
       Allow: /personal_records/mypersonal.doc
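
As a rough way to sanity-check rules like the ones on this slide before a crawl, the sketch below feeds them to Python's standard-library `urllib.robotparser`. It is only an illustration, not the appliance's own logic: the helper names `build_parser` and `crawl_allowed` are made up here, and the specific `Allow` line is moved to the top because the standard-library parser applies the first matching rule, which may differ from how the appliance resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Rules from the slide, with the specific Allow line listed first.
# Assumption: Python's parser uses the first matching rule in file order;
# the appliance may resolve overlapping rules differently.
ROBOTS_TXT = """\
User-agent: gsa-crawler
Allow: /personal_records/mypersonal.doc
Disallow: /personal_records/
Disallow: /admin/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_allowed(parser: RobotFileParser, path: str,
                  agent: str = "gsa-crawler") -> bool:
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in ("/personal_records/mypersonal.doc",
                 "/personal_records/other.doc",
                 "/admin/index.cfm",
                 "/products/index.cfm"):
        print(path, crawl_allowed(parser, path))
```

Running the script prints one line per path, which makes it easy to spot a rule that blocks more than intended.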
  10. Control Access to Content (3): meta tag
     ∗ Prevent the search appliance crawler (as well as other crawlers) from indexing or following links in a specific HTML page.
     ∗ Embed a robots meta tag in the head of the HTML page.
     ∗ The search appliance crawler obeys the index, noindex, follow, and nofollow values in robots meta tags:
       <meta name="robots" content="index, nofollow">
       <meta name="robots" content="noindex, nofollow">
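
To check what a rendered page actually tells crawlers, a small script can read the robots meta tag back out of the HTML. The following is a minimal sketch using Python's standard-library `html.parser`; the class and function names (`RobotsMetaParser`, `robots_directives`, `may_index`, `may_follow`) are hypothetical, and it models only the four values named on the slide.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow"
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(html: str) -> bool:
    """A page is indexable unless it carries noindex."""
    return "noindex" not in robots_directives(html)


def may_follow(html: str) -> bool:
    """Links are followed unless the page carries nofollow."""
    return "nofollow" not in robots_directives(html)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="index, nofollow"></head></html>'
    print(sorted(robots_directives(page)), may_index(page), may_follow(page))
```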
  11. Control Access to Content (4): no-crawl Directories
     ∗ The Google Search Appliance does not crawl any directories named "no_crawl."
     ∗ You can prevent the search appliance from crawling files and directories by:
       ∗ Creating a directory called "no_crawl."
       ∗ Putting the files and subdirectories you do not want crawled under the no_crawl directory.
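
The no_crawl convention can be approximated in a few lines when auditing a list of URLs. This sketch is an assumption-laden illustration rather than the appliance's implementation: the function name `in_no_crawl_directory` is invented, the host is a placeholder, and it assumes the directory name must match "no_crawl" exactly.

```python
from urllib.parse import urlsplit


def in_no_crawl_directory(url: str) -> bool:
    """Return True if any directory in the URL path is named exactly 'no_crawl'.

    Assumption: only directory segments count, so the last path segment
    (the file name, or an empty string after a trailing slash) is ignored.
    """
    path = urlsplit(url).path
    directories = path.split("/")[:-1]
    return "no_crawl" in directories


if __name__ == "__main__":
    # intranet.example.com is a placeholder host.
    for url in ("http://intranet.example.com/docs/no_crawl/draft.pdf",
                "http://intranet.example.com/docs/public/index.cfm"):
        print(url, in_no_crawl_directory(url))
```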
  12. Configuring Database Crawl
     ∗ Database data source information enables the search appliance to access content stored in a database.
     ∗ To configure a database crawl, provide database data source information.
     ∗ Use the Crawl and Index > Databases page in the Admin Console.
     ∗ After you create a new database data source, click the Sync link to start a database crawl.
  13. Google Search Appliance – Databases
  14. Collections
     ∗ A collection lets you search over a specific part of the index.
     ∗ For example, you may want to create a products collection or a faq collection that supports searches only within the products or faqs part of your index.
     ∗ The maximum number of collections for a search appliance is 200.
     ∗ Use the Crawl and Index > Collections page: in the Collection Name text box, type a name for the new collection.
     ∗ Manage a collection by:
       ∗ Editing a Collection
       ∗ Exporting and Importing a Collection Configuration
       ∗ Deleting a Collection
  15. Google Search Appliance – Collections
  16. Front Ends
     ∗ A front end enables you to change the look and feel of the search and search result pages your users access.
     ∗ You can customize these pages to display your organization's colors, fonts, and design. If you have multiple collections, you can make each front end appear in a different format and have its own configuration options.
     ∗ Use the Serving > Front Ends page: in the Front End Name field, enter a name for the new front end.
     ∗ Manage a front end by:
       ∗ Editing a Front End
       ∗ Deleting a Front End
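
Collections and front ends come together when a search request is sent to the appliance. The sketch below builds such a request URL in Python; it is not the plugin's code. It assumes the parameter names `q`, `site` (collection), `client` (front end), and `output` from the appliance's search protocol, which should be confirmed against the XML reference listed under Resources. The host name, the helper `build_search_url`, and the default collection and front end names are placeholders.

```python
from urllib.parse import urlencode


def build_search_url(host: str, query: str,
                     collection: str = "default_collection",
                     front_end: str = "default_frontend") -> str:
    """Build a search request URL for a given collection and front end.

    Assumptions: the parameter names follow the appliance's search protocol
    (site = collection, client = front end, output = raw XML results);
    the default names here are placeholders for your own configuration.
    """
    params = {
        "q": query,
        "site": collection,
        "client": front_end,
        "output": "xml_no_dtd",
    }
    return f"http://{host}/search?{urlencode(params)}"


if __name__ == "__main__":
    # gsa.example.com is a placeholder host.
    print(build_search_url("gsa.example.com", "mura plugin", collection="products"))
```

Restricting a query to the products collection is then just a matter of passing a different `collection` value, which is what makes per-section search boxes straightforward.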
  17. Google Search Appliance – Front Ends
  18. Crawl Diagnostics
     ∗ Crawl diagnostics provide detailed information about appliance crawl status for a domain, host, directory, or URL.
  19. Google Search Appliance - Crawl Diagnostics
  20. Google Search Appliance – Secret Recipe
     "The appliance uses a sophisticated algorithm to generate the results bla… bla ..."
  21. Mura – Plugin
     ∗ Deploy Mura Plugin
  22. GSA Plugin - Search
     ∗ Search Code
  23. GSA Plugin - Results
     ∗ Search results code
  24. GSA Plugin – DEMO
     ∗ DEMO
  25. Google Search Appliance – Secret Recipe
  26. Resources
     ∗ http://docs.getmura.com/
     ∗ http://www.getmura.com/marketplace/apps/fw1-plugin-template/
     ∗ https://developers.google.com/search-appliance/documentation/614/
     ∗ https://developers.google.com/search-appliance/documentation/614/xml_reference
     ∗ http://www.robotstxt.org/meta.html
     ∗ http://muracms.com/forum/
  27. Acknowledgements
     Thanks to Oğuz Demirkapi for helping to prepare the presentation.
  28. Q&A?
