Pubcon New Orleans 2013 on-site search with Todd Keup



My good friend Ralf Schwoebel and I discuss the importance and relevance of having your own on-site search utility in today's online culture. Using examples from Google, Amazon and Sphinx Search, we teach you how to plan, implement and use on-site search to reach and retain customers as well as convert on your sales goals. Bonus materials inside the presentation! ;)

Published in: Technology

  • Good Morning! We want to thank Brett Tabke and his organization for all their hard work in putting a conference like this together. Each time we attend we find ourselves beneficiaries of the knowledge shared at this gathering. Thanks Brett, for the opportunity to not only be here, but to be here once again as speakers. We would also like to thank Ben Cook of Direct Match Media for volunteering to facilitate this session. But most of all thank you for being here today. We are honored by your presence and the privilege to share what we are able regarding on-site search. For those of you that are familiar with the WebmasterWorld web site and the forums at WebmasterWorld, Ralf is an active member there and goes by the nickname “pontifex”. Todd is also an active member and one of the moderators of the PHP Server Side Scripting Forum. He goes by the nickname “coopster”. We want you to know that we would absolutely love the opportunity to make your personal acquaintance today. We are approachable and friendly. Please don't hesitate to introduce yourself.
  • Todd: Let me introduce us. Before I expose you to the German accent of my friend Ralf, who is founder and CEO of a huge affiliate-powered download marketplace, I will present some Wisconsin-accent-flavored on-site search tidbits. We both run small and large sites ourselves and have to implement search solutions for our own sites or for our customers.
  • On-site search is not just the input field you might be imagining right now. As always with complex technology, there is much more to it, and we will try to give you a solid overview of WHAT is really in it and how YOU can benefit from it!
  • It really isn't just about an input field, trust us. What happens when you rely on another source for your on-site search, such as Google Custom Search (GSE)? Have you ever tried it? If so, has it ever failed to meet your expectations or your visitors' expectations? Can you think of any benefits to having your own on-site search? How about these. Performance and latency: no need to ask another server. Controlled indexes: you custom index your own content. SE referrer queries: you can include searches from multiple engines in your index. Customized experience: you control the output format! That last one is huge, mutating the landing page. More on that later. If you answered with any of these responses, you are in the right session today. We are going to share what we can regarding on-site search. You don't need to install an outside search engine's technology, relying on their servers, bandwidth, internet connections, their logo, etc. People use a search engine to locate and get to your site. Now let's keep them here and use our own internal search technology to help our users locate the information they want on our site.
  • Basic search. Typos. Data sources? Let's start out with a search on Google. Why? Why not. They haven't become one of the leading search engines for nothing! We can learn from Google and apply the same to our own on-site search. Notice how typographical errors are handled. Auto-correction is being applied on-the-fly and the user experience is being maintained. We see what we have typed in so far, where Google thinks we have made an error and an alternative list of known resources. And it is stated that way. Plus a nice option asking if you *really* want to search for the misspelled word instead. And in the meantime the search page is actually being rendered so you have a visual. Ah, yes, I *DID* mean Elvis. Thank you Mr. Web Site Programmer. Wait a minute, stick with me … there's even more …
  • We see songs we can listen to, and if we take that one more click we are presented with purchase options. If it were our own site, we would like to have the purchase ability right here though … hey wait a minute, that's what this presentation is all about! Also, have a look at the lower right of the display where you can see what "People also search for". Promote other products based on data that has been analyzed. How can we capture that type of information? The same way Google, Amazon and others that have it right are doing it: you analyze your own logs and start making decisions and attempts based on known criteria and tracking patterns. But I'm getting ahead of myself, Ralf is going to cover more of that shortly. Back to basics …
  • In order to have on-site search you need something to search! You have your content, yes, but is that enough? Where does the data reside? Is it in database tables? Text files? Static HTML? A combination of these resources? Plan how you will index your data. This is important. You might have a product database table, you might have recipes, you might have … the list goes on. Or all your site pages may be in a single database table. You can build a single index or multiple indexes. Or you may need to pull data in from multiple resource types as we just mentioned. How will you do so? Does the search technology you intend to implement allow you to do so? Does it have an indexer you can use, or will you write your own? I mean, somehow you have to get your data searched, right? What about your links? More than likely each of these resources is available at a unique URL. And depending on how you build your index you may end up serving duplicate content. How? Well, don't forget that your index builds the way you instruct it to. You could end up building your index using a different URL than the original, one which offers the exact same page. Consider query strings, especially for items like news or calendars. How will you update your index? Manually? When will you update your index? Hourly? Monthly? How much of your data will you be updating in your index? All 500,000 pages at once? Does your search technology allow you to use a main and delta approach? How will it perform? Will it scale? If you are a MySQL user, are you still running FULLTEXT queries with MATCH ... AGAINST? If so, you need to talk to my friend Ralf after the presentation. He can offer you some enlightenment. Lastly, how will you search against your index? Can you use Perl, PHP, or some other type of API? A custom approach? Does the technology you intend to implement support your intent?
  • Your own search feature should be capturing data and you should be combining that with search engine referral data. Why? Find out what people are searching for! Why was that search important? Use it! Use it for profiling behavior. Learn from Amazon! They do this well. Plus there are additional benefits, such as fraud protection. What?! Fraud protection? Yes. As part of our session today I had an inclination to interview Ralf with some on-site search questions. And I learned a neat little trick from him regarding fraud protection. He's going to share that today as well as quite a bit more on this profiling idea. For example, if I knew you visited my site in Safari, should I present Windows software to you? Considerations.
  • Do you have multiple links to the same resource on any of your site pages? If so, maybe you don't need them both. Start using this information in your own design and search attack. Track and learn from this data. Also note that second bullet. We touched on this earlier. Free tools for tracking destinations. Follow the link here in the slide or search for "enhanced link attribution" for more information.
  • Sphinx is free! Open Source! Sphinx was created by Andrew Aksyonoff. The company, Sphinx Technologies, is a privately held US company, created in late 2007 by Andrew Aksyonoff, creator and primary developer of Sphinx, and Peter Zaitsev, former head of the High Performance group at MySQL AB and a world-class expert in database technologies. Today, we are an office-less company with about 10 employees spanning across all the time zones, working online. A colleague, no, a dear friend, introduced me to Sphinx in 2006 and I have been using it on client sites ever since. It is stable, fast, and open source. And you can configure and tweak it to your specific needs. And today I am presenting with that very same friend, Ralf Schwoebel. Ralf was dealing with "Big Data" long before the term was coined and before most of the world outside of the search engines themselves was thinking about the idea. And guess what? What works and applies to big data users applies to small customers. Why not? Have you met an organization yet that doesn't want to be treated the same as "the big guys"? Exactly.
  • I did my first presentation on Sphinx search in 2010 and I showed how to implement it on your site, step-by-step, particularly for non-technical folks. A lot has changed since then but the same concepts still hold true and you can get on board! You can do this! Download and installation: For Windows users, I recommend compiling with libstemmer. If you are downloading the binary, be sure to select the binary which includes libstemmer. This additional library includes features that are going to come in handy. Open up the documentation and follow the installation steps for your platform. You may want to set up a separate folder for your configuration details as well as the data details (indexer files and logs). Configuration: Start with a basic configuration and use the test database provided. Once you are up and running and are more comfortable, you can begin your own custom setup. For now, finish these steps using the test data and test configuration. Index: Run the indexer. This step builds the index from your sample database table and your configuration, sphinx.conf. Test: Test your installation from the command line. That's it. You will want to start the service daemon upon system boot.
  • No, we are not supporting Nike ID customized shoes. But I went out to their web site to use a similar slide from the past and their current Green Bay Packer colors are hideous. But I want you to remember customization, that's the key word here. Magnifisites is a custom software development corporation supporting and hosting multiple domains. We host sites that run Sphinx implementations but with separate indexers, data and logs, and each site implements different search strategies. We have learned a lot from this type of service provision. For example, some sites have acronyms so we may want to set our minimum word limit low, say to 3 characters. Another site provides products and recipes to go along with those products. We have created separate indexes to maximize the performance and landing page manipulation for these products, which in turn increases sales and additional item sales because the recipe calls for a certain product. All can be added to the cart at the same time and delivered to the customer's door in a single click. The last time I checked, Google Custom Search (GSE) doesn't offer this feature. The point? If the shoe fits, wear it. If not, tweak it to meet your needs. Customizing your installation isn't quite as easy as customizing your shoe. However, whether you decide to customize it yourself or hire somebody else to do the modifications, there are a few simple steps you might follow to make the process more manageable. If you are not using version control software to commit and roll back updates, then at least do yourself a favor ... Make a backup copy of each document you intend to modify. Make a backup copy of any database table you intend to modify. Make your modifications on a development implementation first. Test, test, test. Plan the appropriate implementation date and time. If necessary, let your user base know about the planned system downtime or update implementation period. Perform the update. Test, test, test. Watch your logs for errors or issues.
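
The getting-started notes warn that the same page can land in your index under more than one URL, especially once query strings are involved. Here is a minimal sketch of one way to guard against that before indexing, written in Python only for illustration. The function names and the list of ignorable parameters are our own assumptions, not part of Sphinx or any other search product, so adjust them to your site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}


def canonical_url(url):
    """Return a normalized form of url, suitable as a unique key in an index."""
    parts = urlsplit(url)
    # Keep only the parameters that affect content, in a stable (sorted) order.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # Lowercase scheme and host, drop the fragment (it never reaches the server).
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


def dedupe(urls):
    """Map each canonical URL to the first original URL seen for it."""
    seen = {}
    for url in urls:
        seen.setdefault(canonical_url(url), url)
    return seen
```

Feed `dedupe()` the list of URLs you are about to index and index only the keys it returns. Whether a given parameter is safe to ignore is a per-site decision, so treat the set above as a starting point.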

    1. Mastering On-Site Search. Todd Keup, CEO, Magnifisites. Ralf Schwoebel, CEO, Tradebit.com. NOLA 2013. (@trabit, @toddkeup)
    2. Meet the geeks: Ralf Schwoebel, www.tradebit.com; Todd
    3. On-site search is not simply an input field!
    4. Scope of search technology
         • Searching for content (the basics)
         • Related products, tag clouds, etc.
         • Geo-IP sensitive content (& fraud control)
         • User behavioral targeting
       ⇒ Your choice of technology potentially opens or hinders many options!
    5. Search: the basics
    6. Search: bonus material
    7. Search: getting started. Content is in database, text, HTML … mixed? Plan your index, build your index(es). Plan your links, no duplicate content! How will you update the index? How will it perform? Will it scale? How will you search it?
    8. Search: tricks of the trade. Use your own index logs and search engine referrals to gather data: personal, environmental.
    9. Search: worth mentioning. Google: Enhanced Link Attribution. You can tag your pages to implement an enhanced link-tracking functionality that lets you:
         • See separate information for multiple links on a page that all have the same destination. For example, if there are two links on the same page that both lead to the Contact Us page, then you see separate click information for each link.
         • See when one page element has multiple destinations. For example, a Search button on your page is likely to lead to multiple destinations.
         • Track buttons, menus, and actions driven by JavaScript.
       Source:
    10. Introducing … Technology that might help you …
    11. Sphinx: Sphinx is an open source search server
    12. How to use Sphinx
         • Download and install
         • Configure
         • Index
         • Test
         • Implement
    13. How we are using Sphinx
    14. But there's more! Stay tuned for part 2 ...
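
Slide 8 suggests gathering data from search engine referrals. As a rough sketch of what that could look like, the Python below pulls the search phrase out of a referrer URL and counts the most common phrases. The engine-to-parameter table and the function names are assumptions for illustration, and it only works where the referrer still carries the query string.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed mapping: hostname fragment -> query parameter holding the search phrase.
ENGINE_PARAMS = {"google.": "q", "bing.": "q", "yahoo.": "p"}


def search_phrase(referrer):
    """Return the lowercased search phrase from a referrer URL, or None."""
    parts = urlsplit(referrer)
    host = (parts.hostname or "").lower()
    for marker, param in ENGINE_PARAMS.items():
        if marker in host:
            values = parse_qs(parts.query).get(param)
            if values:
                return values[0].strip().lower()
    return None


def top_phrases(referrers, n=10):
    """Count phrases across many referrers and return the n most common."""
    phrases = (search_phrase(referrer) for referrer in referrers)
    return Counter(phrase for phrase in phrases if phrase).most_common(n)
```

Run `top_phrases()` over the referrer column of your access log and compare the result with what your own search box logs. Phrases that show up in both are good candidates for a tuned landing page.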