•Welcome to 
              KMWorld Magazine
               Sponsored Event



© 2008‐2009         Lucid Imagination, Inc. ...
Moderator


                • Andy Moore
                • Publisher
                • KMWorld




  © 2008‐2009          ...
Open Source for Enterprise Search: 
       Breaking Down the 
     Barriers to Information
Speakers




  © 2008‐2009   Lucid Imagination, Inc.   4
Going into enterprise search with
your eyes open
Susan Feldman, Vice President
Search and Discovery Technologies
IDC

    ...
Outline

    Search defined

    The searching process

    Today’s search platforms

    Types of search applications...
Search: The Status Quo




                         Is luck enough?


© 2009 IDC
Uses of Search Today
    Intranet search                     Publishing applications
    Web search                    ...
Information Access Technology Map

                                                                                       ...
Characteristics

Language is the right vehicle for human interaction, but it is
imprecise.
    Fuzzy matching.

    Dial...
Today’s Search/Discovery Platform

       Disambiguate                                          Visualize


              ...
Today’s Search/Discovery Platform

       Disambiguate                                          Visualize


              ...
Today’s Search/Discovery Platform

     Disambiguate                                          Visualize


                ...
Today’s Search/Discovery Platform

       Disambiguate                                          Visualize


              ...
© 2009 IDC
Today’s Search/Discovery Platform

       Disambiguate                                          Visualize


              ...
SPSS
     Concepts and Categories




 © 2009 IDC
Types of Search Products

Analysis and                                                                                    ...
Types of Search Products

Analysis and                                                                           Volume
  ...
Features that first time buyers look for

Search features ranked by priority from our 2008 Survey
1. Relevance based searc...
Experienced search buyers differ

1. Relevance based search           But, after experience, add:
                        ...
Directions for NextGen Information Access

 Integration of multiple technologies required

 Integrated platforms for div...
Contact Information




                Susan Feldman
                VP, Search and Discovery Technologies
              ...
Ranga Muvavarirwa
Director 
Product Planning & Development
Comcast Interactive Media


Search for New Business Models:
Set...
Comcast Interactive Media
• Division of Comcast 
• Dedicated to 
  online/cross‐platform 
  entertainment and media 
  bus...
Fancast.com
Search: Business‐critical
Need: 
• Customizable 
• Scalable for volume:
  both traffic 
  and content         ...
Search Use Cases
• Comprehensive, relevant, up‐to‐date and authoritative
   – Movies, TV shows, clips, celebrities and oth...
Architecting for Scale
• Scaling metric: operationally 
  simple, scalable and stable.                       +
• Search mu...
Performance Test: Solr vs “X”
Open source Solr vs. leading commercial search vendor (Brand “X”)
Measured/compared  query r...
Do’s and Don’ts of Open Source
1. DON’T abandon structured analysis of Business 
   or Technical Requirements
     –   Ope...
Open Source: Risks & Mitigation
RISKS                                LUCENE/SOLR:
(1) SUFFICIENT COMMUNITY OF          Hi...
Tom Morton
Search Architect
Comcast Interactive Media




Improving Search with Solr/Lucene
How you can use this even 
if ...
Document Boost

• Allows pages to be 
  assigned inherent 
  results relevancy 
• Boost is computed 
  using related data
...
Indexing Related Content
• Allows related terms to match a query even if 
  terms don’t need to be surfaced on a page.
   ...
Type‐ahead
   A few small XML changes 
      turn on the “type ahead” feature
<fieldtype name="ngramUntokenized" class="solr....
Generating Content from 
              Relationships
• Using relationships to generate descriptions of 
  search entities....
Generating Content from 
              Relationships
• Using relationships to generate descriptions of 
  search entities....
More Like This
• Using more‐like‐this 
  functionality to produce 
  recommendations.
  – Based on relationships: 
    mov...
Key Solr Search Strategies
• Metadata holds great value for both:
  – Improved Relevancy
     • Take a broad view of “cont...
Q & A


                •Question and Answer Session
                  •(please submit questions)




  © 2008‐2009       ...
Archive


      Please use the same URL you used to view today’s live event 
     for the archive event, plus we will be s...
Thank You
                   Thank you for participating in
                        today’s web event

    Just by attendi...
Thank you

Open Source for Enterprise Search: Breaking Down the Barriers to Information

  1. 1. •Welcome to  KMWorld Magazine Sponsored Event © 2008‐2009 Lucid Imagination, Inc. 1
  2. 2. Moderator • Andy Moore • Publisher • KMWorld © 2008‐2009 Lucid Imagination, Inc. 2
  3. 3. Open Source for Enterprise Search:  Breaking Down the  Barriers to Information
  4. 4. Speakers © 2008‐2009 Lucid Imagination, Inc. 4
  5. 5. Going into enterprise search with your eyes open Susan Feldman, Vice President Search and Discovery Technologies IDC Webcast June 23, 2009, sponsored by Lucid Imagination Copyright 2009 IDC. Reproduction is forbidden unless authorized. All rights reserved.
  6. 6. Outline  Search defined  The searching process  Today’s search platforms  Types of search applications  Features to look for  What’s next?  Grand challenges © 2009 IDC
  7. 7. Search: The Status Quo Is luck enough? © 2009 IDC
  8. 8. Uses of Search Today • Intranet search • Web search • Call centers • Enterprise applications like BI, ERP and CRM • eDiscovery and litigation support applications • Compliance applications • Predictive analytics • Product early warning applications • Ecommerce applications • Publishing applications • Rich media search • Web advertising platforms • Recommendation engines • Reputation and opinion monitoring applications • Social media applications • Fraud detection applications • Border security applications • Spam detection applications © 2009 IDC
  9. 9. Information Access Technology Map [Figure: information access technologies and applications plotted by accuracy required (horizontal axis) against number & complexity of technologies (vertical axis). Capability levels: Search and retrieve; Tag data and content; Find people, places and things; Search for ideas, not words; data plus content. Technologies: relevance ranking, part of speech tagging, phrase identification, speech to text, categorization, multilingual support, entity extraction, concept extraction, geo-tagging, relationship extraction, fact/event extraction, sentiment extraction, text analytics, BI, reporting tools, data mining, inference engines. Applications: keyword search, audio files, rich media search, eCommerce, browsing, alerting, online tech support, geo-specific search, question answering, brand management, eDiscovery, trend analysis, image search, customer support, competitive intelligence, reputation management, ad matching, unified access, Voice of Customer apps, business intelligence, gov’t intelligence, decision support systems, conversational systems.] © 2009 IDC
  10. 10. Characteristics Language is the right vehicle for human interaction, but it is imprecise.  Fuzzy matching.  Dialogue and interaction to define the information need  Disambiguation of text—context  Linguistic patterns are predictable and computable: – Syntax for context – Dictionaries for meaning, semantics  Relevance ranking to help manage large results sets  Ad hoc searching © 2009 IDC
  11. 11. Today’s Search/Discovery Platform Disambiguate Visualize Enrich Cluster Query Filter Search Engine Interface Document Categorize BI/Data Language Analysis Extract Apps © 2009 IDC
  12. 12. Today’s Search/Discovery Platform Disambiguate Visualize Enrich Cluster Query Filter Search Engine Interface Document Categorize BI/Data Language Apps Analysis Extract © 2009 IDC
  13. 13. Today’s Search/Discovery Platform Disambiguate Visualize Enrich Cluster Query Filter Search Engine Interface Document Categorize BI/Data Language Apps Analysis Extract © 2009 IDC
  14. 14. Today’s Search/Discovery Platform Disambiguate Visualize Enrich Cluster Query Filter Search Engine Interface Document Categorize BI/Data Language Apps Analysis Extract © 2009 IDC
  15. 15. © 2009 IDC
  16. 16. Today’s Search/Discovery Platform Disambiguate Visualize Enrich Cluster Query Filter Search Engine Interface Document Categorize BI/Data Language Apps Analysis Extract © 2009 IDC
  17. 17. SPSS Concepts and Categories © 2009 IDC
  18. 18. Types of Search Products Analysis and Volume reporting Customizable Integrated Platforms -Multiple Text analytics Sources Intelligence Multipurpose Intranet Call centers Languages Search -Multiple Navigation ecommerce Apps: Relevance BI, CRM, tuning ERP, Finance, Site Inventory, Security Email Search Voice of Customer eDiscovery UI Single Purpose Features Reputation Monitoring Integrated Desktop work Search environments Search-Based Out of the Box Applications Search Important Strategic © 2009 IDC
  19. 19. Types of Search Products [Figure: the diagram from slide 18 with overlay labels, apparently “Embedded”, “Custom Built” and “Home Built”] © 2009 IDC
  20. 20. Features that first time buyers look for Search features ranked by priority from our 2008 Survey 1. Relevance based search 2. Browsing and navigation (categorization) 3. Taxonomies/ontologies 4. Parametric search 5. Concept search 6. Auto tagging 7. Visualization by clustering Source: IDC 2008 © 2009 IDC
  21. 21. Experienced search buyers differ 1. Relevance based search 2. Browsing and navigation (categorization) 3. Taxonomies/ontologies 4. Parametric search 5. Concept search 6. Auto tagging 7. Visualization by clustering But, after experience, add: • Customer service • Ease of implementation • Unified access • Usability • Auto tagging • Better search features like stemming and best bets • Security • Entity extraction • Rights management Source: IDC 2008 © 2009 IDC
  22. 22. Directions for NextGen Information Access  Integration of multiple technologies required  Integrated platforms for diverse, multiple information access requirements  Search-based apps to address specialized workflows and tasks like eDiscovery  Web scale processing  Rich media and social media add new challenges for search  Mobile search applications will explode © 2009 IDC
  23. 23. Contact Information Susan Feldman VP, Search and Discovery Technologies sfeldman@idc.com © 2009 IDC
  24. 24. Ranga Muvavarirwa Director  Product Planning & Development Comcast Interactive Media Search for New Business Models: Setting the requirements  and choosing the technology
  25. 25. Comcast Interactive Media • Division of Comcast  • Dedicated to  online/cross‐platform  entertainment and media  businesses • Develop and grow Internet businesses with  compelling technology  and product innovations • Targeting broadband  users, customers and  non‐subscribers alike
  26. 26. Fancast.com Search: Business‐critical Need: • Customizable • Scalable for volume: both traffic and content • Economics: New business model, sensitive to fixed and operating costs. Volume: 5‐6 million unique monthly users; 4 million+ records; 200,000+ assets • 9K+ hours online video • 55K+ videos • 10K+ full‐length shows • ~150K other assets (photos, tidbits, etc.); 100+ content providers
  27. 27. Search Use Cases • Comprehensive, relevant, up‐to‐date and authoritative – Movies, TV shows, clips, celebrities and other media info • Seamless merge of multiple, heterogeneous sources – Metadata each with own format, content refresh timing – Spider‐Man vs. “spiderman” • Must Have: – Accurate results in the mind of the user [Figure: query “simpson”; Jessica ≠ Homer]
  28. 28. Architecting for Scale • Scaling metric: operationally simple, scalable and stable. • Search must be as fast as anything else on the site: <20ms at peak per server instance. • Data‐center operations gets simple rules for sizing: “x” users → “y” application servers • Provided linear scalability with traffic growth from 50K to ~1M peak uniques/day over 16 months. Users now visit site >1x per week on average
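The "x users to y application servers" sizing rule on this slide can be sketched as simple arithmetic. This is only an illustration under assumed numbers: the slide does not give a per-server capacity, so `users_per_server` and `spare_servers` below are hypothetical parameters, not Comcast's figures.

```python
import math


def servers_needed(peak_users: int, users_per_server: int, spare_servers: int = 1) -> int:
    """Linear sizing rule: servers grow in proportion to peak users.

    users_per_server and spare_servers are hypothetical inputs that an
    operations team would measure or choose; the slide does not state them.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    # Round up so a partially loaded server still counts as a whole server.
    return math.ceil(peak_users / users_per_server) + spare_servers


if __name__ == "__main__":
    # Traffic range taken from the slide (50K to ~1M peak uniques/day);
    # the 25,000 users-per-server capacity is an assumed value.
    for users in (50_000, 1_000_000):
        print(users, "->", servers_needed(users, 25_000))
```

Because the rule is linear, a load test that establishes one per-server capacity figure is enough to size any traffic level, which is what makes it easy to hand to operations.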
  29. 29. Performance Test: Solr vs “X” Open source Solr vs. leading commercial search vendor (Brand “X”) • Measured/compared query response rates at multiple load levels; range: 100 to 1500 requests/second • Tested stress failure points: peak queries per second, failure characteristics • Vendor “X” & Lucid Imagination (Solr) tuned test bed, validated results • Solr meaningfully outperformed “X”: response rates (Solr vs. “X”), failure‐handling characteristics TEST BED • Avalanche load generators • multiple instances of Sun x64 multi‐core 1~2RU servers • Red Hat Linux INDEXES • 2 million documents • 4 million documents • Each deployed on each of the competing servers
  30. 30. Do’s and Don’ts of Open Source 1. DON’T abandon structured analysis of Business  or Technical Requirements – Open source must still fit business needs – DO a bake‐off to drive your decision 2. DO ensure a fit between development culture  and business objectives: – Do you Integrate or Develop?  – Do you develop or innovate based on your data? – Do you have a Source for expertise you lack? 
  31. 31. Open Source: Risks & Mitigation RISKS vs. LUCENE/SOLR: (1) SUFFICIENT COMMUNITY OF DEVELOPERS? Highly active community; ad‐hoc support and answers from the online community; SLA based support from Lucid Imagination. (2) COMMERCIAL ORGANIZATIONS “BET THE COMPANY” ON THIS? Similar businesses: CNET, Netflix; dissimilar businesses: MySpace, Orbitz. (3) ALIGNMENT WITH INTERNAL RESOURCES? ENABLES PRODUCT DEVELOPMENT AGILITY? Premium at CIM on software engineering talent; flexibility to support innovation without steep learning curves; mutually reinforcing benefits of product development culture and highly engaged human capital.
  32. 32. Tom Morton Search Architect Comcast Interactive Media Improving Search with Solr/Lucene How you can use this even  if you’re not in the entertainment business
  33. 33. Document Boost • Allows pages to be  assigned inherent  results relevancy  • Boost is computed  using related data  e.g., box office  receipts, recency • Boost value set when indexing. • Similar concept to PageRank, but set based on  business rules, not just popularity 
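A minimal sketch of the document boost idea, assuming a made-up business rule: the slide names box office receipts and recency as inputs but gives no formula, so `compute_boost` and its weighting are hypothetical. The output uses the `boost` attribute on `<doc>` that the Solr XML update format accepted in releases of that period; later Solr releases removed index-time boosts, so treat this as historical illustration.

```python
import math
import xml.etree.ElementTree as ET
from datetime import date


def compute_boost(box_office_usd: float, release: date, today: date) -> float:
    """Hypothetical business rule: log-scaled receipts plus a recency bonus."""
    popularity = math.log10(1.0 + box_office_usd) / 10.0  # 0.0 for no receipts
    age_years = max((today - release).days, 0) / 365.25
    recency = 1.0 / (1.0 + age_years)  # 1.0 for a brand-new title, decays with age
    return round(1.0 + popularity + recency, 3)


def to_solr_add_xml(doc_id: str, name: str, boost: float) -> str:
    """Build an <add><doc boost="..."> message; the boost is fixed at index time."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc", boost=str(boost))
    for field_name, value in (("id", doc_id), ("name", name)):
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    today = date(2009, 6, 23)
    boost = compute_boost(100_000_000, date(2009, 1, 1), today)
    print(to_solr_add_xml("movie-1", "Example Movie", boost))
```

Since the boost is stored when the document is indexed, changing the business rule means reindexing the affected documents.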
  34. 34. Indexing Related Content • Allows related terms to match a query even if  terms don’t need to be surfaced on a page. – Add fields and weights to XML. <str name="qf"> nameExact^6.5 name^2.0 alias^1.1 related^0.5  description^0.1 </str> – Similar to how web‐search indexes link terms.
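To show how the `qf` weights on this slide let a low-weight `related` field match without outranking a name match, here is a toy model. It is not Lucene's scoring formula: it treats every field match as worth exactly the field's weight and combines fields the way a disjunction-max query does (best field wins, others contribute via a tie-breaker). The function names are illustrative.

```python
from typing import Dict


def parse_qf(qf: str) -> Dict[str, float]:
    """Parse a qf string such as 'name^2.0 alias^1.1' into field weights."""
    weights = {}
    for token in qf.split():
        field, _, boost = token.partition("^")
        weights[field] = float(boost) if boost else 1.0
    return weights


def toy_dismax_score(term: str, doc: Dict[str, str],
                     weights: Dict[str, float], tie: float = 0.0) -> float:
    """Toy score: highest matching field weight, plus tie * the other matches."""
    term = term.lower()
    matches = [w for field, w in weights.items()
               if term in doc.get(field, "").lower().split()]
    if not matches:
        return 0.0
    best = max(matches)
    return best + tie * (sum(matches) - best)


if __name__ == "__main__":
    weights = parse_qf("nameExact^6.5 name^2.0 alias^1.1 related^0.5 description^0.1")
    doc = {"name": "The Simpsons Movie", "related": "Homer Springfield"}
    print(toy_dismax_score("movie", doc, weights))  # matched in name
    print(toy_dismax_score("homer", doc, weights))  # matched only in related
```

The document is still found for "homer" even though that word never appears in its displayed name; it simply ranks below documents that match in a higher-weighted field.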
  35. 35. Type‐ahead A few small XML changes turn on the “type ahead” feature
      <fieldtype name="ngramUntokenized" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory"/>
        </analyzer>
      </fieldtype>
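The analyzer chain above can be mimicked in a few lines to see why it produces type-ahead behaviour. This is a simplified sketch, not Solr code: it keeps the whole title as one token (as the keyword tokenizer does), lowercases it, and emits leading-edge n-grams of 2 to 20 characters at index time, while the query side is only lowercased. The stop filter is left out, and the function names are illustrative.

```python
from typing import Dict, List


def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 20) -> List[str]:
    """Index-time analysis: one lowercased token, expanded into its prefixes."""
    token = text.lower()
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def build_index(titles: List[str]) -> Dict[str, List[str]]:
    """Map every prefix gram to the titles that produced it."""
    index: Dict[str, List[str]] = {}
    for title in titles:
        for gram in edge_ngrams(title):
            index.setdefault(gram, []).append(title)
    return index


def suggest(index: Dict[str, List[str]], typed: str) -> List[str]:
    """Query-time analysis: lowercase only, then an exact lookup on the grams."""
    return index.get(typed.lower(), [])


if __name__ == "__main__":
    index = build_index(["Spider-Man", "Spinal Tap", "The Simpsons"])
    print(suggest(index, "Sp"))
    print(suggest(index, "spi"))
    print(suggest(index, "the s"))
```

Because the n-grams are generated only on the index side, a partial query is matched by plain term lookup, which keeps type-ahead queries cheap at the cost of a larger index.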
  36. 36. Generating Content from  Relationships • Using relationships to generate descriptions of  search entities. – Allows description results to be displayed even if  data is unavailable.
  37. 37. Generating Content from  Relationships • Using relationships to generate descriptions of  search entities. – Allows description results to be displayed even if  data is unavailable.
  38. 38. More Like This • Using more‐like‐this  functionality to produce  recommendations. – Based on relationships:  movie, TV series, actor,  and tag – Specify fields to use and  weights in XML.
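As a rough illustration of the fields-and-weights setup for more-like-this, the sketch below builds a request using Solr's MoreLikeThis parameters (`mlt`, `mlt.fl`, `mlt.qf`, `mlt.boost`, `mlt.mintf`, `mlt.mindf`, `mlt.count`). The slide says the fields and weights were specified in XML configuration; passing them as request parameters here is a substitution for brevity. The base URL, the field names and the weights are all assumptions made for the example.

```python
from typing import Dict
from urllib.parse import urlencode


def build_mlt_url(base_url: str, doc_id: str,
                  field_weights: Dict[str, float], count: int = 5) -> str:
    """Build a select URL asking for documents similar to doc_id."""
    params = {
        "q": f"id:{doc_id}",
        "mlt": "true",
        # Fields whose terms are compared between documents.
        "mlt.fl": ",".join(field_weights),
        # Per-field weights, in the same field^weight syntax as qf.
        "mlt.qf": " ".join(f"{f}^{w}" for f, w in field_weights.items()),
        "mlt.boost": "true",
        "mlt.mintf": 1,  # relationship fields hold few terms, so keep thresholds low
        "mlt.mindf": 1,
        "mlt.count": count,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core and field names.
    weights = {"movie": 2.0, "series": 1.5, "actor": 1.0, "tag": 0.5}
    print(build_mlt_url("http://localhost:8983/solr/select", "movie-1", weights))
```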
  39. 39. Key Solr Search Strategies • Metadata holds great value for both: – Improved Relevancy • Take a broad view of “content”, not just text – Better Search Experience • Search is only as good as the users think it is • Solr/Lucene can accomplish much of this  with just a dab of XML – Little real programming required
  40. 40. Q & A •Question and Answer Session •(please submit questions) © 2008‐2009 Lucid Imagination, Inc. 40
  41. 41. Archive Please use the same URL you used to view today’s live event  for the archive event, plus we will be sending you a follow‐up  email with that URL once the archive is posted! © 2008‐2009 Lucid Imagination, Inc. 41
  42. 42. Thank You Thank you for participating in today’s web event Just by attending this event you could win this TomTom GPS car navigation system Winner to be announced June 30th © 2008‐2009 Lucid Imagination, Inc. 42
  43. 43. Thank you
