Morgan Floyd - Intuit's Live Community

TurboTax Live Community is a large-scale web application that uses user contributions and open source technology to help millions of TurboTax users complete their tax returns. Other benefits of Live Community include fewer support calls, highly effective advertising campaigns, usability engineering and, new this year, conversion prediction analytics. I will present how Solr/Lucene powers the many facets of TurboTax Live Community now and in the future.

Published in: Technology

Morgan Floyd - Intuit's Live Community

  1. 1. Floyd Morgan, Floyd_Morgan@intuit.com, @fmorgan, Lucene Revolution, 2011
  2. 2. Agenda•  About Me•  About Live Community•  Live Community Search•  NLP•  Next Steps•  Questions? Answers?
  3. 3. About Me•  Principal Software Engineer at Intuit  
  4. 4. Intuit QuickBase. Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers and accounting professionals. More than 200 applications and 7700 employees worldwide.
  5. 5. About Me•  Principal Software Engineer at Intuit•  TurboTax Engineering  
  6. 6. TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax preparation software. TurboTax helps more than 20 million people a year. $1 billion in revenue.
  7. 7. About Me•  Principal Software Engineer at Intuit•  TurboTax Engineering –  Core tax engine  
  8. 8. About Me•  Principal Software Engineer at Intuit•  TurboTax Engineering –  Core tax engine –  TurboTax Online
  9. 9. About Me•  Principal Software Engineer at Intuit•  TurboTax Engineering –  Core tax engine –  TurboTax Online –  TurboTax Live Community
  10. 10. About Me•  Principal Software Engineer at Intuit•  TurboTax Engineering –  Core tax engine –  TurboTax Online –  TurboTax Live Community•  Central Technology Organization –  Live Community Platform
  11. 11. About Live Community•  It’s a user contribution system –  Q&A
  12. 12. About Live Community•  It’s a user contribution system –  Q&A•  It can be integrated into an application, contextually –  Page-to-page relevance
  13. 13. About Live Community•  It’s a user contribution system –  Q&A•  It can be integrated into an application, contextually –  Page-to-page relevance•  We use social, technology and data –  To create our value proposition…assisting users
  14. 14. About Live Community•  It’s a user contribution system –  Q&A•  It can be integrated into an application, contextually –  Page-to-page relevance•  We use social, technology and data –  To create our value proposition…assisting users•  We launched our Beta in 2007 –  TurboTax Online Home & Business
  15. 15. About Live Community•  It’s a user contribution system –  Q&A•  It can be integrated into an application, contextually –  Page-to-page relevance•  We use social, technology and data –  To create our value proposition…assisting users•  We launched our Beta in 2007 –  TurboTax Online Home & Business•  We use open source…primarily open source –  Apache HTTP, Ruby on Rails, MySQL, memcached ...
  16. 16. About Live Community•  It’s a user contribution system –  Q&A•  It can be integrated into an application, contextually –  Page-to-page relevance•  We use social, technology and data –  To create our value proposition…assisting users•  We launched our Beta in 2007 –  TurboTax Online Home & Business•  We use open source…primarily open source –  Apache HTTP, Ruby on Rails, MySQL, memcached ...•  It’s a platform –  APIs, skinning, dynamic provisioning (AWS in progress)
  17. 17. Intuit Money Manager, India
  18. 18. QuickBooks Online, UK
  19. 19. devZone, Intuit dev
  20. 20. QuickBooks Online, US
  21. 21. TurboTax Desktop & Online, US
  22. 22. Terminology
  23. 23. Consumers (in the millions)
  24. 24. Contributors (in the thousands)
  25. 25. Top Contributors (in the hundreds)
  26. 26. Employees (contribute too)
  27. 27. Tax Season: officially begins on December 1 and ends on April 15.
  28. 28. About TurboTax Live Community•  Largest community –  150+ servers, 200 thousand concurrent users
  29. 29. About TurboTax Live Community•  Largest community –  150+ servers, 200 thousand concurrent users•  Over 23 million users have used the service –  Over 8 million last tax season alone
  30. 30. About TurboTax Live Community•  Largest community –  150+ servers, 200 thousand concurrent users•  Over 23 million users have used the service –  Over 8 million last tax season alone•  Over 32 million page views last tax season –  In-product views in the billions
  31. 31. About TurboTax Live Community•  Largest community –  150+ servers, 200 thousand concurrent users•  Over 23 million users have used the service –  Over 8 million last tax season alone•  Over 32 million page views last tax season –  In-product views in the billions•  Over 750 thousand answered questions –  10 thousand questions asked on peak day
  32. 32. About TurboTax Live Community•  Largest community –  150+ servers, 200 thousand concurrent users•  Over 23 million users have used the service –  Over 8 million last tax season alone•  Over 32 million page views last tax season –  In-product views in the billions•  Over 750 thousand answered questions –  10 thousand questions asked on peak day•  Our contributors answer thousands of questions –  Top contributor – 70 thousand answers
  33. 33. Demo
  34. 34. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  35. 35. Why Solr?•  Lots of features/functionality  
  36. 36. Why Solr?•  Lots of features/functionality•  Ease of integration  
  37. 37. Why Solr?•  Lots of features/functionality•  Ease of integration•  We can scale it independently  
  38. 38. Why Solr?•  Lots of features/functionality•  Ease of integration•  We can scale it independently•  You’ll need some search expertise…that’s ok –  Community and Lucid Imagination!  
  39. 39. Why Solr?•  Lots of features/functionality•  Ease of integration•  We can scale it independently•  You’ll need some search expertise…that’s ok –  Community and Lucid Imagination!•  Search is really important –  Search everywhere…  
  40. 40. Why Solr?•  Lots of features/functionality•  Ease of integration•  We can scale it independently•  You’ll need some search expertise…that’s ok –  Community and Lucid Imagination!•  Search is really important –  Search everywhere…  
  41. 41. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  42. 42. Auto suggest•  Provides a glimpse of our vast content
  43. 43. Auto suggest•  Provides a glimpse of our vast content•  facet query (Solr 1.2)
  44. 44. Auto suggest•  Provides a glimpse of our vast content•  facet query (Solr 1.2)•  We use NLP…
  45. 45. Auto suggest•  Provides a glimpse of our vast content•  facet query (Solr 1.2)•  We use NLP…•  It’s used on every search touch point
  46. 46. Auto suggest•  Provides a glimpse of our vast content•  facet query (Solr 1.2)•  We use NLP…•  It’s used on every search touch point•  Second most frequent request
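The slides do not show the auto suggest request itself. One common way to build suggestions on Solr faceting is the `facet.prefix` parameter, so the following is only a sketch under that assumption. The field name `suggest_terms` and both helper names are hypothetical, and the sample response is made up to show the flat term/count list that Solr's JSON writer returns for facet fields by default.

```ruby
require "uri"
require "json"

# Builds query-string parameters for a prefix-restricted facet request.
# "suggest_terms" is a hypothetical field holding suggestion phrases.
def suggest_params(prefix, field: "suggest_terms", limit: 5)
  URI.encode_www_form(
    "q" => "*:*",              # match everything; only facet counts matter
    "rows" => 0,               # no documents needed in the response
    "facet" => "true",
    "facet.field" => field,
    "facet.prefix" => prefix.downcase,
    "facet.limit" => limit,
    "wt" => "json"
  )
end

# Turns the flat [term, count, term, count, ...] list into suggestion hashes.
def parse_suggestions(json_body, field: "suggest_terms")
  flat = JSON.parse(json_body).dig("facet_counts", "facet_fields", field) || []
  flat.each_slice(2).map { |term, count| { term: term, count: count } }
end

puts suggest_params("How")
```

Setting `rows` to 0 keeps the response small, which matters for a request fired on nearly every keystroke.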
  47. 47. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  48. 48. In-product “mini” search•  Primary search interface for consumers  
  49. 49. In-product “mini” search•  Primary search interface for consumers•  It appears integrated  
  50. 50. In-product “mini” search•  Primary search interface for consumers•  It appears integrated•  Now the most utilized search interface  
  51. 51. In-product “mini” search•  Primary search interface for consumers•  It appears integrated•  Now the most utilized search interface•  It makes all content available  
  52. 52. In-product “mini” search•  Primary search interface for consumers•  It appears integrated•  Now the most utilized search interface•  It makes all content available•  Over 3 million users last tax season  
  53. 53. # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
      c.search( "how do i input 1099", :filter_queries => "post_status: #{Post::ANSWERED}" )
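For readers without the Ruby client at hand, the search above corresponds roughly to a plain HTTP request against Solr's `/select` handler, with the query in `q` and each filter in its own `fq` parameter. The sketch below only builds that URL with the standard library. The helper name `solr_select_url` and the numeric value standing in for `Post::ANSWERED` are assumptions for illustration.

```ruby
require "uri"

# Hypothetical helper: builds the /select URL for a query plus filter queries.
def solr_select_url(base, query, filter_queries: [], rows: 10)
  params = [["q", query], ["rows", rows.to_s]]
  # Each filter query is sent as a separate fq parameter.
  filter_queries.each { |fq| params << ["fq", fq] }
  "#{base}/select?#{URI.encode_www_form(params)}"
end

ANSWERED = 2 # assumed stand-in for Post::ANSWERED

url = solr_select_url(
  "http://localhost:8090/solr/posts",
  "how do i input 1099",
  filter_queries: ["post_status:#{ANSWERED}"]
)
puts url
```

Keeping the status restriction in `fq` rather than in `q` leaves relevance scoring to the user's words alone.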
  54. 54. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  55. 55. Web-site “full” search•  Primary search interface for contributors and employees
  56. 56. Web-site “full” search•  Primary search interface for contributors and employees•  More real estate, more facets, more suggestions ...
  57. 57. Web-site “full” search•  Primary search interface for contributors and employees•  More real estate, more facets, more suggestions ...•  Faceted search empowers development teams to narrow on issues
  58. 58. Web-site “full” search•  Primary search interface for contributors and employees•  More real estate, more facets, more suggestions ...•  Faceted search empowers development teams to narrow on issues•  200+ TurboTax issues discovered last tax season
  59. 59. # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
      c.search( "bug", :filter_queries => "post_status: #{Post::OPEN}" )
  60. 60. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  61. 61. Instant answer•  Present similar answered question
  62. 62. Instant answer•  Present similar answered question•  Search with the terms of the new question
  63. 63. Instant answer•  Present similar answered question•  Search with the terms of the new question•  Narrow the focus to the subject
  64. 64. Instant answer•  Present similar answered question•  Search with the terms of the new question•  Narrow the focus to the subject•  Show snippet of a recommended answer
  65. 65. Instant answer•  Present similar answered question•  Search with the terms of the new question•  Narrow the focus to the subject•  Show snippet of a recommended answer•  Accidental A/B test
  66. 66. Demo
  67. 67. # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
      c.search( "how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status: #{Post::ANSWERED}" } )
  68. 68. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  69. 69. Instant question•  Present similar unanswered questions
  70. 70. Instant question•  Present similar unanswered questions•  Answer reuse
  71. 71. Instant question•  Present similar unanswered questions•  Answer reuse•  Search with the terms of the answered question
  72. 72. Instant question•  Present similar unanswered questions•  Answer reuse•  Search with the terms of the answered question•  Narrow the focus to the subject
  73. 73. Instant question•  Present similar unanswered questions•  Answer reuse•  Search with the terms of the answered question•  Narrow the focus to the subject•  We also use a date filter
  74. 74. “Aren’t we addicted enough!”
  75. 75. Demo
  76. 76. # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
      # at_beginning_of_day, days and ago come from ActiveSupport (Rails)
      today = DateTime.now.at_beginning_of_day.utc.to_time
      date_from = 7.to_i.days.ago( today ).getutc.iso8601
      c.search( "how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status: #{Post::OPEN} AND created_at_d:[#{date_from} TO *]" } )
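The date filter on this slide leans on Rails helpers. As a rough sketch of the same idea in plain Ruby, the function below builds the open-ended range clause from a fixed point in time. It measures from midnight UTC, which is an assumption and may differ from the slide's local-midnight behavior. The function name `created_since_filter` is hypothetical; the field name `created_at_d` is taken from the slide.

```ruby
require "time"

# Returns a Solr range clause matching documents created in the last `days`
# days, counted back from the start of the current UTC day.
def created_since_filter(days, now: Time.now, field: "created_at_d")
  now = now.getutc
  midnight = Time.utc(now.year, now.month, now.day)
  from = (midnight - days * 24 * 60 * 60).iso8601
  "#{field}:[#{from} TO *]"
end

puts created_since_filter(7, now: Time.utc(2011, 5, 21, 15, 30))
```

Rounding to the start of the day means every request on a given day sends the same filter string, which should let Solr reuse its cached filter rather than compute a new one per request.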
  77. 77. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  78. 78. Answer bot•  We continue to search for you –  The day after you ask
  79. 79. Answer bot•  We continue to search for you –  The day after you ask•  Send an email
  80. 80. Answer bot•  We continue to search for you –  The day after you ask•  Send an email•  Runs for 7 days
  81. 81. Answer bot•  We continue to search for you –  The day after you ask•  Send an email•  Runs for 7 days•  We only send another email if the results have changed
  82. 82. Answer bot•  We continue to search for you –  The day after you ask•  Send an email•  Runs for 7 days•  We only send another email if the results have changed•  From our explicit feedback –  39% answered question
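The answer bot rules on this slide (search again each day, stop after 7 days, email only when the results have changed) can be sketched as a small piece of state per question. This is not the talk's implementation. The class name `AnswerBotRun` and the choice to fingerprint the ordered result ids with SHA-256 are assumptions for illustration.

```ruby
require "digest"

# Hypothetical per-question state for the daily follow-up search.
class AnswerBotRun
  MAX_DAYS = 7

  def initialize
    @days_run = 0
    @last_fingerprint = nil
  end

  # result_ids: ordered post ids returned by that day's search.
  # Returns true when an email should be sent for this run.
  def notify?(result_ids)
    return false if @days_run >= MAX_DAYS
    @days_run += 1
    fingerprint = Digest::SHA256.hexdigest(result_ids.join(","))
    changed = fingerprint != @last_fingerprint
    @last_fingerprint = fingerprint
    # Never email an empty result set, even if it differs from yesterday's.
    changed && !result_ids.empty?
  end
end

run = AnswerBotRun.new
puts run.notify?(%w[post1 post2]) # first results: send
puts run.notify?(%w[post1 post2]) # unchanged: skip
```

Because the fingerprint covers the ordered ids, a reshuffled ranking also counts as a change. Sorting the ids first would restrict emails to genuinely new results.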
  83. 83. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  84. 84. Advertising•  We use our user generated content in advertising  
  85. 85. Advertising•  We use our user generated content in advertising•  Has 300% higher click through rate than static banner ads  
  86. 86. Advertising•  We use our user generated content in advertising•  Has 300% higher click through rate than static banner ads•  Ads displayed throughout the tax season on many ad networks  
  87. 87. Advertising•  We use our user generated content in advertising•  Has 300% higher click through rate than static banner ads•  Ads displayed throughout the tax season on many ad networks•  Content selection is automated and continuous  
  88. 88. Logs Logs Logs MapReduce Carrot2 Solr Heuristics
  89. 89. <?xml version="1.0" encoding="UTF-8"?> 
 <lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day"> 
 <topic> 
 <rank>1</rank> 
 <text>Ptp</text> 
 <post> 
 <post_id>aBHMBWxzar4lKMacfArRo0</post_id> 
 <subject>Final K-1 Disposition of PTP Units</subject> 
 <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I cant seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail> 
 <response>OK, several steps needed for your situation:
 1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale"
 2) Then search for the topic "sale of business property" - you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797.
 
 3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains.
 
 Hope this helps you,
 </response> 
 <viewsCount>60</viewsCount> 
 <answersCount>2</answersCount> 
 <asker>Xuxan</asker> 
 <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url> 
 </post> 
  
  90. 90. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  91. 91. Search everywhere•  Search first, ask second –  Used to be ask first, search later or never!
  92. 92. Search everywhere•  Search first, ask second –  Used to be ask first, search later or never!•  Auto complete everywhere too –  64 bit Linux, 10 (8 core) slaves, 300 req/s
  93. 93. Search everywhere•  Search first, ask second –  Used to be ask first, search later or never!•  Auto complete everywhere too –  64 bit Linux, 10 (8 core) slaves, 300 req/s•  Search requests –  900 % increase
  94. 94. Search everywhere•  Search first, ask second –  Used to be ask first, search later or never!•  Auto complete everywhere too –  64 bit Linux, 10 (8 core) slaves, 300 req/s•  Search requests –  900 % increase•  Questions asked –  50 % decrease…is that good?
  95. 95. Search everywhere•  Search first, ask second –  Used to be ask first, search later or never!•  Auto complete everywhere too –  64 bit Linux, 10 (8 core) slaves, 300 req/s•  Search requests –  900 % increase•  Questions asked –  50 % decrease…is that good?•  Increased consumption –  38% users, 43% content…very good!
  96. 96. Live Community Search•  Why Solr?•  Auto suggest•  In-product search•  Web-site search•  Instant answer•  Instant question•  Answer bot•  Advertising•  Search everywhere•  Architecture
  97. 97. Search cluster, App server, Indexing server, Database cluster
  98. 98. NLP•  Search is not enough…unfortunately
  99. 99. NLP•  Search is not enough…unfortunately•  Our domain is noisy…ugly at times
  100. 100. Uh, what?
  101. 101. Too much what!
  102. 102. ?
  103. 103. I wish NLP could help!
  104. 104. NLP•  Search is not enough…unfortunately•  Our domain is noisy…ugly at times•  How it works…
  105. 105. HwO do iput 10 99 i don,tknow what to do need help help me.
  106. 106. Where do I enter a 1099?
  107. 107. schema.xml
 <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer> </fieldtype>
  
  108. 108. dictionary
 <?xml version="1.0" encoding="US-ASCII"?>
 <dictionary>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
 <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
 <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
 <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
 <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
 <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
 <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>
 
  
  109. 109. regular expressions (many)
      if text =~ / any/
        text.gsub!(/ any where /, ' anywhere ')
        text.gsub!(/ any(body| body| one) /, ' anyone ')
        text.gsub!(/ any( thing| things|things) /, ' anything ')
        text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
      end
      if text =~ / don /
        text.gsub!(/ don i /, ' do not i ')
        text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
        text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
        text.gsub!(/ don (not|nt|t) /, ' do not ')
      end
      text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
      text.gsub!(/ d (oyou|you) /, ' do you ')
      text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
      text.gsub!(/ crap{1,10} /, ' crap ')
      text.gsub!(/ gr{1,} /, ' ')
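To see how space-anchored rules like these behave on real input, here is a minimal sketch that applies three of them to a raw string. The preprocessing (lowercasing, replacing punctuation with spaces, padding with a space at each end) is an assumption about how text reaches the rules, and the function name `normalize_contractions` is hypothetical.

```ruby
# Applies a small subset of the slide's rules to a raw user string.
def normalize_contractions(raw)
  # Assumed preprocessing: lowercase, punctuation to spaces, single spaces,
  # then pad so rules written as " word " also match at either end.
  text = " #{raw.downcase.gsub(/[^a-z0-9 ]/, ' ').squeeze(' ').strip} "
  if text =~ / don /
    text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
    text.gsub!(/ don (not|nt|t) /, ' do not ')
  end
  text.gsub!(/ d (oyou|you) /, ' do you ')
  text.strip
end

puts normalize_contractions("I don,t know what to do")
puts normalize_contractions("d you know where it goes")
```

The cheap `/ don /` check in front of the group means inputs without the trigger word skip the more specific substitutions entirely.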
 

  110. 110. Spell Checker Stemmer (Porter) Word Collocation Stop Phrase Correction Stop Word Removal Synonyms Substitution Tax Domain Correction Phrase Encoding
  111. 111. # NLP is not easy!
      # this class wraps our NLP
      sf = SemanticFilter.new
      # does it work?
      sf.act_on_post( "HwO do iput 10 99 i don,t know what to do need help help me." )
      => [" wheretoent 1099 "]
      sf.act_on_post( "Where do I enter a 1099?" )
      => [" wheretoent 1099 "]
  112. 112. NLP•  Search is not enough…unfortunately•  Our domain is noisy…ugly at times•  How it works…•  It works well, but it’s not perfect
  113. 113. “Stop guessing what I’m looking for!”
  114. 114. NLP•  Search is not enough…unfortunately•  Our domain is noisy…ugly at times•  How it works…•  It works well, but it’s not perfect•  Not just for search…
  115. 115. Recommendations•  Deliver unanswered questions to contributors
  116. 116. Recommendations•  Deliver unanswered questions to contributors•  Too much content to scan manually
  117. 117. Recommendations•  Deliver unanswered questions to contributors•  Too much content to scan manually•  Based on past answering behavior
  118. 118. Recommendations•  Deliver unanswered questions to contributors•  Too much content to scan manually•  Based on past answering behavior•  Recommend a question to multiple contributors
  119. 119. Recommendations•  Deliver unanswered questions to contributors•  Too much content to scan manually•  Based on past answering behavior•  Recommend a question to multiple contributors•  Uses Mahout machine learning library
  120. 120. Answered Unanswered NLP NLP User Post vectors vectors Mahout Heuristics
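The talk uses Mahout for these recommendations and does not show the details. As a plain-Ruby illustration of the general idea, not of Mahout's API or the actual pipeline, the sketch below builds a term vector from each contributor's past answered posts, scores an unanswered question against them with cosine similarity, and returns the top matches so one question can go to several contributors. All names and the sample data are hypothetical, and the tokenizer is a toy stand-in for the NLP step.

```ruby
# Term-frequency vector for a text (toy tokenizer).
def term_vector(text)
  text.downcase.scan(/[a-z0-9]+/).each_with_object(Hash.new(0)) { |t, v| v[t] += 1 }
end

# Cosine similarity between two sparse term vectors.
def cosine(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |w| w * w }) }
  denom = norm.call(a) * norm.call(b)
  denom.zero? ? 0.0 : dot / denom
end

# Ranks contributors for one unanswered question and keeps the best `n`.
# profiles maps a contributor name to the posts that contributor answered.
def recommend(question, profiles, n: 2)
  q = term_vector(question)
  profiles
    .map { |name, answered| [name, cosine(q, term_vector(answered.join(" ")))] }
    .sort_by { |_, score| -score }
    .first(n)
    .map(&:first)
end

profiles = {
  "alice" => ["where do i enter a 1099", "1099 misc income"],
  "bob"   => ["mortgage interest deduction"],
  "carol" => ["enter w2 wages"]
}
p recommend("how do i enter 1099", profiles)
```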
  121. 121. Next Steps•  We’re going to rewrite it!
  122. 122. Next Steps•  We’re going to rewrite it! … most of it ;)
  123. 123. Next Steps•  We’re going to rewrite it! … most of it ;)•  Real-time indexing
  124. 124. Next Steps•  We’re going to rewrite it! … most of it ;)•  Real-time indexing•  Question vs. Query
  125. 125. Next Steps•  We’re going to rewrite it! … most of it ;)•  Real-time indexing•  Question vs. Query•  Social feedback – Page ranking
  126. 126. Next Steps•  We’re going to rewrite it! … most of it ;)•  Real-time indexing•  Question vs. Query•  Social feedback – Page ranking•  Social dictionaries – Content classification
  127. 127. Next Steps•  We’re going to rewrite it! … most of it ;)•  Real-time indexing•  Question vs. Query•  Social feedback – Page ranking•  Social dictionaries – Content classification•  Beer?!
  128. 128. Thank you. Floyd_Morgan@intuit.com @fmorgan
  129. 129. Appendix •  User search •  SEO
