Morgan Floyd - Intuit's Live Community
TurboTax Live Community is a large-scale web application that uses user contributions and open source technology to help millions of TurboTax users complete their tax returns. Other benefits of Live Community include fewer support calls, highly effective advertising campaigns, usability engineering and, new this year, conversion prediction analytics. I will present how Solr/Lucene powers the many facets of TurboTax Live Community, now and in the future.

    Presentation Transcript

    • Floyd Morgan, Floyd_Morgan@intuit.com, @fmorgan. Lucene Revolution, 2011
    • Agenda
        •  About Me
        •  About Live Community
        •  Live Community Search
        •  NLP
        •  Next Steps
        •  Questions? Answers?
    • About Me
        •  Principal Software Engineer at Intuit
    • Intuit QuickBase: Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers; and accounting professionals. More than 200 applications and 7,700 employees worldwide.
    • About Me
        •  Principal Software Engineer at Intuit
        •  TurboTax Engineering
    • TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax-preparation software. TurboTax helps more than 20 million people a year. $1 billion in revenue.
    • About Me
        •  Principal Software Engineer at Intuit
        •  TurboTax Engineering
            –  Core tax engine
            –  TurboTax Online
            –  TurboTax Live Community
        •  Central Technology Organization
            –  Live Community Platform
    • About Live Community
        •  It’s a user contribution system
            –  Q&A
        •  It can be integrated into an application, contextually
            –  Page-to-page relevance
        •  We use social, technology and data
            –  To create our value proposition…assisting users
        •  We launched our Beta in 2007
            –  TurboTax Online Home & Business
        •  We use open source…primarily open source
            –  Apache HTTP, Ruby on Rails, MySQL, memcached ...
        •  It’s a platform
            –  APIs, skinning, dynamic provisioning (AWS in progress)
    • Intuit Money Manager, India
    • QuickBooks Online, UK
    • devZone, Intuit dev
    • QuickBooks Online, US
    • TurboTax Desktop & Online, US
    • Terminology
    • Consumers (in the millions)
    • Contributors (in the thousands)
    • Top Contributors (in the hundreds)
    • Employees (contribute too)
    • Tax Season: officially begins on December 1 and ends on April 15.
    • About TurboTax Live Community
        •  Our largest community
            –  150+ servers, 200 thousand concurrent users
        •  Over 23 million users have used the service
            –  Over 8 million last tax season alone
        •  Over 32 million page views last tax season
            –  In-product views in the billions
        •  Over 750 thousand answered questions
            –  10 thousand questions asked on peak day
        •  Our contributors answer thousands of questions
            –  Top contributor: 70 thousand answers
    • Demo
    • Live Community Search
        •  Why Solr?
        •  Auto suggest
        •  In-product search
        •  Web-site search
        •  Instant answer
        •  Instant question
        •  Answer bot
        •  Advertising
        •  Search everywhere
        •  Architecture
    • Why Solr?
        •  Lots of features/functionality
        •  Ease of integration
        •  We can scale it independently
        •  You’ll need some search expertise…that’s ok
            –  Community and Lucid Imagination!
        •  Search is really important
            –  Search everywhere…
    • Auto suggest
        •  Provides a glimpse of our vast content
        •  Facet query (Solr 1.2)
        •  We use NLP…
        •  It’s used on every search touch point
        •  Second most frequent request
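One way to build a facet-query suggester like the one above is to request zero documents and facet on a field of indexed phrases. `facet.prefix` then keeps only the entries that start with what the user has typed. The sketch below only builds the request URL. It assumes a hypothetical lowercased, untokenized phrase field named `suggest_terms` and reuses the `posts` core URL from the talk's examples. It is not the talk's actual implementation, which also runs input through NLP.

```ruby
require 'uri'

# Core URL from the talk's examples; the field name is hypothetical and
# assumed to hold lowercased, untokenized phrases.
SOLR_POSTS = "http://localhost:8090/solr/posts/select"
SUGGEST_FIELD = "suggest_terms"

# Build a facet.prefix request: no documents (rows=0), only the most
# frequent indexed phrases that start with what the user has typed.
def suggest_url(typed, limit: 10)
  params = {
    "q" => "*:*",
    "rows" => 0,
    "facet" => "true",
    "facet.field" => SUGGEST_FIELD,
    "facet.prefix" => typed.downcase,
    "facet.limit" => limit,
    "facet.mincount" => 1,
    "wt" => "json"
  }
  "#{SOLR_POSTS}?#{URI.encode_www_form(params)}"
end

puts suggest_url("Form 10")
```

Because facet counts come back sorted by frequency, the most common phrases naturally surface first as suggestions.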
    • In-product “mini” search
        •  Primary search interface for consumers
        •  It appears integrated
        •  Now the most utilized search interface
        •  It makes all content available
        •  Over 3 million users last tax season
    • # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new("http://localhost:8090/solr/posts")
      c.search("how do i input 1099", :filter_queries => "post_status:#{Post::ANSWERED}")
    • Web-site “full” search
        •  Primary search interface for contributors and employees
        •  More real estate, more facets, more suggestions ...
        •  Faceted search empowers development teams to narrow in on issues
        •  200+ TurboTax issues discovered last tax season
    • # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new("http://localhost:8090/solr/posts")
      c.search("bug", :filter_queries => "post_status:#{Post::OPEN}")
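The faceted narrowing described for the full search can be expressed as plain Solr request parameters. There is one `facet.field` per facet shown on the page and one `fq` for every facet value the user has clicked. This is a sketch under assumptions: `product_s` and `topic_s` are hypothetical field names, while `post_status` comes from the examples above.

```ruby
require 'uri'

# Hypothetical facet fields; post_status appears in the talk's examples.
FACET_FIELDS = %w[product_s topic_s post_status]

# Build params for a full search: one facet.field per facet, and one fq
# per facet value the user has selected so far (repeated keys are allowed).
def full_search_params(query, selected = {})
  params = [["q", query], ["facet", "true"], ["facet.mincount", "1"]]
  FACET_FIELDS.each { |f| params << ["facet.field", f] }
  selected.each { |field, value| params << ["fq", %(#{field}:"#{value}")] }
  params
end

params = full_search_params("bug", { "product_s" => "TurboTax Online" })
puts URI.encode_www_form(params)
```

Repeated keys such as `facet.field` are kept as an array of pairs, which `URI.encode_www_form` emits in order. Each click adds another `fq`, so a development team can drill from "bug" down to one product and topic.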
    • Instant answer
        •  Present similar answered questions
        •  Search with the terms of the new question
        •  Narrow the focus to the subject
        •  Show a snippet of a recommended answer
        •  Accidental A/B test
    • Demo
    • # using Solr is easy!
      require 'solr'
      c = Solr::Connection.new("http://localhost:8090/solr/posts")
      c.search("how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status:#{Post::ANSWERED}" })
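The same instant-answer lookup can be sketched as raw Solr parameters. This version also shows one possible way to produce the answer snippet: Solr's highlighter (`hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`). The sketch rests on several assumptions:
- The answer text lives in a hypothetical `answer_text` field.
- `"answered"` stands in for `Post::ANSWERED`.
- `defType=dismax` with `qf=subject` approximates what the `:query_fields` option of the client call above appears to do.

```ruby
require 'uri'

# Hypothetical instant-answer request: dismax over the subject field only
# ("narrow the focus to the subject"), answered posts only, and one short
# highlighted snippet taken from an assumed answer_text field.
def instant_answer_params(question, answered_status, answer_field: "answer_text")
  {
    "q" => question,
    "defType" => "dismax",
    "qf" => "subject",
    "fq" => "post_status:#{answered_status}",
    "rows" => 3,
    "hl" => "true",
    "hl.fl" => answer_field,
    "hl.snippets" => 1,
    "hl.fragsize" => 150
  }
end

params = instant_answer_params("how do i input 1099", "answered")
puts "/solr/posts/select?" + URI.encode_www_form(params)
```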
    • Instant question
        •  Present similar unanswered questions
        •  Answer reuse
        •  Search with the terms of the answered question
        •  Narrow the focus to the subject
        •  We also use a date filter
    • “Aren’t we addicted enough!”
    • Demo
    • # using Solr is easy!
      require 'solr'
      # ActiveSupport (loaded by Rails) provides at_beginning_of_day and 7.days.ago
      c = Solr::Connection.new("http://localhost:8090/solr/posts")
      today = DateTime.now.at_beginning_of_day.utc.to_time
      date_from = 7.days.ago(today).getutc.iso8601
      c.search("how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status:#{Post::OPEN} AND created_at_d:[#{date_from} TO *]" })
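Solr can also compute the seven-day window itself with date math, which avoids formatting timestamps in the application. `NOW/DAY` rounds down to the start of the current day (UTC by default), so the filter string stays identical all day and can be served from Solr's filter cache. Here is a sketch of the same filter built that way. `created_at_d` comes from the example above, and `"open"` is a stand-in for `Post::OPEN`.

```ruby
# Build the instant-question filter with Solr date math instead of a
# client-side timestamp. NOW/DAY-7DAYS means "midnight today, minus 7 days",
# so the string only changes once per day.
def recent_open_filter(status, days: 7)
  "post_status:#{status} AND created_at_d:[NOW/DAY-#{days}DAYS TO *]"
end

puts recent_open_filter("open")
```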
    • Answer bot
        •  We continue to search for you
            –  Starting the day after you ask
        •  Send an email
        •  Runs for 7 days
        •  We only send another email if the results have changed
        •  From our explicit feedback
            –  39% said it answered their question
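The answer bot's schedule can be sketched as a small loop. Starting the day after the question is asked, it searches once a day for seven days and emails only when the result set differs from what was last sent. This is a minimal in-memory sketch: the `search` and `send_email` lambdas are hypothetical stand-ins for the Solr query and the mailer. Skipping empty result sets is an assumption, not something the talk states.

```ruby
require 'set'

# Hypothetical answer-bot run for one question. `search` returns post ids
# for a given day; `send_email` delivers them. Day 0 is the day the
# question was asked, so the bot runs on days 1..days.
def run_answer_bot(question, search:, send_email:, days: 7)
  last_sent = nil
  sent = 0
  (1..days).each do |day|
    results = search.call(question, day).to_set
    next if results.empty?            # assumption: nothing worth emailing yet
    next if results == last_sent      # unchanged since the last email
    send_email.call(question, results.to_a)
    last_sent = results
    sent += 1
  end
  sent
end

# Usage: the results change on day 3 and again on day 6.
log = []
search = ->(_q, day) { day < 3 ? ["a1"] : (day < 6 ? ["a1", "a2"] : ["a2", "a3"]) }
mail = ->(_q, ids) { log << ids }
puts run_answer_bot("where do i enter a 1099?", search: search, send_email: mail)
```

With this stub, the bot sends three emails (days 1, 3 and 6) instead of seven. Comparing sets rather than arrays means a mere reordering of the same answers does not trigger another email.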
    • Advertising
        •  We use our user-generated content in advertising
        •  It has a 300% higher click-through rate than static banner ads
        •  Ads displayed throughout the tax season on many ad networks
        •  Content selection is automated and continuous
    • Diagram: Logs, Logs, Logs, MapReduce, Carrot2, Solr, Heuristics
    • <?xml version="1.0" encoding="UTF-8"?> 
 <lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day"> 
 <topic> 
 <rank>1</rank> 
 <text>Ptp</text> 
 <post> 
 <post_id>aBHMBWxzar4lKMacfArRo0</post_id> 
 <subject>Final K-1 Disposition of PTP Units</subject> 
 <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I cant seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail> 
 <response>OK, several steps needed for your situation:
 1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale"
 2) Then search for the topic "sale of business property" - you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797.
 
 3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains.
 
 Hope this helps you,
 </response> 
 <viewsCount>60</viewsCount> 
 <answersCount>2</answersCount> 
 <asker>Xuxan</asker> 
 <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url> 
 </post> 
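Given a feed like the one above, a final selection step could keep answered posts with some traction and order them by topic rank, then by views. This is a hypothetical heuristic for illustration only; the talk does not describe its actual rules. The fields mirror the feed (`rank`, `post_id`, `viewsCount`, `answersCount`). The first candidate is the PTP post shown above; the others and the thresholds are made up.

```ruby
# Each candidate mirrors one <topic>/<post> pair from the trending feed.
Candidate = Struct.new(:rank, :post_id, :views, :answers)

# Hypothetical selection rule: only answered posts with enough views,
# best topic rank first, most-viewed first within the same rank.
def select_ad_posts(candidates, min_views: 25, limit: 3)
  candidates
    .select { |c| c.answers > 0 && c.views >= min_views }
    .sort_by { |c| [c.rank, -c.views] }
    .first(limit)
    .map(&:post_id)
end

feed = [
  Candidate.new(1, "aBHMBWxzar4lKMacfArRo0", 60, 2), # the PTP post above
  Candidate.new(2, "unanswered-example", 500, 0),     # hypothetical
  Candidate.new(2, "low-traffic-example", 10, 1),     # hypothetical
  Candidate.new(3, "another-example", 90, 4)          # hypothetical
]
p select_ad_posts(feed)
```

Here the unanswered and low-traffic posts are dropped, leaving the PTP post followed by `another-example`.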
  
    • Search everywhere
        •  Search first, ask second
            –  Used to be ask first, search later or never!
        •  Auto complete everywhere too
            –  64-bit Linux, 10 (8-core) slaves, 300 req/s
        •  Search requests
            –  900% increase
        •  Questions asked
            –  50% decrease…is that good?
        •  Increased consumption
            –  38% users, 43% content…very good!
    • Architecture (diagram): Search cluster, App server, Indexing server, Database cluster
    • NLP
        •  Search is not enough…unfortunately
        •  Our domain is noisy…ugly at times
    • Uh, what?
    • Too much what!
    • ?
    • I wish NLP could help!
    • NLP
        •  Search is not enough…unfortunately
        •  Our domain is noisy…ugly at times
        •  How it works…
    • HwO do iput 10 99 i don,tknow what to do need help help me.
    • Where do I enter a 1099?
    • schema.xml
 <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
 <analyzer type="query">
 <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
 <filter class="solr.LowerCaseFilterFactory"/>
 <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer> </fieldtype>
  
    • dictionary
 <?xml version="1.0" encoding="US-ASCII"?>
 <dictionary>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
 <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
 <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
 <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
 <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
 <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
 <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
 <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>
 
  
    • regular expressions (many)
      # assumes text is lowercased and padded with spaces so each pattern matches whole words
      if text =~ / any/
        text.gsub!(/ any where /, ' anywhere ')
        text.gsub!(/ any(body| body| one) /, ' anyone ')
        text.gsub!(/ any( thing| things|things) /, ' anything ')
        text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
      end
      if text =~ / don /
        text.gsub!(/ don i /, ' do not i ')
        text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
        text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
        text.gsub!(/ don (not|nt|t) /, ' do not ')
      end
      text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
      text.gsub!(/ d (oyou|you) /, ' do you ')
      text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
      text.gsub!(/ crap{1,10} /, ' crap ')
      text.gsub!(/ gr{1,} /, ' ')
 

    • NLP components (diagram): Spell Checker, Stemmer (Porter), Word Collocation, Stop Phrase Correction, Stop Word Removal, Synonyms Substitution, Tax Domain Correction, Phrase Encoding
    • # NLP is not easy!
      # this class wraps our NLP
      sf = SemanticFilter.new
      # does it work?
      sf.act_on_post("HwO do iput 10 99 i don,t know what to do need help help me.")
      # => [" wheretoent 1099 "]
      sf.act_on_post("Where do I enter a 1099?")
      # => [" wheretoent 1099 "]
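To make the component list concrete, here is a toy normalizer. It chains a few of the named stages and maps both example questions to the same key:
- correction rules, standing in for the spell checker and tax-domain correction
- stop-word removal
- synonym substitution
- phrase encoding

The word lists, rules and encoding scheme are all hypothetical. There is no stemmer, and the talk does not show how `SemanticFilter` works, so this does not reproduce its `wheretoent 1099` output.

```ruby
# Hypothetical correction rules (a tiny stand-in for spelling and
# tax-domain correction).
CORRECTIONS = {
  /\bhwo\b/ => "how",
  /\biput\b/ => "input",
  /\b10 99\b/ => "1099",
  /\bdon,t\b/ => "dont"
}.freeze
# Hypothetical stop words, including filler such as "need help help me".
STOP_WORDS = %w[a do i me to the what need help dont know].freeze
# Hypothetical synonyms: "how do I input X" and "where do I enter X" are
# treated as the same request here.
SYNONYMS = { "input" => "enter", "put" => "enter", "how" => "where" }.freeze
# Hypothetical phrase encoding: collapse a known collocation into one token.
PHRASES = { "where enter" => "where_to_enter" }.freeze

def normalize(text)
  s = " #{text.downcase} "
  CORRECTIONS.each { |re, fix| s = s.gsub(re, fix) }
  tokens = s.scan(/[a-z0-9_]+/)
  tokens = tokens.reject { |t| STOP_WORDS.include?(t) }
  tokens = tokens.map { |t| SYNONYMS.fetch(t, t) }.uniq
  phrase = tokens.join(" ")
  PHRASES.each { |from, to| phrase = phrase.gsub(from, to) }
  phrase
end

puts normalize("HwO do iput 10 99 i don,t know what to do need help help me.")
puts normalize("Where do I enter a 1099?")
```

Both calls print `where_to_enter 1099`. The order matters: corrections must run before stop-word removal, or `don,t` would never become a stop word.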
    • NLP
        •  Search is not enough…unfortunately
        •  Our domain is noisy…ugly at times
        •  How it works…
        •  It works well, but it’s not perfect
    • “Stop guessing what I’m looking for!”
    • NLP
        •  Search is not enough…unfortunately
        •  Our domain is noisy…ugly at times
        •  How it works…
        •  It works well, but it’s not perfect
        •  Not just for search…
    • Recommendations
        •  Deliver unanswered questions to contributors
        •  Too much content to scan manually
        •  Based on past answering behavior
        •  Recommend a question to multiple contributors
        •  Uses the Apache Mahout machine learning library
    • Recommendation flow (diagram): answered posts → NLP → user vectors; unanswered posts → NLP → post vectors; both feed Mahout, then heuristics
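The recommendation flow can be sketched with plain cosine similarity over term counts:
1. NLP-normalized tokens from each contributor's past answers form a user vector.
2. Each unanswered question becomes a post vector.
3. Every contributor gets their closest questions.

This swaps in a hand-written similarity for illustration and does not use or mirror Mahout's API. The users, tokens and `limit` are hypothetical. Each contributor is scored independently, so the same question can be recommended to several of them.

```ruby
# Term-count vector for a list of (already NLP-normalized) tokens.
def vectorize(tokens)
  tokens.each_with_object(Hash.new(0)) { |t, v| v[t] += 1 }
end

# Cosine similarity between two sparse term-count vectors.
def cosine(a, b)
  dot = a.sum { |term, n| n * b.fetch(term, 0) }
  norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
  denom = norm.call(a) * norm.call(b)
  denom.zero? ? 0.0 : dot / denom
end

# Recommend up to `limit` unanswered posts per contributor. A contributor's
# profile is the merged vector of every post they answered before.
def recommend(answered_by_user, unanswered, limit: 2)
  post_vectors = unanswered.transform_values { |tokens| vectorize(tokens) }
  answered_by_user.transform_values do |posts|
    profile = vectorize(posts.flatten)
    post_vectors.map { |id, vec| [id, cosine(profile, vec)] }
                .select { |_, score| score > 0 }
                .sort_by { |_, score| -score }
                .first(limit)
                .map(&:first)
  end
end

answered = {
  "alice" => [%w[k1 partnership ptp], %w[k1 basis]],
  "bob"   => [%w[1099 misc enter], %w[1099 int]]
}
open_questions = {
  "q1" => %w[ptp k1 sale],
  "q2" => %w[1099 misc],
  "q3" => %w[k1 1099 sale]
}
p recommend(answered, open_questions)
```

In this toy data, `alice` gets `q1` then `q3`, and `bob` gets `q2` then `q3`. The mixed K-1/1099 question `q3` therefore reaches both contributors.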
    • Next Steps
        •  We’re going to rewrite it! … most of it ;)
        •  Real-time indexing
        •  Question vs. Query
        •  Social feedback
            –  Page ranking
        •  Social dictionaries
            –  Content classification
        •  Beer?!
    • Thank you. Floyd_Morgan@intuit.com, @fmorgan
    • Appendix
        •  User search
        •  SEO