NetDocuments: Journey from FAST to Solr

Presented by David Hamson & Mou Nandi, NetDocuments. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

NetDocuments, a SaaS document management company, is migrating its large document repository from Microsoft FAST to Solr. In this presentation, the speakers will discuss the entire process, including major decision points and lessons learned. The migration is a two-phase implementation: the first phase is a shortcut that moves the FAST XML data directly into Solr so that a Solr metadata index is available quickly, and the second phase implements the full architecture, including both metadata and full-text processing and search. The presenters will talk about architecting Solr to meet the company's requirements: scaling to billions of work-product documents, low indexing latency, and high availability. NetDocuments uses the search engine both to build the user experience and for document discovery by users. Solr was architected to scale and perform for these two very different needs, and also to match all the features and functionality available in FAST. Finally, the presenters will share benchmark results from tests run on various hardware configurations and file systems, along with results from search-quality testing, in which Solr's capabilities were tested on a single server with both a single Solr core and multiple Solr cores.

  1. Journey from FAST to Solr. Presented by David Hamson and Mou Nandi
  2. Goal of the Session
     • NetDocuments
     • Why move to Solr from FAST
     • Architecting Solr to work as a core module of a cloud document management product, for both user interface building and document discovery
     • Testing and benchmarking Solr to scale and perform for billions of documents at 200 QPS (queries per second) and 200 DPS (documents per second)
     • Lessons learned and shortcuts found migrating from FAST to Solr
  3. Who We Are
     • A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms
  4. Who We Serve
     • We serve over 1,000 customers across 128 countries worldwide and host over 250 million documents.
  5. Why Migrate to Solr
     Problems with FAST:
     • FAST's product roadmap does not fit with the company roadmap
     • Large hardware footprint, expensive to scale
     • High indexing latency
     • Unpredictable and untraceable document loss
     • A black-box search engine, with dependency on the Microsoft FAST support team
     • No control over new features
     • Expensive license
     What Solr offers:
     • Supports a massive index
     • Active, hardworking development community
     • Access to what's happening under the hood
     • Improved hardware footprint
     • Reduced licensing cost
  6. Migration to Solr
     [Diagram: FAST Instance 1, FAST Instance 2 and more FAST instances, each passing ND documents through FAST document processors into FIXML, which the FAST indexer turns into a combined MDI + FTI index.]
     • 95% of searches are metadata searches, and the metadata index (MDI) does not need rich text processing
     • Flexibility to implement different architectures for the MDI and the full-text index (FTI)
     • Even the highest level of logging cannot trace the document loss during heavy feeding traffic
  7. Migration to Solr: Solr Indexing
     [Diagram: ND documents enter the ND Pipeline, which sends metadata (MD) XML to Solr MD instances holding the Solr MDI, and full-text (FT) XML, with Aspire in the full-text path, to Solr FT instances holding the Solr FTI.]
  8. The Migration Project
     Phase 1: MDI
     • Only create the MDI
     • Use FAST data to prototype Solr
     • Use the FIXMLs to build the Solr index
     • Use 100% filter queries
     Phase 2: FTI
     • Build a robust feeding pipeline to handle both MD and FT
     • Build a text processing pipeline
     Phase 3
     • Implement new Solr features
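Phase 1 built the metadata index directly from the FIXML files that FAST had already produced. The slides do not show the FIXML schema or the field mapping, so the sketch below assumes a simplified FIXML-like layout (a `<document>` of named `<field>` elements) and a hypothetical `FIELD_MAP` modelled on the Solr field names in the example query on slide 14. It emits Solr's XML update format (`<add><doc><field name="...">`).

```python
import xml.etree.ElementTree as ET

# Hypothetical FIXML-like input: the real FAST FIXML schema is not shown in
# the slides, so this sketch assumes a flat <document> of named <field>s.
SAMPLE_FIXML = """
<document>
  <field name="ndenvurl">env-001</field>
  <field name="ndcabinets">cab1</field>
  <field name="ndcredate">2011-09-26T00:00:00</field>
  <field name="ndinternal">not indexed</field>
</document>
"""

# Hypothetical mapping from FAST field names to Solr field names, modelled on
# the field names used in the example Solr query.
FIELD_MAP = {
    "ndenvurl": "ndenvurl",
    "ndcabinets": "ndcabinets_smulti_idx",
    "ndcredate": "ndcredate_tdt_idx",
}
DATE_FIELDS = {"ndcredate"}


def fixml_to_solr_add(fixml_text: str) -> str:
    """Convert one FIXML-like document into a Solr XML <add> message."""
    source = ET.fromstring(fixml_text.strip())
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field in source.iter("field"):
        name = field.get("name")
        if name not in FIELD_MAP:
            continue  # drop fields the metadata index does not need
        value = (field.text or "").strip()
        if name in DATE_FIELDS and not value.endswith("Z"):
            value += "Z"  # Solr date fields expect UTC with a trailing 'Z'
        out = ET.SubElement(doc, "field", name=FIELD_MAP[name])
        out.text = value
    return ET.tostring(add, encoding="unicode")
```

The resulting string can be posted to a Solr update handler; unmapped fields are dropped rather than guessed, which keeps the prototype index limited to the metadata the system queries need.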
  9. A high-level view of the NetDocuments Search Architecture
     [Diagram components: web app, web queue, NDPipeline (administration: monitoring, debugging, stats), dispatcher queue, dispatcher pool (D1 to D5), MD handler pool (MDH1 to MDH5), FT queue, FT processor pool (FTP1 to FTP5), query distributor, file system, Solr MDI and Solr FTI.]
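The slide lists the pipeline components but not how work flows between them. As a rough sketch, assuming that every document updates the MDI and only documents with content also go through the FT queue, a dispatcher feeding two worker pools could look like this. `NdDocument`, `Dispatcher`, `start_pool` and `stop_pool` are hypothetical names; the pool size of five only mirrors MDH1 to MDH5 and FTP1 to FTP5 on the slide.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NdDocument:
    # Hypothetical record; the real NDPipeline message format is not shown.
    doc_id: str
    metadata: dict = field(default_factory=dict)
    has_content: bool = False  # assumed flag: needs full-text indexing


class Dispatcher:
    """Routes documents to an MD handler queue and, when they carry
    content, also to an FT queue (separate MDI and FTI paths)."""

    def __init__(self) -> None:
        self.md_queue: queue.Queue = queue.Queue()
        self.ft_queue: queue.Queue = queue.Queue()

    def dispatch(self, doc: NdDocument) -> None:
        self.md_queue.put(doc)        # every document updates the MDI
        if doc.has_content:
            self.ft_queue.put(doc)    # only content goes down the FTI path


def start_pool(q: queue.Queue, handle: Callable[[NdDocument], None],
               workers: int) -> List[threading.Thread]:
    """Start worker threads (e.g. MDH1..MDH5) that drain q until a sentinel."""
    def worker() -> None:
        while True:
            doc = q.get()
            try:
                if doc is None:       # sentinel: shut this worker down
                    return
                handle(doc)
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def stop_pool(q: queue.Queue, threads: List[threading.Thread]) -> None:
    """Queue one sentinel per worker (after pending work) and wait for exit."""
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
```

Keeping the MD and FT paths on separate queues means a backlog of slow text extraction cannot delay metadata updates, which matters when 95% of searches hit the MDI.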
  10. Benchmarking Solr
     Config parameters for indexing
     • Created the Solr index from the FIXMLs with different RAM buffer, merge factor and auto-commit configurations
     Testing with HDD and SSD
     • We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents
     • 15 threads running at a time from the client feeder application
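The indexing benchmarks varied the RAM buffer, merge factor and auto-commit settings (in Solr these are set in `solrconfig.xml` as `ramBufferSizeMB`, `mergeFactor` and the update handler's `autoCommit` block) and drove Solr from a client feeder running 15 threads. A minimal sketch of such a feeder, assuming documents are sent in batches through a pluggable `send` function; `BATCH_SIZE`, `post_to_solr` and its URL are hypothetical, and the slides do not say how the NetDocuments feeder was written.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence

FEEDER_THREADS = 15  # from the slide: 15 threads feeding at a time
BATCH_SIZE = 100     # hypothetical; the slides do not give a batch size


def batches(docs: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]


def post_to_solr(batch: Sequence[dict],
                 update_url: str = "http://localhost:8983/solr/update/json") -> int:
    """Hypothetical sender: POST a JSON array of docs to a Solr update handler.
    No explicit commit is sent; the server-side autoCommit setting decides
    when documents become searchable."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(list(batch)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(batch)


def feed(docs: Sequence[dict], send: Callable[[Sequence[dict]], int] = post_to_solr,
         threads: int = FEEDER_THREADS, batch_size: int = BATCH_SIZE) -> int:
    """Send all batches from a fixed-size thread pool; return docs accepted."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))
```

Injecting `send` keeps the thread count and batch size adjustable between benchmark runs, while the indexing parameters being compared stay on the Solr side.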
  11. Testing using different file systems
     • We did not see a large performance difference between ext3 and XFS on HDD or SSD with ND documents
     • We chose ext3 for the FTI, with 15K HDDs in RAID 10
     • We are using XFS on the ioDrive for the MDI, as suggested by Fusion-io
  12. Benchmarking Solr Indexing and Query Process
     Implemented multi-core index processing and compared its query performance to a single-core index.

     Metric                      Search to 5 shards    Search to 10 shards
     SolrMeter instances         5                     10
     Queries per shard           3,000/min             1,500/min
     Total queries               15,000/min            15,000/min
     Avg response time           8 ms                  12 ms
     CPU                         20%                   32%
     RAM                         52 GB                 53 GB
     Cache warm-up time          2.5 s                 2.7 s
     Cache hit ratio             0.98                  0.98
     Cache size                  2276                  2276
     Evictions                   none                  none
     Index update interval       every 7 s             every 7 s
     Test duration               5 min                 8 min
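The throughput figures in the table can be checked against the 200 QPS goal from slide 2. The sketch below only re-derives numbers already on the slide; the in-flight estimate uses Little's law (L = λW) and is an added back-of-the-envelope calculation, not something the presenters reported.

```python
TARGET_QPS = 200  # goal stated on slide 2


def summarize(shards: int, per_shard_per_min: int, avg_ms: int) -> dict:
    """Derive total load, QPS and average in-flight queries for one setup."""
    total_per_min = shards * per_shard_per_min
    qps = total_per_min / 60
    return {
        "total_per_min": total_per_min,
        "qps": qps,
        "meets_target": qps >= TARGET_QPS,
        # Little's law: average concurrency = throughput * average latency
        "in_flight": qps * avg_ms / 1000,
    }


five_shards = summarize(shards=5, per_shard_per_min=3000, avg_ms=8)
ten_shards = summarize(shards=10, per_shard_per_min=1500, avg_ms=12)
```

Both layouts carry 15,000 queries per minute, which is 250 QPS and above the 200 QPS target; moving from 5 to 10 shards spread that same load thinner and raised average latency from 8 ms to 12 ms, so roughly 2 versus 3 queries were in flight on average.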
  13. Benchmark: QTime increase as Solr scales and the start row increases
     • QTime does not vary much as the start row increases.
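The slides report QTime but not how it was collected. One way to reproduce the start-row measurement, assuming a Solr core reachable at a placeholder base URL, is to issue the same query at increasing `start` values and read the server-side `QTime` from each JSON response header. The function names are illustrative; `rows=500` matches the example query on slide 14.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable


def paged_url(core_url: str, q: str, start: int, rows: int) -> str:
    """Build a select URL for one page of results, asking for JSON output."""
    params = {"q": q, "start": start, "rows": rows, "wt": "json"}
    return f"{core_url}/select?{urllib.parse.urlencode(params)}"


def qtime_of(response_text: str) -> int:
    """Solr reports its own processing time (ms) in responseHeader.QTime."""
    return int(json.loads(response_text)["responseHeader"]["QTime"])


def qtime_by_start(core_url: str, q: str, starts: Iterable[int],
                   rows: int = 500) -> Dict[int, int]:
    """Issue the same query at each start offset and record QTime per offset."""
    results: Dict[int, int] = {}
    for start in starts:
        with urllib.request.urlopen(paged_url(core_url, q, start, rows)) as resp:
            results[start] = qtime_of(resp.read().decode("utf-8"))
    return results
```

For example, `qtime_by_start("http://localhost:8983/solr/core0", "ndenvurl:*", range(0, 10000, 1000))` against a placeholder core. Because QTime is measured inside Solr, it excludes network transfer, which makes it a fair basis for comparing offsets; repeated runs may also be served from Solr's caches, so the first pass is worth recording separately.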
  14. Tuning System Queries for Solr
     • System searches are metadata searches
     • Thousands of real-life queries were extracted from the FAST query log
     • Extensive use of filter queries and the filter cache gives excellent response time for complex queries
     • Example queries:
       FAST query:
       ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3 ",mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))
       Solr query:
       http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
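The translated Solr query on this slide turns each FAST clause into its own `fq`, so each filter gets its own filter-cache entry and can be reused independently across queries. The sketch below builds that parameter list from structured inputs and reproduces the example; the function names are hypothetical. The ACL clause carries `{!cache=false cost=100}`, presumably because ACL combinations are user-specific and would add little to the filter cache; in Solr, `cost` orders non-cached filters so that more expensive ones are checked later.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode

Params = List[Tuple[str, str]]


def acl_filter(acls: Sequence[str], field: str = "ndacl_smulti_idx") -> str:
    """OR together the user's ACLs and keep the result out of the filterCache."""
    clause = " OR ".join(f"{field}:{acl}" for acl in acls)
    return "{!cache=false cost=100}(" + clause + ")"


def build_params(cabinet: str, start_date: str, end_date: str,
                 acls: Sequence[str], excluded_exts: Sequence[str],
                 shards: Optional[Sequence[str]] = None, rows: int = 500) -> Params:
    """Mirror the FAST AND/ANDNOT tree as one fq per clause."""
    params: Params = [
        ("start", "0"),
        ("rows", str(rows)),
        ("fl", "ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std"),
        ("sort", "ndlastmoddate_tdt_idx desc"),
        ("q", "ndenvurl:*"),
    ]
    if shards:
        params.insert(0, ("shards", ",".join(shards)))
    fqs = [
        f"ndcabinets_smulti_idx:{cabinet}",                 # AND cabinet
        f"ndcredate_tdt_idx:[{start_date} TO {end_date}]",  # AND date range
        acl_filter(acls),                                  # FILTER(acl ...)
        f"-nddeletedcabs_smulti_idx:{cabinet}",             # ANDNOT deleted cabinet
    ]
    fqs += [f"-ndexten_s_idx:{ext}" for ext in excluded_exts]  # ANDNOT extensions
    params += [("fq", fq) for fq in fqs]
    return params


def select_url(core_url: str, params: Params) -> str:
    """urlencode turns the space in the sort value into '+', as on the slide."""
    return f"{core_url}/select?{urlencode(params)}"
```

Keeping cabinet, date and extension filters as separate cached `fq`s lets later queries that share any one of them reuse that cached result, while the per-user ACL filter is evaluated fresh each time.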
  15. Thank you
