The solr power
Motivation for using solr as a NoSQL backend

Published in: Technology, Health & Medicine
  • Good afternoon everyone! Welcome to my lightning talk: The Solr Power. My name is Tareque and I work for a small health industry startup named wisertogether. As you have noticed from this corny title, my talk is about solr.
  • This could be turned into a most interesting man joke.
  • As you might have already guessed, I’m talking about using solr as a NoSQL backend. This approach is not novel in any way, but I wanted to discuss the use case that brought it about. First of all… NoSQL.
  • We got to a point where retrieving data from a SQL layer just wasn’t an option. The arrow came in the form of a performance hit from querying a complex relational model.
  • Well why not? Now on to more specific reasons for using solr as a NoSQL backend.
  • I want to emphasize the word “infrequently”.
  • So there are a lot of answer options.
  • What you were diagnosed with previously and what you were diagnosed with recently.
  • When you start combining all the survey responses, you start getting some really useful information, because it exposes common trends, idiosyncrasies and so on. We use these numbers to generate pretty graphs.
  • Solr stores everything in the form of a document.
  • We used sunburnt to interface with solr. If you only need the facets, there is no reason to retrieve the documents, and skipping them saves a lot of memory.

    1. The solr Power
       Tareque Hossain, Sr. Software Engineer
    2. What about it?
       • We always associate solr with searching
       • solr can also serve as your non-relational data layer
    3. NoSQL? solr?
    4. Why solr?
       • Hey, solr is already part of my stack
       • I love solr
       • It’s fast, scalable and there are some great python interfaces out there
    5. When would you consider it?
       • You have a DB that’s frequently read and infrequently written
       • You want robust search & filtering on your data
       • You want to leverage the faceting feature
       • You want a decently scalable data layer
    6. What’s not so cool?
       • Doesn’t support transactions
       • Not all SQL queries can be translated into solr queries
       • Generating indices can take a long time
       • Searching and indexing at the same time brings down performance
    7. But..
       • You don’t have to give up your relational data layer
       • Create a non-relational layer on top of your relational data layer
       • Get the best of both worlds
    8. So what’s the use case?
       • We deal with medical survey data
       • Say:
         – About 300 multiple choice questions
         – Responses can be multi-dimensional
         – 7000+ different answer choices per question
         – 2000+ respondents per survey
         – 15+ surveys and growing
    9. What a survey question looks like
       When were you diagnosed with the following types of Arthritis?

                               Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                               Arthritis   Arthritis  Arthritis
       Less than a year ago    ☑           ☐          ☐          ☐               ☐
       More than a year ago    ☐           ☐          ☑          ☐               ☐

    10. Storing a single response
       When were you diagnosed with the following types of Arthritis?

                               Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                               Arthritis   Arthritis  Arthritis
       Less than a year ago    1           0          0          0               0
       More than a year ago    0           0          1          0               0

    11. Aggregating over 2000 responses
       When were you diagnosed with the following types of Arthritis?

                               Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                               Arthritis   Arthritis  Arthritis
       Less than a year ago    63          155        19         27              268
       More than a year ago    190         46         8          213             325
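
The step from a single stored response to the aggregated table is a column-wise sum over 0/1 flags. The sketch below shows that idea in plain Python, without solr. The field names and the three sample responses are made up for illustration and are not taken from the talk.

```python
from collections import Counter


def aggregate(responses):
    """Sum the 0/1 answer flags across responses, one total per answer field."""
    totals = Counter()
    for response in responses:
        for field, chosen in response.items():
            totals[field] += int(chosen)
    return totals


# Hypothetical field names of the form <timeframe>__<arthritis type>.
responses = [
    {"lt_1y__rheumatoid": 1, "gt_1y__psoriatic": 1, "gt_1y__osteoarthritis": 0},
    {"lt_1y__rheumatoid": 0, "gt_1y__psoriatic": 1, "gt_1y__osteoarthritis": 1},
    {"lt_1y__rheumatoid": 1, "gt_1y__psoriatic": 0, "gt_1y__osteoarthritis": 0},
]

totals = aggregate(responses)
for field in sorted(totals):
    print(field, totals[field])
```

Each total is one cell of the aggregated table. In the talk this counting is delegated to solr faceting instead of being done in application code.
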
    12. The Document Structure
       • Each survey response = solr document
       • Up to 3000 boolean variables per document indicating chosen answers
       • Added meta information: age, profession, interests
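
One way such a document could be assembled is sketched below: a flat dictionary with one boolean per answer option plus the meta fields. The function name, the `id` key and the `_b` suffix on boolean fields are assumptions made for this example; the talk does not show the actual schema.

```python
def build_document(response_id, chosen, all_fields, meta):
    """Flatten one survey response into a flat dict ready for indexing.

    chosen     -- set of answer fields the respondent ticked
    all_fields -- every answer field in the survey (up to ~3000 per the slide)
    meta       -- extra filterable attributes such as age or profession
    """
    doc = {"id": response_id}
    doc.update(meta)
    for field in all_fields:
        # One boolean per possible answer: True if chosen, False otherwise.
        doc[field] = field in chosen
    return doc


# Hypothetical field names and values, for illustration only.
all_fields = ["q12_lt1y_rheumatoid_b", "q12_gt1y_psoriatic_b", "q12_gt1y_other_b"]
doc = build_document(
    response_id="survey1-resp42",
    chosen={"q12_lt1y_rheumatoid_b", "q12_gt1y_psoriatic_b"},
    all_fields=all_fields,
    meta={"age": 52, "profession": "nurse", "interests": ["running", "chess"]},
)
print(doc)
```

Storing an explicit `False` for every unchosen answer keeps all documents the same shape, at the cost of larger documents.
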
    13. Querying
       • Filter by age, interest, profession
       • Facet across boolean field
       • Result: what group of people chose what group of answers
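
A filter-then-facet request of this kind can be expressed with solr's standard `q`, `fq`, `facet`, `facet.field` and `rows` parameters. The sketch below only builds the query string and sends nothing. The field names and filter values are hypothetical, and the talk itself used the sunburnt library rather than raw parameters.

```python
from urllib.parse import urlencode


def facet_query_params(filters, facet_fields):
    """Build solr select parameters: filter on meta fields, facet on answer fields."""
    params = [
        ("q", "*:*"),       # match every document, then narrow with fq
        ("rows", "0"),      # only facet counts are needed, not the documents
        ("facet", "true"),
        ("wt", "json"),
    ]
    for field, value in filters.items():
        params.append(("fq", f"{field}:{value}"))
    for field in facet_fields:
        params.append(("facet.field", field))
    return params


# Hypothetical meta fields and answer fields.
params = facet_query_params(
    filters={"age": "[40 TO 60]", "profession": "nurse"},
    facet_fields=["q12_lt1y_rheumatoid_b", "q12_gt1y_psoriatic_b"],
)
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a select URL on the solr core; the URL itself depends on the deployment and is left out here.
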
    14. Why solr is awesome..
       • Faceting across boolean field uses very little memory
       • Combining 3000 fields for 2000 documents takes 1 ~ 2 ms
       • Allowed us to reduce API response time from a variable 2 ~ 15 seconds (sucked!) to an almost constant ~50 ms
    15. Good to know..
       • sunburnt: Awesome python solr interface
         github.com/tow/sunburnt
       • Programmatic querying as well as raw queries
       • Supports most advanced solr options
       • If you only require facets, specify rows=0
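
With rows=0, a solr JSON response carries an empty document list plus the facet counts. In the default JSON layout, each faceted field maps to a flat list that alternates value and count. The sketch below turns that list into one "chosen" count per answer field. The sample response is hand-written for illustration, with made-up field names and numbers, and it assumes the default flat layout rather than a different `json.nl` setting.

```python
def true_counts(solr_response):
    """Return, per faceted boolean field, how many documents had the value 'true'."""
    facet_fields = solr_response["facet_counts"]["facet_fields"]
    counts = {}
    for name, flat in facet_fields.items():
        # Flat layout: ["true", 2, "false", 3] -> {"true": 2, "false": 3}
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[name] = pairs.get("true", 0)
    return counts


# Hand-written sample shaped like a rows=0 facet response (illustrative values).
sample_response = {
    "response": {"numFound": 5, "start": 0, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "q12_lt1y_rheumatoid_b": ["true", 2, "false", 3],
            "q12_gt1y_psoriatic_b": ["false", 5],
        }
    },
}

counts = true_counts(sample_response)
print(counts)
```

A field that no respondent chose may have no "true" entry at all, which is why the lookup defaults to 0.
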
    16. Questions?
       • wisertogether.com
       • slideshare.net/tarequeh/the-solr-power
       • @tarequeh
