How to Access Your Library Book Collections Using Solr

Presented by Engy Ali | The Library of Alexandria - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Do you have a large collection of text content that you want to search? Facing challenges on how to facet after performing a full text search across metadata and content? Do you want to use Solr with personalization? Bibliotheca Alexandrina provides public access to digitized book collections that exceed 220,000 books, through a web-based search and browsing facility. The facility is completely built on Solr in five different languages. The website provides full text morphological search within the books’ metadata and content with result highlighting. Different personalization features like annotation tools and tagging are also implemented using Solr. This presentation will cover how Bibliotheca Alexandrina uses Solr to implement full text indexing and searching across the entire collection, faceting, search within the content of a book and result highlighting and techniques used for personalization.

Published in: Technology, Education

  1. Accessing Your Library Book Collections Using Solr. By: Engy Morsy, Software project manager, Bibliotheca Alexandrina, engy.morsy@bibalex.org. 5/14/12, http://dar.bibalex.org
  2. BA & Solr
  3. http://bibalex.org
  4. http://wamcp.bibalex.org
  5. http://ssc.bibalex.org
  6. http://dar.bibalex.org
  7. Introductory Video
  8. Agenda
       • Brief introduction to DAR architecture
       • Indexing the book collection
       • Searching across metadata and content
       • Faceting
       • Searching book content
       • Solr with personalization
       • Future
       • Q&A
  9. About 1.5 million books
  10. Digital Assets Repository
  11. Digital Assets Repository
  12. Book site
       • Approximately 260,000 books
       • Nearly 220,000 books published online
       • About 1.5 TB of content
       • Average book size: 6 MB
       • Daily indexing rate: about 150 books
  13. What do we want…?
       • Allow simple and advanced search across metadata and content in 5 languages
  14. Simple Search
  15. What do we want…?
       • Allow simple and advanced search across metadata and content in 5 languages
       • Faceting
  16. What do we want…?
       • Allow simple and advanced search across metadata and content in 5 languages
       • Faceting
       • Annotations
  17. Text Underlining
  18. Text Highlighting
  19. Adding Sticky Notes
  20. What do we want…?
       • Allow simple and advanced search across metadata and content in 5 languages
       • Faceting
       • Annotations
       • Personalization
  21. Arranging Books in Bookshelves
  22. Submitting Comments
  23. Rating
  24. Embedding
  25. Sharing the book link in other social networks
  26. What lies beneath!!
  27. Book site indices (diagram): Query; AR index, EN index, FR index, IT index, SP index
  28. Indexing Book Collection
       • Index per language
       • A document in the content index corresponds to a page in a book
       • Maintain a field to distinguish between metadata records and content records (e.g. SolrType)
       • Use static fields throughout the content index (e.g. PageID, etc.)
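The record layout described on slide 28 can be sketched as follows. This is a minimal illustration, not the library's actual schema: only `SolrType` and `PageID` are named on the slide, while the `id`, `BookID`, `Title` and `Text` fields, the `Meta`/`Content` values and the helper name are assumptions made for the example.

```python
def build_records(book_id, title, pages):
    """Return one metadata record plus one content record per page.

    Field names other than SolrType and PageID are hypothetical.
    """
    # One metadata record per book.
    records = [{
        "id": book_id,
        "SolrType": "Meta",
        "BookID": book_id,
        "Title": title,
    }]
    # One content record per page, pointing back at its parent book.
    for number, text in enumerate(pages, start=1):
        records.append({
            "id": f"{book_id}_p{number}",
            "SolrType": "Content",
            "BookID": book_id,
            "PageID": number,
            "Text": text,
        })
    return records


records = build_records("b1", "Mobile Technology", ["page one", "page two"])
```

Storing the parent book ID on every page record is what later allows page hits to be mapped back to books.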
  29. What is the problem with this solution?
  30. Problem for content search. Example: Advanced Search, search for Title: Mobile Technology AND Content: "cloud computing"
  31. Proposed solution (diagram):
       • Title: Mobile Technology → index (SolrType: Meta) → Result IDs
       • Content: "cloud computing" → index (SolrType: Content) → Parent Book IDs
       • Get intersection → Final result
       • Final result → index → Facet result
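The flow on slide 31 can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the hit lists stand in for real Solr responses, and the `BookID` field name and both helper names are hypothetical.

```python
def intersect_hits(meta_book_ids, content_hits):
    """Keep the metadata hits whose book also has a matching page."""
    parent_ids = {hit["BookID"] for hit in content_hits}
    # Preserve the order of the metadata result.
    return [book_id for book_id in meta_book_ids if book_id in parent_ids]


def facet_filter(book_ids, field="BookID"):
    """Build a filter restricting a follow-up facet query to the final IDs."""
    if not book_ids:
        return None
    return f"{field}:(" + " OR ".join(book_ids) + ")"


meta_hits = ["b1", "b2", "b3"]            # books whose title matches
content_hits = [                          # pages whose text matches
    {"BookID": "b2", "PageID": 4},
    {"BookID": "b3", "PageID": 1},
    {"BookID": "b3", "PageID": 9},
    {"BookID": "b7", "PageID": 2},
]
final_ids = intersect_hits(meta_hits, content_hits)
follow_up = facet_filter(final_ids)
```

The follow-up filter makes the cost visible: facet counts for the final result require one more round trip to the metadata records.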
  32. The problem is…
       • Can't get the faceting result directly from the content index
       • Need to query the metadata index in order to get the final facet result
          – Processing time!!!
  33. Solution…!
       • Metadata denormalization
          – Denormalize metadata into content index
  34. Proposed solution (diagram):
       • Title: Mobile Technology → index (SolrType: Meta) → Result IDs
       • Content: "cloud computing" → index (SolrType: Content) → Facet result
       • Get intersection → Final result
  35. Problem for content search
       • Metadata denormalization… Worst choice!
          – Re-indexing for changes in metadata
          – Data processing is required
  36. New Solution
  37. Indexing Metadata
       • Index per language
       • Separate content and metadata index
       • Text field holds the whole book content in the metadata index
          – The maxFieldLength has been set to maximum, e.g. 2147483647
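The value 2147483647 on slide 37 is 2^31 - 1, the largest signed 32-bit integer. The sketch below shows why a field-length cap matters when a whole book goes into one field. It assumes the cap counts tokens, uses a whitespace split in place of a real analyzer, and picks 10,000 purely as an illustrative smaller limit.

```python
INT_MAX = 2**31 - 1  # 2147483647, the value quoted on the slide


def indexed_tokens(text, max_field_length):
    """Return the tokens that survive a per-field token cap."""
    tokens = text.split()  # stand-in for a real analyzer
    return tokens[:max_field_length]


book_text = " ".join(f"word{i}" for i in range(25_000))
capped = indexed_tokens(book_text, 10_000)
uncapped = indexed_tokens(book_text, INT_MAX)
```

With the smaller cap, anything past the first 10,000 tokens of the book would not be searchable. Raising the cap to the maximum avoids that truncation.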
  38. Back to the example. Example: Advanced Search, search for Title: Mobile Technology AND Content: "cloud computing"
  39. Solution (diagram): Title: Mobile Technology and Content: "cloud computing" → Meta index → Facet result
  40. Solution (diagram):
       • Title: Mobile Technology → Meta index
       • Content: "cloud computing" → Content index
       • Get intersection → Meta index → Facet result
  41. Separate indexes vs. all in one
       • Separate indexes
          + Indexing time
          + Index size
          - Processing results (facets, etc.)
          - Scoring
  42. Separate indexes vs. all in one
       • Separate indexes
          + Indexing time
          + Index size
          - Processing results (facets, etc.)
          - Scoring
       • One index
          - Index size
          - Indexing time
          + Scoring
          + Processing time
  43. Book content index (diagram): AR index, EN index, FR index, IT index, SP index
  44. (no text)
  45. Searching
       • Simple and advanced search
          – Cache only the resulting IDs
       • Highlighting search results
          – Get the full search result and highlight per result page
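The "cache only the resulting IDs" idea on slide 45 might look like the sketch below. The class name, the `search_fn` callback and the page size are assumptions. The slides do not say how the cache is actually implemented.

```python
class ResultIdCache:
    """Cache the ordered list of matching IDs per query, nothing else."""

    def __init__(self, search_fn):
        self._search_fn = search_fn      # hypothetical: query -> iterable of IDs
        self._ids_by_query = {}

    def page_ids(self, query, page, page_size=10):
        """Return the IDs for one result page, searching only on a cache miss."""
        if query not in self._ids_by_query:
            self._ids_by_query[query] = list(self._search_fn(query))
        ids = self._ids_by_query[query]
        start = (page - 1) * page_size
        return ids[start:start + page_size]


calls = []

def fake_search(query):
    """Stand-in for a real Solr request; records each call."""
    calls.append(query)
    return [f"b{i}" for i in range(25)]


cache = ResultIdCache(fake_search)
first_page = cache.page_ids("solr", page=1)
third_page = cache.page_ids("solr", page=3)
```

Only the IDs on the requested page then need to be fetched in full and highlighted, which matches the per-page highlighting bullet.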
  46. Book Content Search
       • Search using
          – Search query
          – Book ID
          – List of pages' IDs
       • Highlights
       • Annotations
          – Currently saved in DB
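A request combining the three inputs on slide 46 could be assembled as below. `q`, `fq`, `hl` and `hl.fl` are standard Solr request parameters. The `Text`, `BookID` and `PageID` field names and the helper are assumptions, and escaping of user input is left out of this sketch.

```python
from urllib.parse import urlencode


def content_search_params(query, book_id, page_ids=None):
    """Build Solr parameters for searching inside a single book."""
    params = {
        "q": f"Text:({query})",
        "fq": [f"BookID:{book_id}"],   # restrict to one book
        "hl": "true",                  # ask Solr for highlighted snippets
        "hl.fl": "Text",
    }
    if page_ids:
        pages = " OR ".join(str(page) for page in page_ids)
        params["fq"].append(f"PageID:({pages})")
    return params


params = content_search_params('"cloud computing"', "b42", [3, 4])
query_string = urlencode(params, doseq=True)  # doseq repeats fq per value
```

Putting the book and page restrictions in `fq` rather than `q` keeps them out of relevance scoring.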
  47. Faceting
       • Fixed facet fields
          – Category, sub-category, language, etc.
          – Stored, indexed, exact fields
       • Process facets from different indices
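"Process facets from different indices" on slide 47 implies combining per-index counts on the client side. A minimal sketch, assuming each index returns a plain mapping of facet value to count (the function name and sample data are hypothetical):

```python
from collections import Counter


def merge_facets(per_index_counts):
    """Sum facet counts returned by several indexes into one tally."""
    merged = Counter()
    for counts in per_index_counts:
        merged.update(counts)   # adds counts for values seen before
    return merged


category_facets = merge_facets([
    {"Science": 3, "History": 1},   # e.g. from the EN index
    {"Science": 2, "Art": 4},       # e.g. from the FR index
])
top_category = category_facets.most_common(1)
```

This only works because the facet fields are fixed and exact, so the same value is spelled identically in every index.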
  48. Personalization
       • Using separate index of personalization
          – Different Solr fields for different languages
          – Search across all fields
       • Saving in both Solr and DB
       • Indexing tags, rating and comments using type field
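One way to read slide 48 is a single personalization record shape, where a type field tells tags, ratings and comments apart and the text lands in a per-language field. The sketch below is an assumption throughout: the field names, the `Text_<lang>` naming pattern and the helper are not given in the slides. Only the language codes come from the index diagrams.

```python
ALLOWED_TYPES = {"tag", "rating", "comment"}
LANGUAGES = {"ar", "en", "fr", "it", "sp"}   # codes as shown on the index slides


def personalization_record(user_id, book_id, kind, value, lang):
    """Build one personalization document with a type discriminator."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown type: {kind}")
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language: {lang}")
    return {
        "UserID": user_id,
        "BookID": book_id,
        "Type": kind,
        f"Text_{lang}": str(value),   # language-specific field
    }


tag = personalization_record("u7", "b42", "tag", "cloud", "en")
```

A search across all personalization content would then query every `Text_<lang>` field, optionally filtered by `Type`.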
  49. Future
       • Book mobile application using Solr
       • Using Hadoop
       • Indexing other digital media (maps, audio, video)
  50. Contact
       • engy.morsy@bibalex.org
       • Library website: http://bibalex.org
       • Digital Asset Repository: http://dar.bibalex.org
  51. (no text)
  52. Thank you…
