How to Access Your Library Book Collections Using Solr

Presented by Engy Ali | The Library of Alexandria. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Do you have a large collection of text content that you want to search? Are you facing challenges with faceting after a full-text search across metadata and content? Do you want to use Solr for personalization? Bibliotheca Alexandrina provides public access to digitized book collections of more than 220,000 books through a web-based search and browsing facility. The facility is built entirely on Solr and supports five languages. The website provides full-text morphological search within the books' metadata and content, with result highlighting. Personalization features such as annotation tools and tagging are also implemented using Solr. This presentation covers how Bibliotheca Alexandrina uses Solr to implement full-text indexing and searching across the entire collection, faceting, search within the content of a book with result highlighting, and the techniques used for personalization.

Transcript

  • 1. Accessing Your Library Book Collections Using Solr
      By: Engy Morsy, Software project manager, Bibliotheca Alexandrina (engy.morsy@bibalex.org)
      5/14/12, http://dar.bibalex.org
  • 2. BA & Solr
  • 3. http://bibalex.org
  • 4. http://wamcp.bibalex.org
  • 5. http://ssc.bibalex.org
  • 6. http://dar.bibalex.org
  • 7. Introductory Video
  • 8. Agenda
      • Brief introduction to DAR architecture
      • Indexing the books' collection
      • Searching across metadata and content
      • Faceting
      • Searching book content
      • Solr with personalization
      • Future
      • Q&A
  • 9. About 1.5 million books
  • 10. Digital Assets Repository
  • 11. Digital Assets Repository
  • 12. Book site
      • Approximately 260,000 books
      • Nearly 220,000 books published online
      • About 1.5 TB of content
      • Average book size: 6 MB
      • Daily indexing rate: about 150 books
  • 13. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
  • 14. Simple Search
  • 15. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
  • 16. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
      • Annotations
  • 17. Text Underlining
  • 18. Text Highlighting
  • 19. Adding Sticky Notes
  • 20. What do we want…?
      • Allow simple and advanced search across metadata and content in 5 languages
      • Faceting
      • Annotations
      • Personalization
  • 21. Arranging Books in Bookshelves
  • 22. Submitting Comments
  • 23. Rating
  • 24. Embedding
  • 25. Sharing the book link in other social networks
  • 26. What lies beneath!!
  • 27. Book site indices
      [Diagram: Query → AR index, EN index, FR index, IT index, SP index]
  • 28. Indexing Book Collection
      • Index per language
      • A document in the content index corresponds to a page in a book
      • Maintain a field to distinguish between a metadata record and a content record (e.g. SolrType)
      • Use static fields throughout the content index (e.g. PageID, etc.)
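A minimal sketch of this first layout, for illustration only. It assumes hypothetical core names (`books_ar`, `books_en`, ...), hypothetical field names (`id`, `BookID`, `PageID`, `title`, `content`) and hypothetical `SolrType` values `meta` and `content`; only the `SolrType` field itself comes from the slide. It builds plain dictionaries that could be serialized as JSON for an update request: one metadata record per book and one content record per page, all routed to the core for the book's language.

```python
from typing import Dict, List

# Hypothetical core names, one per language on the slide (AR, EN, FR, IT, SP).
CORES: Dict[str, str] = {lang: f"books_{lang.lower()}" for lang in ["AR", "EN", "FR", "IT", "SP"]}


def core_for(language: str) -> str:
    """Return the per-language core that a book's documents are indexed into."""
    return CORES[language]


def build_documents(book_id: str, title: str, pages: List[str]) -> List[dict]:
    """One metadata record plus one content record per page, in the same index.

    SolrType tells the two kinds of record apart; "meta" and "content" are
    assumed values.
    """
    docs: List[dict] = [{"id": book_id, "SolrType": "meta", "title": title}]
    for number, text in enumerate(pages, start=1):
        docs.append({
            "id": f"{book_id}_p{number}",  # unique key per page
            "SolrType": "content",
            "BookID": book_id,             # parent book of this page
            "PageID": number,              # statically declared field
            "content": text,
        })
    return docs


docs = build_documents("b42", "Mobile Technology", ["Intro ...", "Cloud computing ..."])
print(core_for("EN"), len(docs))  # books_en 3
```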
  • 29. What is the problem with this solution?
  • 30. Problem for content search
      Example: advanced search for
        Title: Mobile Technology
        AND
        Content: "cloud computing"
  • 31. Proposed solution
      [Diagram: Title: Mobile Technology → index (SolrType: Meta) → result IDs; Content: "cloud computing" → index (SolrType: Content) → parent book IDs; get intersection → index → facet result → final result]
  • 32. The problem is…
      • Can't get the faceting result directly from the content index
      • Need to query the metadata index in order to get the final facet result
      → Processing time!!!
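The flow in slides 31 and 32 can be sketched as two result sets combined on the client, followed by a second request for the facets. This is a hedged illustration that assumes hypothetical field names (`id` on metadata records, `BookID` on page records, `SolrType` values `meta`/`content`, facet fields `category` and `language`). It only builds the parameters of the follow-up metadata query; `q`, `fq`, `facet`, `facet.field` and `rows` are standard Solr request parameters.

```python
from typing import Iterable, List, Optional, Tuple


def intersect_book_ids(meta_hits: Iterable[dict], page_hits: Iterable[dict]) -> List[str]:
    """Books matching the title query that also have a page matching the content query."""
    meta_ids = {doc["id"] for doc in meta_hits}
    parent_ids = {doc["BookID"] for doc in page_hits}  # many pages share one parent book
    return sorted(meta_ids & parent_ids)


def facet_params(book_ids: List[str]) -> Optional[List[Tuple[str, str]]]:
    """Second round trip: facet over the metadata records of the intersected books."""
    if not book_ids:
        return None  # empty intersection: nothing to facet, skip the request
    id_clause = " OR ".join(f'"{book_id}"' for book_id in book_ids)
    return [
        ("q", "*:*"),
        ("fq", "SolrType:meta"),
        ("fq", f"id:({id_clause})"),
        ("facet", "true"),
        ("facet.field", "category"),
        ("facet.field", "language"),
        ("rows", "0"),  # only the facet counts are needed from this request
    ]


meta_hits = [{"id": "b1"}, {"id": "b2"}]                              # title hits
page_hits = [{"BookID": "b2"}, {"BookID": "b2"}, {"BookID": "b3"}]  # content hits
print(intersect_book_ids(meta_hits, page_hits))  # ['b2']
```

The extra round trip is the processing-time cost the slide points out. With a large intersection, the `OR` clause can also run into Solr's `maxBooleanClauses` limit (1024 by default).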
  • 33. Solution…!
      • Metadata denormalization
        – Denormalize metadata into the content index
  • 34. Proposed solution
      [Diagram: Title: Mobile Technology → index (SolrType: Meta) → result IDs; Content: "cloud computing" → index (SolrType: Content) → facet result; get intersection → final result]
  • 35. Problem for content search
      • Metadata denormalization… worst choice!
      • Re-indexing for changes in metadata
      • Data processing is required
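To see why denormalization is called the worst choice, here is a sketch that copies a book's metadata fields onto each of its page records. All field names are hypothetical, and the 300-page book is an illustrative input. Any metadata edit then touches every page document of that book, so one title change on a 300-page book means re-indexing 300 documents instead of one.

```python
from typing import Dict, List


def denormalize(metadata: Dict[str, str], pages: List[str]) -> List[dict]:
    """Copy the book's metadata fields onto every page record (hypothetical names)."""
    docs = []
    for number, text in enumerate(pages, start=1):
        doc = {"id": f"{metadata['id']}_p{number}", "PageID": number, "content": text}
        # Every metadata field is duplicated on every page document.
        doc.update({f"meta_{key}": value for key, value in metadata.items() if key != "id"})
        docs.append(doc)
    return docs


def apply_metadata_change(docs: List[dict], field: str, new_value: str) -> List[dict]:
    """A single metadata edit yields a new version of every page document."""
    return [{**doc, f"meta_{field}": new_value} for doc in docs]


book = {"id": "b42", "title": "Mobile Technology", "category": "Computing"}
pages = [f"page {n} text" for n in range(1, 301)]  # a hypothetical 300-page book
docs = denormalize(book, pages)
changed = apply_metadata_change(docs, "title", "Mobile Technology, 2nd edition")
print(len(changed))  # 300 documents to re-index for one title change
```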
  • 36. New Solution
  • 37. Indexing Metadata
      • Index per language
      • Separate content and metadata indexes
      • A text field holds the whole book content in the metadata index
        – maxFieldLength has been set to the maximum, e.g. 2147483647
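A sketch of the revised metadata record, using hypothetical field names. The book's pages are concatenated into a single `text` field, so title and content can be matched in one query against the metadata index. The value 2147483647 on the slide is Java's `Integer.MAX_VALUE` (2^31 - 1). In Solr 3.x, the `maxFieldLength` setting in `solrconfig.xml` caps how many tokens of each field are indexed (the example configuration ships with 10000), which is why it has to be raised for whole-book fields.

```python
from typing import Dict, List

# The value on the slide: Java's Integer.MAX_VALUE, i.e. 2^31 - 1.
MAX_FIELD_LENGTH = 2**31 - 1


def metadata_record(book_id: str, title: str, category: str, pages: List[str]) -> Dict[str, str]:
    """Metadata-index record whose `text` field holds the whole book content.

    With a separate content index, no SolrType discriminator is needed here.
    Field names are hypothetical.
    """
    return {
        "id": book_id,
        "title": title,
        "category": category,
        "text": "\n".join(pages),  # every page, in reading order
    }


record = metadata_record("b42", "Mobile Technology", "Computing",
                         ["Chapter 1 ...", "Cloud computing explained ..."])
print(MAX_FIELD_LENGTH)  # 2147483647
```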
  • 38. Back to the example
      Example: advanced search for
        Title: Mobile Technology
        AND
        Content: "cloud computing"
  • 39. Solution
      [Diagram: Title: Mobile Technology + Content: "cloud computing" → Meta index → facet result]
  • 40. Solution
      [Diagram: Title: Mobile Technology → Meta index; Content: "cloud computing" → Content index; get intersection → Meta index → facet result]
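When the content also lives in the metadata index, the example query from slide 38 becomes a single request that returns both hits and facet counts. The sketch below only builds that request. It assumes hypothetical `title`, `text`, `category` and `language` fields, and the host and core name in `SELECT_URL` are placeholders. `q`, `facet`, `facet.field`, `facet.mincount` and `wt` are standard Solr request parameters.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Placeholder endpoint for a hypothetical English metadata core.
SELECT_URL = "http://localhost:8983/solr/meta_en/select"


def advanced_search_params(title: str, content_phrase: str) -> List[Tuple[str, str]]:
    """One query on the metadata index: title terms AND a phrase in the full text."""
    return [
        # Title terms are combined with Solr's default operator (q.op).
        ("q", f'title:({title}) AND text:"{content_phrase}"'),
        ("facet", "true"),
        ("facet.field", "category"),
        ("facet.field", "language"),
        ("facet.mincount", "1"),  # hide facet values with no matching books
        ("wt", "json"),
    ]


def advanced_search_url(title: str, content_phrase: str) -> str:
    return f"{SELECT_URL}?{urlencode(advanced_search_params(title, content_phrase))}"


print(advanced_search_url("Mobile Technology", "cloud computing"))
```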
  • 41. Separate indexes vs. all in one
      • Separate indexes
        + Indexing time
        + Index size
        - Processing results (facets, ...)
        - Scoring
  • 42. Separate indexes vs. all in one
      • Separate indexes
        + Indexing time
        + Index size
        - Processing results (facets, ...)
        - Scoring
      • One index
        - Index size
        - Indexing time
        + Scoring
        + Processing time
  • 43. Book content index
      [Diagram: AR index, EN index, FR index, IT index, SP index]
  • 44.
  • 45. Searching
      • Simple and advanced search
        – Cache only the resulting IDs
      • Highlighting search results
        – Get the full search result and highlight the results per page
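Caching only the matching IDs keeps the cache small. Each results page can then be fetched by ID, and highlighting requested just for the entries on screen. Below is a minimal sketch of such a cache, assuming an in-process LRU keyed by a normalized query string. The class name, capacity and page size are illustrative choices, not details from the talk, and `fake_solr` stands in for a real Solr request.

```python
from collections import OrderedDict
from typing import Callable, List


class ResultIdCache:
    """LRU cache mapping a query string to the full ordered list of matching IDs."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[str, List[str]]" = OrderedDict()

    def ids_for(self, query: str, run_query: Callable[[str], List[str]]) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        ids = run_query(query)  # e.g. a Solr request returning only the id field
        self._entries[key] = ids
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return ids

    def page(self, query: str, run_query: Callable[[str], List[str]],
             page_number: int, page_size: int = 10) -> List[str]:
        """IDs for one zero-based results page; only these need records and highlights."""
        ids = self.ids_for(query, run_query)
        start = page_number * page_size
        return ids[start:start + page_size]


calls: List[str] = []


def fake_solr(query: str) -> List[str]:
    calls.append(query)
    return [f"b{i}" for i in range(25)]


cache = ResultIdCache(capacity=2)
print(cache.page("cloud computing", fake_solr, 2))  # ['b20', 'b21', 'b22', 'b23', 'b24']
print(cache.page("Cloud  Computing", fake_solr, 0)[:3], len(calls))  # ['b0', 'b1', 'b2'] 1
```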
  • 46. Book Content Search
      • Search using
        – Search query
        – Book ID
        – List of page IDs
      • Highlights
      • Annotations
        – Currently saved in a DB
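Searching inside one book can be expressed as a query on the content index, filtered to that book and optionally to a list of pages, with highlighting turned on. The sketch assumes hypothetical `BookID`, `PageID` and `content` fields. `fq`, `fl`, `sort`, `hl`, `hl.fl` and `hl.snippets` are standard Solr parameters.

```python
from typing import List, Optional, Tuple


def book_content_params(query: str, book_id: str,
                        page_ids: Optional[List[int]] = None) -> List[Tuple[str, str]]:
    """Parameters for a search within one book's pages in the content index."""
    params = [
        ("q", f"content:({query})"),
        ("fq", f'BookID:"{book_id}"'),  # restrict to a single book
        ("fl", "id,PageID"),            # page identifiers; matched text comes from highlights
        ("hl", "true"),
        ("hl.fl", "content"),
        ("hl.snippets", "3"),
        ("sort", "PageID asc"),         # hits in reading order
    ]
    if page_ids:
        ids = " OR ".join(str(page_id) for page_id in page_ids)
        params.append(("fq", f"PageID:({ids})"))  # only the requested pages
    return params


for name, value in book_content_params("cloud computing", "b42", [12, 13, 14]):
    print(name, value)
```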
  • 47. Faceting
      • Fixed facet fields
        – Category, sub-category, language, etc.
        – Stored, indexed, exact fields
      • Process facets from different indices
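Because each language has its own index, the facet counts returned by the individual cores have to be merged before display. This sketch assumes Solr's default JSON response layout, in which each field under `facet_counts.facet_fields` is a flat list that alternates term and count. The two sample responses are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Union

FlatFacet = List[Union[str, int]]  # e.g. ["History", 12, "Science", 7]


def merge_facets(responses: List[dict], field: str) -> Dict[str, int]:
    """Sum one facet field's counts across several Solr JSON responses."""
    totals: Counter = Counter()
    for response in responses:
        flat: FlatFacet = response["facet_counts"]["facet_fields"].get(field, [])
        # Pair up the alternating term/count entries.
        for term, count in zip(flat[::2], flat[1::2]):
            totals[term] += count
    # Highest count first, ties broken alphabetically.
    return dict(sorted(totals.items(), key=lambda item: (-item[1], item[0])))


# Illustrative responses from two per-language cores.
ar = {"facet_counts": {"facet_fields": {"category": ["History", 12, "Science", 7]}}}
en = {"facet_counts": {"facet_fields": {"category": ["Science", 9, "Art", 2]}}}
print(merge_facets([ar, en], "category"))  # {'Science': 16, 'History': 12, 'Art': 2}
```

Each core only returns its top `facet.limit` values (100 by default), so a value that misses that cutoff in one core can be undercounted in the merged total.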
  • 48. Personalization
      • Using a separate personalization index
        – Different Solr fields for different languages
        – Search across all fields
      • Saving in both Solr and the DB
      • Indexing tags, ratings and comments using a type field
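One possible reading of this slide, sketched under stated assumptions. Each tag, rating or comment is its own document in a separate personalization core. A `type` field records which kind it is, text goes into a language-specific field, and searches span all of those fields. The field names (`type`, `BookID`, `UserID`, `rating`, `text_ar`, `text_en`, ...) are hypothetical. `defType=edismax` and `qf` are real Solr parameters for searching several fields at once. Writing the same entry to the database, which the slide also mentions, is left out.

```python
from itertools import count
from typing import List, Optional, Tuple

LANGUAGES = ["ar", "en", "fr", "it", "sp"]  # the five site languages, slide labels lowercased
KINDS = {"tag", "rating", "comment"}
_ids = count(1)  # stand-in for a real unique-key generator


def personalization_doc(kind: str, book_id: str, user_id: str, language: str,
                        text: str = "", rating: Optional[int] = None) -> dict:
    """One tag, rating or comment as its own document in the personalization index."""
    if kind not in KINDS or language not in LANGUAGES:
        raise ValueError(f"unsupported type/language: {kind}/{language}")
    doc: dict = {"id": f"{kind}-{next(_ids)}", "type": kind,
                 "BookID": book_id, "UserID": user_id}
    if kind == "rating":
        if rating is None:
            raise ValueError("a rating document needs a rating value")
        doc["rating"] = rating             # numeric field
    else:
        doc[f"text_{language}"] = text     # language-specific analyzed field
    return doc


def personalization_search(query: str, kind: str) -> List[Tuple[str, str]]:
    """Search one kind of entry across all language fields with edismax."""
    return [
        ("q", query),
        ("defType", "edismax"),
        ("qf", " ".join(f"text_{lang}" for lang in LANGUAGES)),
        ("fq", f"type:{kind}"),
    ]


tag = personalization_doc("tag", "b42", "u7", "en", text="cloud computing")
print(tag["type"], tag["text_en"])  # tag cloud computing
```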
  • 49. Future
      • Book mobile application using Solr
      • Using Hadoop
      • Indexing other digital media (maps, audio, video)
  • 50. Contact
      • engy.morsy@bibalex.org
      • Library website: http://bibalex.org
      • Digital Asset Repository: http://dar.bibalex.org
  • 51.
  • 52. Thank you…