The New Library of Alexandria 
Overview 
Bibliotheca Alexandrina (BA)
Ø Center of excellence in the production 
and dissemination of knowledge 
Ø Place of dialogue, learning and 
understanding between cultures and 
peoples
Ø The World’s Window on Egypt 
Ø Egypt’s Window on the World 
Ø Instrument for Rising to the Challenges of 
the Digital Age 
Ø Center for Dialogue Between Peoples and 
Civilizations
Not just a library of books, but a vast cultural and scientific complex
A library that can accommodate millions of books
http://archive.bibalex.org
http://descegy.bibalex.org
http://lartarab.bibalex.org
More than 230,000 Arabic books are freely available online for Arabic readers worldwide
http://suezcanal.bibalex.org
http://naguib.bibalex.org/
http://nasser.bibalex.org
http://sadat.bibalex.org
Ø Project Overview 
Ø Collection Overview 
Ø Data Representation 
Ø System Workflow 
— DAF (Digital Assets Factory) 
— Cataloguing 
— Website 
§ Solr Search Engine
§ Article Viewer 
Ø Centre for Economic, Judicial, and Social 
Study and Documentation (CEDEJ) 
collaborated with Bibliotheca Alexandrina 
(BA) for the digitization of its archive of 
massive press articles collection 
Ø The project consists of multiple modules to: 
— Index the Press Archive Collection 
— Control data entry workflow 
— Digitize and process data 
— Catalogue and review Articles 
— Archive Web Publishing 
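The modules above imply that every article moves through a fixed sequence of steps before it reaches the website. As a minimal sketch of how the "control data entry workflow" module could enforce that order, the Python below models a linear state machine; the stage names and the identifier scheme are hypothetical and are not taken from the BA system.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical processing stages for one press article (not BA's actual names)."""
    INDEXED = "indexed"        # listed in the press archive index
    SCANNED = "scanned"        # digitized and processed
    CATALOGUED = "catalogued"  # metadata entered by a cataloguer
    REVIEWED = "reviewed"      # metadata checked by a reviewer
    PUBLISHED = "published"    # visible on the archive website


# Each stage may only move to the next one; PUBLISHED is terminal.
NEXT_STAGE = {
    Stage.INDEXED: Stage.SCANNED,
    Stage.SCANNED: Stage.CATALOGUED,
    Stage.CATALOGUED: Stage.REVIEWED,
    Stage.REVIEWED: Stage.PUBLISHED,
}


class Article:
    def __init__(self, article_id: str) -> None:
        self.article_id = article_id
        self.stage = Stage.INDEXED
        self.history = [Stage.INDEXED]

    def advance(self, target: Stage) -> None:
        """Move to `target`, refusing to skip or repeat a stage."""
        if NEXT_STAGE.get(self.stage) is not target:
            raise ValueError(
                f"{self.article_id}: cannot go from {self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)


article = Article("box01-0001")  # hypothetical identifier scheme
article.advance(Stage.SCANNED)
article.advance(Stage.CATALOGUED)
print(article.stage.value)  # catalogued
```

Keeping the allowed transitions in one table makes it easy to add, for example, a "rejected by reviewer" path later without touching the `advance` logic.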
Ø Package of press archive 
— 800,000+ press clips varying between 
§ Press 
§ Reports 
— 500+ publishers 
— 60,000+ writers and reporters 
— 200 Different subjects 
§ Economic, politics, social life, etc… 
— Archive Languages: 
§ Arabic, English and French 
— Date range from 1966 to 2009 
Ø Finished so far 
— 115,000 press clips varying between 
§ Press 
§ Reports 
— 200 publishers 
— 14,000 writers and reporters 
— 100 Different subjects 
§ Economic, politics, social life, etc… 
— Archive Languages: 
§ Arabic, English and French 
— Date range from 1966 to 2009 
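The progress figures can be compared with the full-package figures directly. The short calculation below only re-uses the numbers from the two slides; because most full-package totals are given as lower bounds ("800,000+"), each percentage is an upper bound on the true completion rate.

```python
# Figures copied from the two slides above. Most full-package totals carry a "+",
# so they are lower bounds and the percentages are upper bounds.
TOTAL = {"press clips": 800_000, "publishers": 500,
         "writers and reporters": 60_000, "subjects": 200}
DONE = {"press clips": 115_000, "publishers": 200,
        "writers and reporters": 14_000, "subjects": 100}


def completion(done: int, total: int) -> float:
    """Share of `total` already processed, as a percentage."""
    return 100.0 * done / total


for key in TOTAL:
    print(f"{key}: at most {completion(DONE[key], TOTAL[key]):.1f}%")
```

This prints 14.4% for press clips, 40.0% for publishers, 23.3% for writers and reporters, and 50.0% for subjects: roughly one clip in seven had been processed, while coverage of publishers and subjects was considerably further along.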
Ø A list of packaged press archive is submitted to 
Bibliotheca Alexandrina to be scanned and 
catalogued 
Ø Source of data is a collection of boxes 
Ø The box is organized on the following 
hierarchy 
— Folder 
— File 
— Sub-File 
— Document 
Ø Document represents a single page of press 
Article Creation 
Article Metadata 
Lookups Management 
Reports 
Ø Based on Apache Lucene project v4.1 
Ø SolrNet API is used to connect to Solr 
server 
Ø Features 
— Simple/Advanced search 
— Results Highlighting 
— Fields AutoComplete 
— Text search (Article Viewer) 
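SolrNet is a .NET client. Under the hood it sends HTTP requests to Solr's request handlers. The Python sketch below does not use SolrNet; instead it builds those request URLs directly to show which standard Solr parameters correspond to each feature. The host, core name (`press`) and field names (`content`, `year`, `writer`, `publisher`, `language`) are hypothetical. The autocomplete function assumes a `/terms` handler backed by the TermsComponent is configured.

```python
from urllib.parse import urlencode

SOLR_CORE = "http://localhost:8983/solr/press"  # hypothetical host and core name

# Characters with special meaning in the Lucene/Solr standard query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape(term: str) -> str:
    """Backslash-escape query-syntax characters in user input."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def simple_search(text: str, rows: int = 10, start: int = 0) -> str:
    """Free-text search over a hypothetical `content` field, with highlighting."""
    params = {
        "q": f"content:({escape(text)})",
        "hl": "true",                 # turn on result highlighting
        "hl.fl": "content",           # field(s) to build snippets from
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "rows": rows,
        "start": start,
        "wt": "json",
    }
    return f"{SOLR_CORE}/select?{urlencode(params)}"


def advanced_search(fields: dict, year_from: int, year_to: int) -> str:
    """AND together per-field clauses and filter on a hypothetical `year` field."""
    clauses = [f"{name}:({escape(value)})" for name, value in fields.items()]
    params = {
        "q": " AND ".join(clauses) or "*:*",
        "fq": f"year:[{year_from} TO {year_to}]",  # filter query, does not affect scoring
        "wt": "json",
    }
    return f"{SOLR_CORE}/select?{urlencode(params)}"


def autocomplete(field_name: str, prefix: str, limit: int = 10) -> str:
    """Field autocomplete via the TermsComponent (assumes a /terms handler)."""
    params = {"terms": "true", "terms.fl": field_name, "terms.prefix": prefix,
              "terms.limit": limit, "wt": "json"}
    return f"{SOLR_CORE}/terms?{urlencode(params)}"


print(simple_search("Suez Canal"))
print(advanced_search({"publisher": "Al-Ahram", "language": "fr"}, 1966, 2009))
print(autocomplete("writer", "Hei"))
```

Escaping user input before it is embedded in a fielded clause keeps a title such as "Al-Ahram" from being parsed as a negation. Putting the date range in `fq` rather than `q` lets Solr cache the filter separately from the scored query.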
Ø Article viewer is used for previewing articles 
— It is one of multiple viewers developed at BA 
Ø Architecture 
— Server Side: RESTful services 
— Client Side: JavaScript using JSONP 
Ø Features 
— Image preview 
— Metadata preview 
— Text selection 
— Searching/highlighting 
— Zooming options: fit width/height 
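With JSONP, the client page adds a `<script>` tag whose URL carries a callback name. The server then replies with JavaScript that calls that function with the JSON data, which lets a viewer embedded on another site read the RESTful services without cross-origin requests. The source does not say what language the BA services are written in. The Python below is only a minimal server-side sketch, and the `callback` parameter and payload fields are hypothetical.

```python
import json
import re
from typing import Optional, Tuple

# A callback must be a plain (optionally dotted) JavaScript identifier, so the
# `callback` query parameter cannot be abused to inject arbitrary script.
CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_response(payload: dict, callback: Optional[str]) -> Tuple[str, str]:
    """Return (content_type, body): JSONP when a valid callback is given, else JSON."""
    body = json.dumps(payload, ensure_ascii=False)
    if callback is None:
        return "application/json", body
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"invalid JSONP callback name: {callback!r}")
    return "application/javascript", f"{callback}({body});"


# e.g. a hypothetical metadata request .../metadata?id=123&callback=showMeta
print(jsonp_response({"title": "Suez", "pageCount": 2}, "showMeta")[1])
# showMeta({"title": "Suez", "pageCount": 2});
```

`ensure_ascii=False` keeps Arabic and French metadata readable in the response body. The strict callback check matters because the callback name is echoed verbatim into executable JavaScript.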
Ø Viewer Web Services 
— Metadata Web Service: 
§ Retrieve article catalogue metadata 
§ Return technical information (width, height, page count, etc.)
— Content Web Service: 
§ Retrieve the image of each page in the article, scaled responsively to a requested width and height
§ Return the selected text based on the area highlighted by the user
— Search Web Service: 
§ Search the content of the articles using the Solr engine APIs
§ Highlight the matching phrases in the article image 
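Text selection and in-image highlighting both require the text to be tied to positions on the page image. The sketch below assumes, hypothetically, that each page comes with word-level bounding boxes in original-image pixels, as an OCR step would typically produce. It scales the page to fit a requested size, returns the words under a user's selection rectangle, and maps search hits to display coordinates. For simplicity it matches single terms; highlighting whole phrases would also need a check that matching words are consecutive. The "fit width" and "fit height" zoom options correspond to using only `max_w / page_w` or only `max_h / page_h` as the factor.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; (x, y) is the top-left corner, in pixels."""
    x: float
    y: float
    w: float
    h: float

    def scaled(self, factor: float) -> "Rect":
        return Rect(self.x * factor, self.y * factor, self.w * factor, self.h * factor)

    def intersects(self, other: "Rect") -> bool:
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


# Hypothetical input: each word of a page with its box in original-image pixels.
Word = Tuple[str, Rect]


def fit_scale(page_w: int, page_h: int, max_w: int, max_h: int) -> float:
    """Largest factor that fits the page in max_w x max_h without distortion."""
    return min(max_w / page_w, max_h / page_h)


def selected_text(words: Iterable[Word], selection: Rect, scale: float) -> str:
    """Words whose scaled boxes overlap the user's selection (display pixels)."""
    return " ".join(text for text, box in words if box.scaled(scale).intersects(selection))


def highlight_boxes(words: Iterable[Word], terms: Iterable[str], scale: float) -> List[Rect]:
    """Display-space boxes of words equal (case-insensitively) to a search term."""
    wanted = {t.lower() for t in terms}
    return [box.scaled(scale) for text, box in words if text.lower() in wanted]


words = [("Suez", Rect(100, 50, 80, 30)), ("Canal", Rect(190, 50, 90, 30)),
         ("1967", Rect(100, 400, 60, 30))]
scale = fit_scale(2000, 2000, 1000, 1200)                # 0.5
print(selected_text(words, Rect(0, 0, 200, 60), scale))  # Suez Canal
print(highlight_boxes(words, ["canal"], scale))          # [Rect(x=95.0, y=25.0, w=45.0, h=15.0)]
```

Keeping word boxes in original-image coordinates and scaling them on request means the same stored data serves every zoom level the viewer asks for.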

Managing the Digitization of Large Press Archives
