Medici 2: A Scalable CMS for Cultural Heritage
Datasets
Motivation, Capabilities, Future Directions
Constantinos Sophocleous
Computation-based Science and
Technology Research Center
The Cyprus Institute
Nicosia, Cyprus
Luigi Marini
National Center for Supercomputing
Applications
University of Illinois at Urbana-
Champaign
Urbana, Illinois, USA
Ropertos Georgiou
Science and Technology in
Archaeology Research Center
The Cyprus Institute
Nicosia, Cyprus
Mohammed Elfarargy
International School of Information Science
Bibliotheca Alexandrina
Alexandria, Egypt
Abstract— Digitizing large collections of Cultural Heritage
(CH) resources and providing tools for their management,
analysis and visualization is critical to CH research. A key
element in achieving the above goal is to provide user-friendly
software offering an abstract interface for interaction with a
variety of digital content types. To address these needs, the
Medici content management system is being developed in a
collaborative effort between the National Center for
Supercomputing Applications (NCSA) at the University of
Illinois at Urbana-Champaign, Bibliotheca Alexandrina (BA) in
Egypt, and the Cyprus Institute (CyI). The project is pursued in
the framework of European Project LinkSCEEM2 and
supported by work funded through the U.S. National Science
Foundation (NSF), the U.S. National Archives and Records
Administration (NARA), the U.S. National Institutes of Health
(NIH), the U.S. National Endowment for the Humanities (NEH),
the U.S. Office of Naval Research (ONR), the U.S. Environmental
Protection Agency (EPA) as well as other private sector efforts.
Medici is a Web 2.0 environment integrating analysis tools for
the auto-curation of un-curated digital data, allowing automatic
processing of input (CH) datasets, and visualization of both data
and collections. It offers a simple user interface for dataset
preprocessing, previewing, automatic metadata extraction, user
input of metadata and provenance support, storage, archiving
and management, representation and reproduction.
Building on previous experience (Medici 1), NCSA and CyI
are working towards the improvement of the technical,
performance and functionality aspects of the system. The current
version of Medici (Medici 2) is the result of these efforts. It is a
scalable, flexible, robust distributed framework with wide data
format support (including 3D models and Reflectance
Transformation Imaging-RTI) and metadata functionality.
We provide an overview of Medici 2’s current features
supported by representative use cases as well as a discussion of
future development directions.
Kenton McHenry
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, Illinois, USA
Index Terms—Medici, Archaeology, Cultural Heritage,
Content Based Analysis, Un-curated Data, Auto-curation, Web
2D, Web 3D, Reflectance Transformation Imaging (RTI), RTI-
to-3D.
I. MOTIVATION – INTRODUCTION
CH research is producing huge amounts of data at an ever-
increasing rate and size due to the continuous development of
more advanced and detailed data extraction technologies and
methods. Such research efforts include the pilot projects
undertaken by the Cyprus Institute during the establishment of
an advanced imaging center for CH and archaeology as part of
the LinkSCEEM-2 Cultural Heritage Work Package. The three
pilot projects used cutting-edge visualization technologies such
as RTI. They generated scientific digital data of archaeological
objects and artifacts, such as cylinder seals, ceramics, jewelry
and other museum collections, as well as wall paintings and
icons from the world-renowned medieval churches of Cyprus.
In these contexts, the digital documentation, storage and
management of the aforementioned data as well as providing
easy access to researchers is becoming ever-more critical.
Flexible database storage, processing, metadata extraction and
presentation frameworks are needed in order to satisfy these
requirements. The development of such database/data
repository systems establishes a long-term solution for
accessing and storing various kinds of CH data and metadata.
These systems must be flexible enough to incorporate support
for additional data formats over time, so that they keep pace
with the latest technological developments and maintain the
interest of the CH scientific communities.
The developed framework will be complementary to
repositories such as the Inscriptifact database and online library
[1]. Having a specialized resource repository for the needs of
specific datasets (such as Inscriptifact) must be coupled with a
general digital storage and management framework also
capable of handling types of data that are not the focus of
Inscriptifact. This collaboration will immediately allow the
scientific exploration of newly generated CH datasets by an
already established user community.
Medici has been, and continues to be, developed to address the above. It
is a Web 2.0-based general multimedia content management
system capable of semantic content management and
service/cloud-based workflows. It supports a broad range of
throughput-intensive research techniques and community data
management. Medici provides scalable storage and media
processing, straightforward user interfaces, social annotation
capabilities, preprocessing and previewing, metadata extension
and manipulation, provenance support and citable persistent
data references [2].
Through Medici, users can upload datasets in a variety of
formats, including 3D and RTI formats widely used in cultural
heritage research and collections [3][4]. Dataset metadata is
obtained by extraction services run as data preprocessors and
presented to the user along with metadata input by the
community for each dataset. Utilizing these extractors Medici
is capable of deriving semantically meaningful features for
content-based comparison between datasets (e.g. textual data
from handwritten text). Previews of large datasets in a variety
of formats are also extracted and displayed, so that users need
not download the whole content to their systems or find the
software needed to examine the contents of a file.
Medici 2.0’s scalability/parallelization, flexibility and
robustness as well as its overall performance are improved by
decoupling the preprocessor software from the main server
(with extractors being allowed to run on different machines in a
distributed architecture). A powerful and flexible RDF-enabled
database management system is used for storing datasets.
Dataset files can be uploaded using a variety of interfaces. Use
of the RDF standard makes the datasets’ metadata open and
portable. Medici uses the latest Web technologies to display
large images smoothly (Seadragon, zoom.it [5]), 3D graphics
(X3DOM [6]), video (FFMPEG [7]) and RTI (SpiderGL [8]).
In addition, a feature developed by Bibliotheca Alexandrina
was added to the RTI web visualization to extract 3D models
from RTI files (RTI-to-3D), extending the possibilities of the
technology.
Different deployments of Medici can be hosted by different
institutions and parties, thus allowing the hosting party better
control of access to resources.
II. FUNCTIONS AND ARCHITECTURE
A diagram describing the dataset upload, preprocessing and
previewing process is displayed in Fig. 1.
The system is composed of the following parts:
A. Web server
a. Functions and Architecture
Datasets and files are uploaded to the web server using one
of an array of methods. One of them is regular HTML forms
(both when creating a new dataset to upload its first file and
when viewing a dataset to add files to it). However,
files can also be uploaded in other ways, including uploading
individual files that do not belong to a dataset.
The web server is responsible for sending a request to the
database to store the file or dataset and send a message to the
RabbitMQ message broker [9] defined in the server’s
configuration. This message will contain the necessary
information pertaining to the file or dataset that are needed by
the RabbitMQ broker to distribute the file or dataset’s
preprocessing jobs to the available extractors/preprocessors as
well as by the extractors/preprocessors themselves to decide
how to process the file or dataset. Datasets contained as files in
a zipped archive are inspected by the web server. These files
are unzipped to identify the type of dataset whose
preprocessing will be handled by extractors on the RabbitMQ
bus.
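As a rough illustration of the job message described above, the following sketch builds a routing key and a JSON message body for an uploaded file. The field names (`id`, `kind`, `host`) and the function `build_job` are hypothetical and chosen only for this example; the actual message layout used by Medici is not specified here. The sketch also assumes a host name without dots, since dots separate the words of a RabbitMQ topic key.

```python
import json


def build_job(host, target_kind, mime_type, item_id):
    """Return (routing_key, body) for a preprocessing job.

    All field names are illustrative assumptions, not Medici's real schema.
    """
    major, _, minor = mime_type.partition("/")
    # e.g. localhost + file + model + obj -> "localhost.file.model.obj"
    routing_key = ".".join([host, target_kind, major, minor])
    body = json.dumps({"id": item_id, "kind": target_kind, "host": host})
    return routing_key, body


key, body = build_job("localhost", "file", "model/obj", "abc123")
print(key)
print(body)
```

The extractor that receives such a message only needs the identifier to download the file through the REST API, which keeps the message itself small.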
The web server can then passively await any auto-extracted
file or dataset metadata or previews (generated by the
extractors-preprocessors that received a job for the file or
dataset) to be uploaded to the server through a REST API [10].
When that happens, the web server calls the database to save
the previews/metadata and associate them with the file or
dataset.
Extractors can be chained. This means that the resulting file
output from an extractor may be fed as input to another
extractor, thus allowing even greater flexibility regarding
execution of preprocessing subtasks common to the
preprocessing of different file or dataset types. For this to
happen, an extractor that outputs an intermediate result can
upload it as an intermediate result for the file or dataset using
the server’s REST API. The server then calls the database to
save the intermediate result and a new job description is sent to
RabbitMQ, now containing the identifiers of both the original
file or dataset (with which the final result will be associated)
and the intermediate file (which the extractor(s) working on the
new job need to download and process).
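The chaining step above can be sketched as deriving a new job description from the original one. The dictionary keys (`original_id`, `intermediate_id`, `type`) and the function `chain_job` are assumptions made for illustration only.

```python
def chain_job(original_job, intermediate_id, next_type):
    """Derive a follow-up job for the next extractor in a chain.

    The new job keeps the original identifier (to which the final result
    will be attached) and adds the intermediate file to be downloaded.
    """
    new_job = dict(original_job)  # copy, so the original job is untouched
    new_job["intermediate_id"] = intermediate_id
    new_job["type"] = next_type
    return new_job


first_job = {"original_id": "file-42", "type": "model.obj"}
second_job = chain_job(first_job, "tmp-7", "model.x3d")
print(second_job)
```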
The web server is also responsible for selecting and setting
up the previewers that will display the contents of a file or
dataset based on the type (i.e. file format) of a file or dataset.
Other functions include working with custom metadata and
searches: an interface for users to add metadata to a dataset,
based on a mapping dependent on the institution managing the
Medici implementation; searching for datasets having metadata
satisfying a query; and text-based search. The server also allows
social annotation of datasets (i.e. tagging, notes, comments,
custom metadata).
b. Technologies used
The web server is built on the Play web application
framework [11][12], written for allowing coding in Scala and
Java. As supported by the framework, the server uses the
model-view-controller (MVC) architectural pattern [13]. It
relies on a number of plugins for communicating with the
RabbitMQ broker, communicating with the database and
authenticating users. It uses dynamic HTML (HTML version
5) webpages generation (views) according to the results of the
processing of the input (Scala-based controllers). The models
(defined in Scala) are closely associated with collections in the
database.
Preprocessors and scripts (i.e. previewers) running on
users’ browsers communicate with the server using a REST
API.
B. Preprocessors / Extractors
Preprocessors are used for both extracting metadata from
datasets and files as well as generating previews for them.
Each extractor listens to a RabbitMQ broker via its own
named job queue, which is bound to particular routing keys. It
is by these keys that RabbitMQ forwards the extraction job
messages to the extractors’ queues. Each job message is
handled separately. Whenever a job is received, the extractor
first downloads the dataset or file having the identifier stated in
the job description from the web server via the server’s REST
API.
What follows depends on the task each extractor is
responsible for. Extractors in general use integrated libraries or
external system calls to third-party software to process dataset
files to generate the previews or extract the metadata. The
third-party software include for example 3D model and video
processing programs that are installed to the extractor’s
environment and are called through Java command-line calls.
Finally, the extractor returns the result back to the web
server:
• If the result is an intermediate file for use in extractor
chaining, the file is uploaded to the server as an
intermediate result to be forwarded to other extractors
through RabbitMQ to continue processing.
• If the result is a final preview, it is returned to the
server as a preview and then a command is sent to the
server to associate the preview with the original dataset
or file.
• If the result is a set of dataset or file metadata, a
command is sent to the server to associate the
metadata with the original dataset or file.
Many extractors of the same type can be run
simultaneously on the same or different machines, with the
jobs sent by a Medici web server distributed among them by
RabbitMQ, as long as they are registered with the same
RabbitMQ routing exchange. Also, an extractor may receive
messages from more than one Medici web server instance as
long as the web servers are registered with the same RabbitMQ
routing exchange. This allows for great flexibility, scalability
and robustness through a distributed architecture.
Fig. 1. Dataset upload, preprocessing and previewing process.
Extractors can be implemented using Java or Python.
C. Previewers
Previewers are used for viewing the contents of a dataset or
a file from within the user’s browser.
On the server side, the server selects which previewer to
launch for each preview of each file in the dataset when a
dataset page is requested from the web server. The selection is
done by comparing each preview’s file type with the file types
accepted by each previewer. The latter information is held as
JavaScript Object Notation (JSON) data in a public file for
each previewer.
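The selection step above can be sketched as a lookup over each previewer's JSON description. The key `contentTypes`, the previewer names and the MIME strings below are hypothetical; only the idea of matching a preview's file type against a per-previewer JSON list comes from the text.

```python
import json

# Hypothetical per-previewer JSON files, shown inline as strings.
PREVIEWER_CONFIGS = {
    "zoomit": '{"contentTypes": ["application/xml"]}',
    "x3dom": '{"contentTypes": ["model/x3d+html"]}',
    "image": '{"contentTypes": ["image/png", "image/jpeg"]}',
}


def select_previewer(preview_type, configs):
    """Return the first previewer whose JSON lists the preview's type."""
    for name, raw in configs.items():
        if preview_type in json.loads(raw)["contentTypes"]:
            return name
    return None  # no previewer accepts this type


print(select_previewer("image/png", PREVIEWER_CONFIGS))
```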
Before the previewer scripts are sent to the browser along
with the rest of the dataset or file page to be executed, certain
variables are set dynamically by the server, mostly for the
user’s browser to be able to find the files needed by the
previewer on the server (e.g. the dataset files’ URLs).
The previewers use dynamically-added JavaScript scripts
downloaded from the web server at runtime and added to the
webpage’s DOM (Document Object Model) using jQuery [14].
The previews to be displayed, as well as any other needed files,
are also downloaded at runtime using AJAX calls. After the
preview is ready to be displayed, the preview HTML is
appended to the webpage’s HTML.
The previewers use browser-side JavaScript and jQuery.
D. Database
The NoSQL MongoDB is used as the system’s database
management system (DBMS). Selecting a NoSQL DBMS
offers advantages important for Medici. Flexibility is a key
attribute for Medici, as CH community requirements change
constantly as described in the Introduction. This gives value to
the horizontal scalability of NoSQL. Simplicity of design of
NoSQL databases allows the addition of complex features to
Medici (such as community-generated dataset metadata)
without too much difficulty.
E. Job broker
If the web server is the heart of Medici 2, the job broker is
its veins. The role of the RabbitMQ message broker is to take
preprocessing messages from the web server that are sent once
a dataset or file is uploaded and distribute them to the
extractors that can handle the jobs.
Each extractor, once activated, registers one or more
delivery queues on RabbitMQ, on which it listens. Each queue
is associated with a particular routing key set, which defines
which routing keys a job can have in order for it to be routed to
that queue (and thus to the extractor listening to the queue).
Each routing key set is defined by a five-field term like the
following:
localhost.file.model.obj.#
In the above, the first field defines the URL or IP of the
machine hosting the Medici web server that sends job messages
to the extractors. The second field defines whether the
extractor takes jobs pertaining to preprocessing a file or a
whole dataset. The third and fourth fields are for the MIME
types of the files (or the types of datasets) that the extractor
accepts, and the fifth is for the possible subtype of the datasets
or files. The syntax of the set definitions uses the conventions
specified in the RabbitMQ tutorials [9].
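To make the routing convention concrete, the sketch below re-implements the topic matching rules described in the RabbitMQ tutorials: words are separated by dots, `*` stands for exactly one word and `#` for zero or more words. This is an illustrative stand-alone function (the name `topic_matches` is ours), not the broker's own code, and the example keys other than the one shown above are hypothetical.

```python
def topic_matches(pattern, key):
    """Check whether a routing key matches a topic binding pattern."""
    p = pattern.split(".")
    k = key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)  # pattern used up: key must be used up too
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):
            return False  # pattern expects a word but the key has ended
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


binding = "localhost.file.model.obj.#"
print(topic_matches(binding, "localhost.file.model.obj"))
print(topic_matches(binding, "localhost.file.image.png"))
```

With this binding, an extractor receives every obj file job from the local server regardless of subtype, while jobs for other MIME types are routed elsewhere.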
III. METADATA
Medici can accept, persist and process two kinds of
metadata for each dataset or file:
• Automatically-generated. Generated by metadata
extractors and associated with the dataset or file
through a REST API call to the web server by the
extractors. They usually pertain to the more technical
aspects of datasets and files. Their subjects, predicates
and objects can vary from dataset or file type to dataset
or file type and from Medici implementation to Medici
implementation depending on what metadata are
available for each dataset type and dataset, as well as
the type of the extractor which extracted the metadata.
They cannot be modified by users.
• Community-generated. Generated by the users
through completion of an HTML form for each dataset.
Their allowed subjects, predicates and objects are
defined on an institutional schema that can differ for
each Medici deployment. The schema is formulated by
each deployer or institution at its discretion; however, it
is recommended for the schema to be compatible with
the Dublin Core content description model [15] to
facilitate metadata sharing between different
repositories. The CIDOC Conceptual Reference Model
is based on Dublin Core and is specialized for CH
metadata and can also be used to facilitate metadata
sharing among the CH scientific community [16].
The input metadata, either coming from an extractor or an
HTML form, are turned to JSON structures by the extractor or
a browser-side script respectively and sent to the web server,
where they are parsed back.
Metadata is understood by the web server, and by the
browser-side JavaScript that manipulates community-generated
metadata, as a tree structure (for community-generated
metadata) or a list of tree structures (for
auto-generated metadata). This distinction applies because
different extractors may generate different sets of metadata
with no contact points between them for the same dataset or
file and so the reasonable option here is to present them as a
list. On the other hand, community-generated metadata are
generated by users by modifying the whole metadata structure
of a dataset using an HTML form at the same time, meaning
that they can be merged.
Metadata are formulated as tree structures because each
predicate of a metadata subject is either a property of the file or
dataset or a subproperty of another property, all describing the
same file or dataset. An example of a simple metadata tree
(here for community-generated metadata) is presented in Fig.
2. The tree structures (one for each dataset) are stored on
the web server’s application layer as Scala maps.
A. Automatically-generated metadata
There are currently metadata extractors for images and
RTIs, with more to be added in the future. For a discussion of
the metadata extractors for each file type, see the sections
discussing Medici’s features for their respective types.
B. Community-generated metadata
The schema for the community-generated metadata is
defined in two comma-separated value files.
• The first file stores the name of each metadata node
along with whether the node is a leaf in the metadata
tree or a higher-level node. Leaves consist of string
data which are input by users and serve as the objects
of the description of the dataset, answering the
questions generated by the tree paths leading to them.
Branches are subproperties of higher-level properties
or top-level properties. In the example of Fig. 2, the
leaf node “Title: Ancient Mari” is a leaf answering the
question “what is a/the Name of a/the Project Location
of a/the Project associated with this dataset?”.
• The second file stores the possible relationships
associating nodes with nodes or leaves, along with the
cardinalities of each relationship.
The above are automatically enforced by the system when
the user adds or modifies a dataset’s community-generated
metadata.
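A minimal sketch of how such a two-file schema might be enforced is given below. The CSV column layouts, the node names and the cardinality notation (`1` for at most one, `*` for unbounded) are assumptions made for this example; the actual file formats are not given in the text.

```python
import csv
import io

# Hypothetical contents of the two schema files.
NODES_CSV = """Project,node
Project Location,node
Name,leaf
Country,leaf
"""
RELATIONS_CSV = """Project,Country,1
Project,Project Location,*
Project Location,Name,1
"""


def load_schema(nodes_csv, relations_csv):
    """Parse both files into a set of leaf names and a relation table."""
    leaves = {name for name, kind in csv.reader(io.StringIO(nodes_csv))
              if kind == "leaf"}
    allowed = {(parent, child): card
               for parent, child, card in csv.reader(io.StringIO(relations_csv))}
    return leaves, allowed


def validate(parent, children, leaves, allowed):
    """children is a list of (name, value); value is a string for a leaf
    or a nested list of children for a higher-level node."""
    counts = {}
    for name, value in children:
        if (parent, name) not in allowed:
            return False  # relationship not defined in the schema
        counts[name] = counts.get(name, 0) + 1
        if name in leaves:
            if not isinstance(value, str):
                return False
        elif not validate(name, value, leaves, allowed):
            return False
    # enforce the maximum cardinality of every relationship used
    return all(allowed[(parent, n)] == "*" or c <= int(allowed[(parent, n)])
               for n, c in counts.items())


LEAVES, ALLOWED = load_schema(NODES_CSV, RELATIONS_CSV)
project = [("Country", "Cyprus"),
           ("Project Location", [("Name", "Choirokitia")])]
print(validate("Project", project, LEAVES, ALLOWED))
```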
C. Metadata search
Users can search for datasets that contain user-entered
metadata that satisfy formulated queries.
The queries are formulated by filling out an HTML form
similar to the HTML form for adding/modifying
community-generated metadata. For each level of the metadata tree, the
user can select which properties must be satisfied by selected
nodes at that level. For example, search for datasets having a
Project having Cyprus as a Country and having a Project
Location, with the Project Location having “Choirokitia” as a
Name, and the Project also having Responsible Institution
having a Name of “Example Institution”, as in the query
shown in Fig. 3.
Users will be able to use a logical expression-like query
formulation to generate queries. For example, a user may wish
to search for datasets satisfying the query above but having
Cyprus as a Country OR having a Project Location for which
the rest apply, or having Cyprus as a Country and NOT having
any Project Locations for which the rest apply.
The JSON-formulated query is sent to the web server,
which then parses it as a Scala map, compares it to the
community-generated metadata maps of each existing dataset,
and returns the list of datasets satisfying the query. The
comparison is done using a Depth-First-Search-esque
algorithm [17] that recursively checks for all paths in the query
tree where there is a path in the dataset’s community-generated
metadata tree that satisfies the path. The server formats the
result list as HTML and returns it to the client through AJAX
to be displayed.
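The comparison can be illustrated with a small recursive, depth-first check. The sketch below is written in Python rather than Scala and represents each tree as nested dictionaries mapping a property name to a list of values (strings for leaves, dictionaries for sub-trees); this representation and the function names are assumptions, not the server's actual data model.

```python
def satisfies(metadata, query):
    """True if every path in the query tree has a matching path in metadata."""
    for name, wanted_list in query.items():
        candidates = metadata.get(name, [])
        for wanted in wanted_list:
            if not any(_match(c, wanted) for c in candidates):
                return False
    return True


def _match(candidate, wanted):
    if isinstance(wanted, dict):
        # descend one level: the candidate must itself be a sub-tree
        return isinstance(candidate, dict) and satisfies(candidate, wanted)
    return candidate == wanted  # leaf comparison


dataset = {"Project": [{
    "Country": ["Cyprus"],
    "Project Location": [{"Name": ["Choirokitia"]}],
    "Responsible Institution": [{"Name": ["Example Institution"]}],
}]}
query = {"Project": [{
    "Country": ["Cyprus"],
    "Project Location": [{"Name": ["Choirokitia"]}],
}]}
print(satisfies(dataset, query))
```

A dataset may carry more metadata than the query asks for; only the paths present in the query are checked.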
IV. IMAGES
The support of multi-format and multi-resolution image
storage and presentation is highly beneficial to CH
communities, given the development of image acquisition
instruments that satisfy current demands of research analysis
in archaeology. Medici currently supports the JPEG/JPG, PNG,
TIFF, GIF and BMP image formats. Low-resolution images are
displayed by directly embedding them in the image previewer’s
webpage. For high-resolution images, however, a dedicated
gigaimaging view extractor is used.
A. Deep Zoom
Viewing large images efficiently and without loss of
quality is achieved by constructing an image pyramid using a
technique similar to frustum culling. Specifically, when a large
image is viewed in a limited screen space, only a part of it
can be viewed in full quality; the rest of the image, which is of
no interest to the user at that moment, is essentially occluded
and need not be transmitted. Consequently, data transmission
size is limited by tiling the image at various scales in a
preprocessing step executed server-side, so that only the tiles
needed for the currently shown scale and position are
transmitted. Deep Zoom, a technology developed by Microsoft
as part of Microsoft Silverlight [18], works exactly like that:
the user initially sees a lower-resolution view of the entire
image and can then zoom in to see portions of the image in
more detail.
Fig. 2. A simple metadata tree for community-generated metadata.
Fig. 3. Example community-generated metadata search.
B. Zoom.it
Zoomit.js¹ is a JavaScript library also used by the Zoom.it
web service [5]. It relies on the technology behind the
Seadragon gigaimaging viewer, which utilizes Deep Zoom.
This library is used by Medici 2 to view large images that have
already been split up into various levels of Deep Zoom tiles
(i.e. the image pyramid) by a dedicated image pyramid
extractor. The pyramids are loaded on the web server together
with an XML file (DZI) containing basic image metadata
needed by zoom.it.
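To give a sense of the size of such a pyramid, the sketch below computes the dimensions and tile grid of every level under the usual Deep Zoom convention that each level halves the resolution of the one above it, down to a 1x1 image. The tile size of 254 pixels is an assumed default for illustration; the value used by Medici's extractor is not stated here.

```python
def pyramid_levels(width, height, tile_size=254):
    """Return (level, width, height, tile_columns, tile_rows) per level."""
    # Highest level index: smallest n with 2**n >= max(width, height).
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = -(-width // scale)   # ceiling division
        h = -(-height // scale)
        cols = -(-w // tile_size)
        rows = -(-h // tile_size)
        levels.append((level, w, h, cols, rows))
    return levels


for entry in pyramid_levels(4096, 2048):
    print(entry)
```

Only the last few levels contain more than one tile, which is why a viewer can show the whole image quickly and fetch full-resolution tiles on demand.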
C. Gigaimage processing
When an image is uploaded to the web server, a message is
sent to RabbitMQ for any extractors that can process images. If
there are any active gigaimaging extractors registered with
RabbitMQ, one of them will receive the job message for the
image.
The job message contains the image’s size. If the size is
below a threshold, which is passed as an input parameter by
whoever launches the extractor program, the job is rejected and
neither the XML file for zoom.it nor the image pyramid tiles
are generated. This means that when a user requests to view the
dataset containing the image, no zoom.it preview will be found
by the web server and therefore the original image will be
shown, embedded with an HTML tag.
If the size of the image exceeds the threshold, the zoom.it
XML file is first generated and uploaded to the web server
as a preview, for which it is given an identifier. That
identifier is used to associate the image tiles with the preview
and thus with the original image. Then the image tiles for all
the image zoom levels are generated, concurrently uploaded
back to the web server and associated with the preview.
Zoomit.js accesses the image’s pyramid tiles via a virtual
directory structure generated on the web server for each
preview. This structure is implemented with Scala functions
that take the parts of the virtual directory path as parameters
and return the requested tile after querying the database where
the tiles are stored. These parameters include the tile’s level
and 2D-coordinates of its position on the level.
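The virtual directory idea can be sketched as a function that parses a requested tile path and looks the tile up in a store. The path shape `level/column_row.ext` follows the usual Deep Zoom tile naming; the in-memory dictionary standing in for the database, the key layout and the function name are assumptions for illustration, and the sketch is in Python rather than the server's Scala.

```python
import re

# level / column _ row . extension, e.g. "12/3_5.jpg"
TILE_PATH = re.compile(
    r"^(?P<level>\d+)/(?P<col>\d+)_(?P<row>\d+)\.(?P<ext>jpe?g|png)$")


def lookup_tile(tiles, preview_id, path):
    """Resolve a virtual tile path to stored tile data, or None."""
    m = TILE_PATH.match(path)
    if m is None:
        return None  # not a well-formed tile path
    key = (preview_id, int(m["level"]), int(m["col"]), int(m["row"]))
    return tiles.get(key)


# Stand-in for the database collection holding the tiles.
store = {("preview-1", 12, 3, 5): b"tile-bytes"}
print(lookup_tile(store, "preview-1", "12/3_5.jpg"))
```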
An example of a zoom.it preview can be seen in Fig. 4 and
Fig. 5, where it is used for visualizing ancient cylinder seals
and Byzantine wall paintings from Cyprus.
D. Image technical metadata
There are currently two different image metadata
extractors, either of which can be used at the discretion of the
deploying institution, each extracting a different set of
metadata. The first uses the ImageMagick image processing
suite [19] and the second uses the Java-based image metadata
extractor library developed by Drew Noakes [20]. The
ImageMagick extractor extracts many more details, but it
requires ImageMagick to be installed on the machine
running it.
¹ http://zoom.it/h.js
Fig. 4. A two-dimensional giga-image of a cylinder seal dating to the 13th
century BC taken with a large format camera system and previewed in
Medici 2.0’s web viewer. Courtesy of the Bank of Cyprus Cultural
Foundation.
V. 3D MODELS
Complete digital documentation in archaeological and
Cultural Heritage research is a multidimensional process. New
opportunities and challenges for the development of web-based
Virtual Reality (VR) applications in these research fields have
been the direct consequence of advances in the field of
three-dimensional representation and Internet-related technology
[21]. High-accuracy models with multiple levels of detail
(LOD) can be obtained by using various methods for 3D model
creation such as laser scanning technology, photogrammetry
and 3D modeling (architectural reconstruction techniques) [22].
These techniques may also be fused during post-processing,
leading to the creation of highly complex models with a
diversity of information encapsulated within a single file.
Web-based applications for user access and interactive
exploration of three-dimensional models have been studied and
developed since 1995. The Web3D Consortium composes open
standards for real-time 3D data and model exchange over the
Internet. VRML became the first web 3D format [23]. The
latest successor of VRML is X3D, providing a flexible solution
for real-time 3D representation and communication for Medici.
Fig. 5. Giga-image of Byzantine wall paintings taken inside the monastery of
St. John Lambadistis in Cyprus and previewed in the web viewer. Courtesy of
the Department of Antiquities in Cyprus.
The system architecture of the 3D processing pipeline
consists of open source tools and free software, thus providing
full transparency on the adopted methodology and data
processing methods as well as a cost-effective solution for both
server and client. The main feature of this web VR previewer is
to provide the user with a new visit experience based
on a free interactive exploration interface of the object (i.e., not
constrained by any predefined pathway) and on the opportunity
to get more detailed information on specific parts of interest.
Medici currently supports 3D file formats that are widely
used in CH such as Wavefront (obj), Polygon File Format
(ply), X3D (x3d) and 3D models embedded in PDF
(3DPDF). Models in the obj or ply format are converted to x3d
and then translated to their center of rotation. “Virtual scenes”
containing multiple objects can also be previewed in 3D
fly-through and walk-through mode (the primary mode is
“examine” mode). If the model needs optimization for smooth
transmission over the internet, it is decimated to an adequate
LOD before being converted to the X3DOM format [6] (that
is, HTML5 files) and sent back to the server as a preview to be
added to the DOM of the 3D previewer and displayed upon
request by the user.
There are also previewers for 3DPDF (also used for general
PDF viewing). One of them was written by NCSA and uses
Mozilla’s PDF.js library [24], while the other is a simple
integration of Adobe Acrobat using the associated Acrobat
browser plugin.
A. X3D extraction
The X3D extractor is responsible for converting obj and ply
models to X3D, decimating them if needed through
external calls to MeshLab’s meshlabserver [25], and setting
job identity flags on each of the models uploaded by the client.
For the X3D extractor to work, MeshLab must be installed
on the extractor’s machine. The extractor accepts jobs from
RabbitMQ for obj, ply and X3D. If the 3D model is not
already in X3D form, MeshLab’s meshlabserver command-line
interface is called by the extractor to generate the equivalent
X3D model.
In addition, if the model’s file size is above a bound that the
client passes as an input parameter when launching the
extractor program, a dynamically-generated meshlabserver
re-meshing/simplification script is used. A new call is then
generated to decimate the model to an adequate LOD, according
to the client’s preset boundaries, in order to reduce its
complexity and thus its size.
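The decision logic above can be sketched as a function that assembles the meshlabserver command line without running it. The `-i`, `-o` and `-s` options are meshlabserver's input, output and filter-script flags; the script file name, the size bound and the function name are hypothetical, and the actual extractor issues its calls from Java.

```python
def build_meshlab_command(src, dst, size_bytes, max_bytes,
                          script="simplify.mlx"):
    """Return the meshlabserver argument list, or None if no call is needed."""
    needs_conversion = not src.lower().endswith(".x3d")
    needs_decimation = size_bytes > max_bytes
    if not (needs_conversion or needs_decimation):
        return None  # already X3D and below the size bound
    command = ["meshlabserver", "-i", src, "-o", dst]
    if needs_decimation:
        # apply the (dynamically generated) simplification script
        command += ["-s", script]
    return command


print(build_meshlab_command("seal.obj", "seal.x3d", 80_000_000, 50_000_000))
```

The returned list could be handed to a process launcher such as `subprocess.run` on a machine where MeshLab is installed.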
Finally, a flag is set on the job received by the extractor
indicating that the first phase of the processing of the model
was completed and the post-processed model is uploaded back
to the web server as an intermediate result together with the
model’s new job flag. The web server then transmits the
intermediate result to the HTML5-X3DOM extractor for
preparation for front-end display.
B. HTML5-X3DOM extraction
Converting X3D models to X3DOM/HTML5 facilitates
and optimizes their integration with the X3D previewer’s
HTML DOM structure. This is done by the X3DOM extractor,
which uses aopt, a command-line tool bundled with the
InstantReality framework [26].
For the HTML5-X3DOM extractor to work, InstantReality
must be installed on the extractor’s machine.
C. Models with separate materials/texture files
In some cases, for obj and X3D files to display their
geometry with the proper color information, models have
separate material/texture files. Medici 2.0 handles this case:
such models can still be previewed correctly if they are
uploaded in a ZIP file together with their
corresponding materials/textures. The web server uses a utility
function that unzips the file and uses rules regarding the
existence of files of certain formats in the zipped dataset to
resolve the type of model contained in the ZIP file (whether it
is a zipped obj, a zipped X3D or something different). The type
discovered is sent as part of the routing key to RabbitMQ when
the file’s extraction job is sent.
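A minimal sketch of such a rule-based check is shown below. The rules (presence of an `.obj` or `.x3d` entry) and the returned labels are simplified assumptions; the actual rules used by the web server's utility function are not listed in the text.

```python
import io
import zipfile


def resolve_zip_type(zip_bytes):
    """Guess the model type of a zipped upload from its file extensions."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        extensions = {name.rsplit(".", 1)[-1].lower()
                      for name in archive.namelist() if "." in name}
    if "obj" in extensions:
        return "obj"
    if "x3d" in extensions:
        return "x3d"
    return "unknown"


# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("seal/model.obj", "v 0 0 0\n")
    archive.writestr("seal/model.mtl", "newmtl m\n")
print(resolve_zip_type(buffer.getvalue()))
```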
After the automated post-processing of the model with its
textures from the X3D extractor, the HTML5-X3DOM
extractor completes work on the X3D preview and sends back
the HTML5 X3DOM as the file’s final preview.
D. X3DOM previewer
The X3DOM previewer works by dynamically
downloading the HTML5 file in which the X3D preview of the
model is encapsulated and embedding it into an X3D HTML5
element. Afterwards, events thrown by the new elements
activate functions in the X3DOM JavaScript library,
initializing the model’s preview through an X3D scene.
The preview allows, among other things, rotating the model,
zooming, panning, displaying model statistics, and changing the
rendering geometry mode from the default view to wireframe or
vertex view, with or without texture and surface depth map
simulation.
X3dom.js² accesses any image textures via a virtual
directory structure generated on the web server for each
preview, which for files uploaded as ZIP files simulates the
original ZIP file’s internal directory structure. This structure is
implemented with Scala functions that take the parts of the
virtual directory path as parameters and return the requested
image texture after querying the database where the image
textures are stored. These parameters include the (virtual) path
to the image texture, as it was in the original ZIP file
containing the textures of the original model.
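The lookup behind the virtual directory can be sketched as below. Medici implements it with Scala functions and a database; this sketch uses Python and an in-memory dictionary as a stand-in, and the names `TextureStore` and `find_texture` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

# Stand-in for the texture database: (preview id, path inside the ZIP) -> image bytes.
TextureStore = Dict[Tuple[str, str], bytes]


def find_texture(store: TextureStore, preview_id: str, *path_parts: str) -> Optional[bytes]:
    """Join the parts of the virtual path and look the texture up for one preview."""
    virtual_path = str(PurePosixPath(*path_parts))
    return store.get((preview_id, virtual_path))
```

Because the key includes the path as it was inside the original ZIP file, relative texture references in the model keep resolving without the archive ever being unpacked to disk.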
3D model preview examples (with the depth maps and
model statistics also being displayed) are shown in Fig. 6, 7.
VI. RTI
The application of Reflectance Transformation Imaging
Technology (RTI) has offered great possibilities for research as
well as for the documentation and preservation of cultural
heritage objects and works of art. Polynomial Texture Map
(PTM) is a subset of the RTI method and was developed by
Hewlett-Packard Imaging Labs at the beginning of the past
decade [27][28] and enhanced by the West Semitic Project at
University of Southern California (USC) and Cultural Heritage
Imaging (CHI) [4]. PTM addresses the challenges in the
photography of objects’ faded, damaged or badly preserved
surfaces, especially when they feature inscriptions, decorative
patterns and designs. The surfaces of an array of
archaeological objects and works of art such as stone or clay,
marble, plaques, coins, paintings, mosaics, sculpture, jewellery
and other small objects (see Fig. 8), present optimal study cases
for the utilization of RTI technology.
² http://www.x3dom.org/download/dev/x3dom.js
Fig. 6. Hellenistic-Roman theater, 300 BC-365 AD, previewed in the web 3D
viewer. Courtesy Cyprus Dept. of Antiquities-University of Sydney.
Fig. 7. Ancient vessel from the Pyrgos area of Cyprus, previewed in the web
viewer. Courtesy of the 3D-coform EU project.
Fig. 8. An ancient replica coin that shows the capabilities of RTI photography.
Self-shadows and interreflections are such derivatives presented on the upper-
left corner in comparison with the default mode on the lower-left side that has
no such properties.
Fig. 9. This snapshot is taken from the Inscriptifact desktop RTI viewer
developed by the West Semitic Project at USC, where the user is able to interact
with the artifact using a “virtual torch” for dynamic illumination and surface
enhancement.
A. RTI/PTM viewer
The PTM algorithm synthesizes the data from images taken
under varying lighting directions to create a single image that
can be examined on an RTI/PTM viewer with a “virtual torch”.
The viewer allows the user to move the light angle intuitively
in real time, so that the combination of light and shadow
representing the relief features of the object’s surface can be
freely altered. RTI also permits the enhancement of the
subject’s surface shape, color and luminance attributes, which
allows one to extract detail out of the surface that cannot be
otherwise derived (see Fig. 9) [1].
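The relighting performed by a PTM viewer can be illustrated with the biquadratic model of [27], in which each pixel stores six coefficients and the luminance for a projected light direction (lu, lv) is a0·lu² + a1·lv² + a2·lu·lv + a3·lu + a4·lv + a5. The sketch below is only an illustration of that formula, not the code of any of the viewers mentioned; the function names are hypothetical.

```python
from typing import List, Sequence


def ptm_luminance(coeffs: Sequence[float], lu: float, lv: float) -> float:
    """Evaluate the biquadratic PTM polynomial for one pixel and one light direction."""
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 * lu * lu + a1 * lv * lv + a2 * lu * lv + a3 * lu + a4 * lv + a5


def relight(pixels: List[Sequence[float]], lu: float, lv: float) -> List[float]:
    """Relight a flat list of per-pixel coefficient sets for a single light direction."""
    return [ptm_luminance(coeffs, lu, lv) for coeffs in pixels]
```

Moving the “virtual torch” then amounts to calling `relight` again with a new (lu, lv) pair, which is cheap enough to do interactively.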
Medici currently supports real-time web viewing of RTI files
through a Java applet developed by Clifford Lyon [29] (see
Fig. 10). Three stand-alone desktop viewers can also interpret
RTI/PTM files [30][31][32], but for the web only the applet
mentioned above is used. The current online RTI viewer has
shortcomings regarding compatibility with current W3C
standards.
In the future, the current viewer will be replaced by
one based on the SpiderGL library [8], which is based on
WebGL. This will allow direct execution of the viewer by the
user’s browser without the need of using a third-party plugin,
improving cohesion between the user’s Graphics Processing
Unit, the viewer and the user’s browser, and also improving
security.
For the time being, Medici 2.0 will avoid the use of this
viewer and will instead use an innovative method for extracting
3D models from RTI files for front-end previewing. Based on
this preview, the user can recognize the object and then
download the original high-resolution RTI file to his/her
desktop for actual interaction.
B. RTI metadata
In the same manner as traditional web images, RTI
technical metadata is extracted by a standalone preprocessor.
The preprocessor downloads the RTI file from the web server
and then reads its topmost lines for the types of metadata
included in the most common RTI file formats as defined in the
PTM file format specification by Hewlett-Packard [28]. The
metadata is parsed to JSON and returned to the web server to
be associated with the RTI graphic.
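A minimal sketch of this header parsing is given below. It assumes the uncompressed PTM header layout (version string, format string, width, height, six scale values, six bias values, separated by whitespace) and ignores the extra fields of the JPEG-compressed formats; the function names and JSON keys are hypothetical.

```python
import json
from typing import List

WHITESPACE = b" \t\r\n"


def read_header_tokens(data: bytes, count: int) -> List[str]:
    """Return up to `count` leading whitespace-separated ASCII tokens of a byte string."""
    tokens: List[str] = []
    current = bytearray()
    for byte in data:
        if byte in WHITESPACE:
            if current:
                tokens.append(current.decode("ascii"))
                current = bytearray()
                if len(tokens) == count:
                    break  # stop before the binary coefficient data
        else:
            current.append(byte)
    return tokens


def ptm_header_to_json(data: bytes) -> str:
    """Parse the topmost lines of an uncompressed PTM file into a JSON string."""
    tokens = read_header_tokens(data, 16)
    if len(tokens) < 16 or not tokens[0].startswith("PTM_"):
        raise ValueError("not an uncompressed PTM header")
    metadata = {
        "version": tokens[0],
        "format": tokens[1],
        "width": int(tokens[2]),
        "height": int(tokens[3]),
        "scale": [float(t) for t in tokens[4:10]],
        "bias": [int(t) for t in tokens[10:16]],
    }
    return json.dumps(metadata)
```

Only the first few hundred bytes of the file are needed for this, so a preprocessor does not have to load the full coefficient data to extract the technical metadata.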
Fig. 10. Replica of an ancient coin from Petra, Jordan, viewed using the
current PTM viewer.
VII. EXTRACTING 3D MODELS FROM RTI
The ability to interactively change light direction and apply
various filters makes RTI one of the best techniques for
examining archaeological artifacts. Human perception of
highlights and shadows in a 2D image of a surface helps the
viewer interpret the surface topology. However, this ability to
perceive surface topology varies from person to person. Also,
some surfaces with varying contrast and surface characteristics
might be difficult to perceive even with dynamic lighting.
Further studies on that issue can be found in [33].
RTIs capture both color and geometric properties of the
object, which makes it possible to use them to reconstruct 3D
approximations of the original surfaces. Doing this greatly
improves the perception of the artifacts’ surface details. The
approach used is similar to photometric stereo [34]: three 2D
texture maps are extracted from the RTI and used to reconstruct
the final 3D surface given known lighting directions. Luminance
information is used to estimate a normal map, which is
integrated to obtain a height map. The height map is used to
build a 3D surface by shifting the vertices of a Delaunay mesh.
Color information is used to extract a uniformly lit diffuse
map that is used for surface texture mapping.
A. Maps used
a. Diffuse Map
Diffuse maps (a.k.a. albedo maps) define the main color of
the surface. A good diffuse map should contain only color
information without any directional light effects, inter-
reflections, specular highlights or self-shadowing. Having any
of the above in a diffuse map means that the object will
respond incorrectly to virtual incident light, for example by
casting shadows in the wrong directions or showing highlights
where no direct incident light hits the object.
PTM allows the extraction of accurate diffuse maps
because the luminance and chromaticity information are stored
separately. In LRGB PTM format the color information exists
out of the box. For RGB PTM format, similar results can be
obtained by casting light perpendicular to surface per texel to
make sure all texels get the same amount of light. This results
in a uniformly lit texture that is ideal for use as a diffuse map.
b. Normal Map
A Normal Map is a texture map containing surface normals
at each texel stored as RGB. Directional Lighting information
for each pixel is already stored in a PTM, which makes it
possible to get a very good estimate of the surface normal at
that pixel. Since Luminance is maximum when incident light is
perpendicular to surface (i.e. incident light vector = surface
normal), the surface normal at each texel can be estimated as
the orientation maximizing light response.
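For the biquadratic PTM model of [27], the light direction that maximizes luminance has a closed form, obtained by setting both partial derivatives of the polynomial to zero. The sketch below illustrates that estimate for a single texel; the function name is hypothetical, and the fallback for degenerate coefficients and the clamping to the unit disc are assumptions of this example.

```python
import math
from typing import Sequence, Tuple


def ptm_normal(coeffs: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate a surface normal as the light direction maximizing PTM luminance."""
    a0, a1, a2, a3, a4, _a5 = coeffs
    det = 4.0 * a0 * a1 - a2 * a2
    if abs(det) < 1e-12:
        # Degenerate polynomial: fall back to a normal facing the camera (assumption).
        return (0.0, 0.0, 1.0)
    # Stationary point of a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5.
    lu0 = (a2 * a4 - 2.0 * a1 * a3) / det
    lv0 = (a2 * a3 - 2.0 * a0 * a4) / det
    # Keep the projected direction inside the unit disc so that nz stays real.
    r2 = lu0 * lu0 + lv0 * lv0
    if r2 > 1.0:
        scale = 1.0 / math.sqrt(r2)
        lu0, lv0, r2 = lu0 * scale, lv0 * scale, 1.0
    nz = math.sqrt(max(0.0, 1.0 - r2))
    return (lu0, lv0, nz)
```

Storing the three components of each result as RGB (after remapping from [-1, 1] to the channel range) yields the normal map described above.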
c. Height Map
Height maps are grayscale texture maps in which each
pixel’s white level corresponds to the height value of a vertex
of a 3D grid mesh (usually in the Z direction). Height maps
used here are generated by integrating normal maps obtained as
described in the previous section. The normal at every surface
point is perpendicular to the surface, so its X and Y
components encode the height map gradient. Integration is not
always guaranteed to yield precise results since it is based on
an estimated normal map. Also, information about surface
discontinuities is lost in normal maps. Nevertheless, the 3D
models generated are good approximations of the real surface.
Height map generation is implemented iteratively, with each
iteration improving the contrast between low and high points.
Height map pixels are initialized to zero height. On each new
iteration, each pixel’s height is slightly modified according
to the slopes of the surrounding pixels’ normals and their
heights in the current iteration. Contributions from all eight
surrounding pixels are averaged and added to the current
height. Fig. 11 summarizes the surrounding pixels’
contributions and associated signs, where Nx and Ny are the X
and Y components, respectively, of the surface normal.
-Nx +Ny   |  +Ny            |  +Nx +Ny
-Nx       |  Current Pixel  |  +Nx
-Nx -Ny   |  -Ny            |  +Nx -Ny
Fig. 11. Contributions of surrounding pixels to the current pixel height when
generating the height map from the normal map.
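One way to read the iteration and the signs of Fig. 11 is sketched below. The paper does not give the exact update rule, so this is an assumption: each neighbor predicts the current pixel's height as its own height plus its signed normal components, and the current height is moved a fraction `step` toward the average prediction. The function name and the `step` parameter are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def integrate_normals(nx: Grid, ny: Grid, iterations: int, step: float = 0.5) -> Grid:
    """Iteratively build a height map from the X and Y components of a normal map."""
    rows, cols = len(nx), len(nx[0])
    height = [[0.0] * cols for _ in range(rows)]  # all pixels start at zero height
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    for _ in range(iterations):
        updated = [row[:] for row in height]
        for r in range(rows):
            for c in range(cols):
                total, count = 0.0, 0
                for dr, dc in offsets:
                    qr, qc = r + dr, c + dc
                    if 0 <= qr < rows and 0 <= qc < cols:
                        # Signs as in Fig. 11: right +Nx, left -Nx, above +Ny, below -Ny.
                        predicted = height[qr][qc] + dc * nx[qr][qc] - dr * ny[qr][qc]
                        total += predicted - height[r][c]
                        count += 1
                if count:
                    updated[r][c] = height[r][c] + step * total / count
        height = updated
    return height
```

Under this reading, heights are recovered only up to a constant offset, and a surface with a constant slope is reproduced exactly once the iteration has converged.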
B. Testing of algorithm
In testing done by BA, a 2D Delaunay triangulation was
used to generate rectangular grids that were deformed using the
resulting height maps and texture-mapped using diffuse and
normal maps. The quality of the generated models improved with
the number of iterations used. Going beyond 100 000 iterations
had no noticeable effect.
Further testing of the algorithm took place at CyI. The
testing showed that for low and mezzo-relief RTI the algorithm
generated 3D models of good quality with no more than 10 000
iterations needed.
An example of testing the algorithm is shown in Fig. 12.
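The deformation of a rectangular grid by a height map can be sketched as follows. This is an illustration under assumptions, not the code used in the tests: the grid is evenly spaced and each cell is split along one diagonal (which is one valid Delaunay triangulation of such points), and the names `height_map_to_mesh` and `height_modifier` are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def height_map_to_mesh(height: List[List[float]],
                       height_modifier: float = 1.0) -> Tuple[List[Vertex], List[Triangle]]:
    """Build a triangulated grid whose vertices are shifted in Z by the height map."""
    rows, cols = len(height), len(height[0])
    # One vertex per pixel; Y points up, so row 0 is the top of the image.
    vertices = [
        (float(c), float(rows - 1 - r), height[r][c] * height_modifier)
        for r in range(rows)
        for c in range(cols)
    ]
    triangles: List[Triangle] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            # Two counter-clockwise triangles per grid cell (seen from +Z).
            triangles.append((top_left, bottom_left, top_right))
            triangles.append((top_right, bottom_left, bottom_right))
    return vertices, triangles
```

The per-vertex (column, row) position can also serve as the texture coordinate for the diffuse and normal maps, since the mesh has one vertex per texel.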
C. Implementation
The algorithm will be implemented as a standalone
command-line program that takes as input the desired number of
height map generation iterations, the desired height modifier
for the generated 3D model and the RTI file, and outputs the
generated 3D model. This program will be called by a standard
extractor that takes the above parameters as its input. The
result of the extraction will be sent to the 3D extractors for
further processing, as is done for all 3D models.
Fig. 12. Example of a PTM file and its 3D model generated by the RTI-to-3D
algorithm.
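The interface of such a command-line program might look like the sketch below. Since the program is described as future work, everything here is hypothetical: the program name `rti2model`, the argument names and the default values (the default of 10 000 iterations is taken from the CyI testing reported above).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the three inputs named in the text plus an output path (all hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="rti2model",
        description="Generate a 3D model from an RTI file (interface sketch).",
    )
    parser.add_argument("rti_file", help="path to the input RTI/PTM file")
    parser.add_argument("output_model", help="path of the 3D model to write")
    parser.add_argument("--iterations", type=int, default=10000,
                        help="number of height map generation iterations")
    parser.add_argument("--height-modifier", type=float, default=1.0,
                        help="scale factor applied to the generated heights")
    return parser
```

An extractor could then invoke the program with, for example, the arguments `coin.ptm coin.obj --iterations 5000`, passing along the parameters it received with the job.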
VIII. FUTURE DIRECTIONS
A. Immediate plans
The community-generated metadata search will be
enhanced. The user will be able to use logical expression-like
query formulation to generate queries with AND, OR, NOT
operators. Technical metadata will also be searchable.
Integration of more MeshLab features in Medici will be
attempted (if supported by X3DOM). These may include
shading modes, a measuring caliper, community-generated
annotations on the model display and the ability to change the
view’s light source.
Support for more file formats and model types will be
added, including printer-ready stereolithography (STL)
diagrams, Virtual Reality spherical panoramas and videos.
B. Further designs
It will be possible to integrate Medici metadata with
Inscriptifact [1] and other Cultural Heritage repositories.
Concerning 2D high-resolution imaging, the IIPImage
framework [35] will serve as a mediator for flexibility on the
presentation of more specialized CH imaging (e.g. able to
support multispectral imaging and real time annotation).
Regarding 3D, technical metadata extractors may be added,
as well as the capability to generate 3D models from image
datasets produced by photogrammetry, similar to the ARC 3D web
service [36]. Furthermore, the use of Medici’s interactive
system could potentially be extended to more complex virtual
exploration, such as that of a digitized archaeological site,
so that it serves not just as a previewer but as an intuitive
VR environment for the web: an interactive learning
environment, a “virtual world” [37].
A SpiderGL-based viewer will replace the current Java-based
RTI viewer once SpiderGL is developed further and made stable
by the digital CH communities.
Medici will also be able to extract semantically meaningful
features for content-based comparison (e.g. textual data from
handwritten text). This will be made possible by constructing
descriptors of each file from semantically meaningful features
derived from the file’s data [38][39][40][41]. The descriptors
generated for each file can then be compared with descriptors
of the same type generated from search query data.
IX. CONCLUSION
Though still in development, Medici 2 already supports a
broad range of throughput-intensive research techniques and
community data management. Many file and dataset formats
can be uploaded, analyzed and visualized. These include the
latest formats used by the CH scientific community, i.e. large,
detailed 3D models and RTI, as well as the ever-present large
images and audio/video. Not only can Medici 2 extract
technical metadata from files and datasets, it also allows
searching of metadata and social metadata generation
according to each implementer’s user-input metadata schema.
The above, together with the many important additions
scheduled for the future, make it clear that Medici 2 will be a
CMS more than worth considering for satisfying the CH
scientific community’s dataset management, analysis and
visualization needs.
ACKNOWLEDGMENTS
The authors thank the Cyprus Institute, the LinkSCEEM-2
project and its partners, the National Center for
Supercomputing Applications (NCSA) at the University of Illinois
and Bibliotheca Alexandrina as organizations for providing
them the means and guidelines enabling their contributions to
the development of Medici. Moreover, they thank every
member of staff in CyI, NCSA and Bibliotheca Alexandrina
who provided them with user requirements and technical
assistance.
SOURCE CODE
The current Medici source code is available from the
NCSA Medici repository
https://opensource.ncsa.illinois.edu/stash/projects/MED
REFERENCES
[1] L. Hunt, M. Lundberg, and B. Zuckerman, “InscriptiFact: A
virtual archive of ancient inscriptions from the Near East”,
International Journal on Digital Libraries, vol. 5, no. 3, pp.
153-166, May 2005.
[2] L. Marini et al., “Medici: A Scalable Multimedia Environment
for Research”, white paper, presented at Microsoft eScience
Workshop 2010, Berkeley, CA, Oct. 2010.
[3] The International Committee for Documentation of Cultural
Heritage (CIPA). http://cipa.icomos.org
[4] M. Mudge et al., “Image-Based Empirical Information
Acquisition, Scientific Reliability, and Long-Term Digital
Preservation for the Natural Sciences and Cultural Heritage”, in
Eurographics 2008, Hersonissos, Greece, 2008.
[5] Zoom.it - About. http://zoom.it/pages/about/
[6] J. Behr, P. Eschler, Y. Jung, and M. Zöllner, “X3DOM: a DOM-
based HTML5/X3D integration model”, in Proc. of the 14th Int.
Conf. on 3D Web Technology, Darmstadt, Germany, Jun. 2009,
pp. 129-135.
[7] S. Tomar, “Converting video formats with FFmpeg”, Linux J.,
vol. 2006, no. 146, p. 10, Jun. 2006.
[8] M. Di Benedetto, F. Ponchio, F. Ganovelli, and R. Scopigno,
“SpiderGL: a JavaScript 3D graphics library for next-generation
WWW”, in Proc. of the 15th Int. Conf. on Web 3D Technology,
Los Angeles, CA, Jul. 2010, pp. 165-174.
[9] A. Videla and J. J. W. Williams, “RabbitMQ in action:
distributed messaging for everyone”, Shelter Island, NY:
Manning, 2012.
[10] R. T. Fielding, “Architectural Styles and the Design of Network-
based Software Architectures”, Ph.D. dissertation, Dept.
Comput. Sci., Univ. California Irvine, Irvine, CA, 2000.
[11] P. Hilton, E. Bakker, and F. Canedo, “Play for Scala (Early
Access Edition)”, Manning, 2012.
[12] N. Leroux and S. De Kaper, “Play for Java (Early Access
Edition)”, Manning, 2012.
[13] A. Leff and J. Watson, “Web-application development using the
Model/View/Controller design pattern”, in Proc. of the
Enterprise Distributed Object Comp. Conf. 2001, Seattle, WA,
Sep. 2001, pp. 118-127.
[14] JQuery. http://jquery.com/
[15] S. Weibel, “The Dublin Core: A Simple Content Description
Model for Electronic Resources”, Bulletin of the Amer. Soc. for
Inform. Sci. and Technology, vol. 24, no. 1, pp. 9-11, Oct./Nov.
1997.
[16] M. Doerr, C. E. Ore, and S. Stead, “The CIDOC conceptual
reference model: a new standard for knowledge sharing”, in
Tutorials, posters, panels and industrial contributions at the
26th int. conf. on Conceptual modeling, Auckland, New
Zealand, Nov. 2007, pp. 51-56.
[17] R. Tarjan, “Depth-First Search and Linear Graph Algorithms”,
SIAM J. Comput., vol. 1, no. 2, pp. 146-160, Jun. 1972.
[18] Deep Zoom-Features-Microsoft Silverlight.
http://www.microsoft.com/silverlight/deep-zoom/
[19] M. Still, “The Definitive Guide to ImageMagick”, New York
City, NY: Apress, 2005.
[20] Drewnoakes.com - jpeg exif / iptc metadata extraction in java.
http://drewnoakes.com/code/exif/
[21] A. Guarnieri, F. Pirotti, and A. Vettore, “Cultural heritage
interactive 3D models on the Web: An approach using open
source and free software”, J. of Cultural Heritage, vol. 11, no. 3,
pp. 350-353, Jul.-Sept. 2010.
[22] F. Bernardini and H. Rushmeier, “The 3D Model Acquisition
Pipeline”, Comput. Graph. Forum, vol. 21, no. 2, pp. 149-172,
Jun. 2002.
[23] Web3D Consortium: Open Standards for Real-Time 3D
Communication. http://www.web3d.org/realtime-3d/
[24] pdf.js by andreasgal. http://mozilla.github.io/pdf.js/
[25] P. Cignoni et al., “MeshLab: an Open-Source Mesh Processing
Tool”, presented at Eurographics Italian Chapter Conf. 2008,
Salerno, Italy, Jul. 2008, pp. 129-136.
[26] J. Behr, U. Bockholt, and D. Fellner, “Instantreality - A
Framework for Industrial Augmented and Virtual Reality
Applications”, in Virtual Reality & Augmented Reality in
Industry. Berlin, Germany: Heidelberg, 2011, ch. 5, pp. 91-99.
[27] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial texture
maps”, in SIGGRAPH ’01 Proc. of the 28th Annu. Conf. on
Comput. Graph. and Interactive Techniques, Los Angeles, CA,
Aug. 2001, pp. 519-528.
[28] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial Texture
Map (.ptm) File Format”, Client and Media Systems Laboratory,
HP Laboratories, Palo Alto, CA, Tech. Rep. HPL-2001-104,
Apr. 2001.
[29] PTM Web Viewer by Clifford Lyon.
http://materialobjects.com/ptm/
[30] Hewlett-Packard labs PTM Desktop viewer.
http://www.hpl.hp.com/research/ptm/downloads/download.html
[31] Cultural Heritage Imaging (CHI) RTI Desktop viewer.
http://culturalheritageimaging.org/Technologies/RTI/
[32] Inscriptifact Desktop Viewer.
http://www.inscriptifact.com/instructions/
[33] J. T. Todd, “The visual perception of 3D shape”, Trends in
Cognitive Sci., vol. 8, no. 3, pp. 115-121, Mar. 2004.
[34] R. J. Woodham, “Photometric method for determining surface
orientation from multiple images”, Optical Eng., vol. 19, no.
1, pp. 139-144, Feb. 1980.
[35] IIPImage framework.
http://iipimage.sourceforge.net/
[36] ARC 3D Web service.
http://homes.esat.kuleuven.be/~visit3d/webservice/v2/index.php
[37] B. Harper, J. G. Hedberg, and R. Wright, “Who benefits from
virtuality?”, Comput. & Educ., vol. 34, no. 3/4, pp. 163-176,
Apr./May 2000.
[38] L. Marini et al., “Versus: A Framework for General Content-
Based Comparisons”, in IEEE eScience, Chicago, IL, 2012.
[39] L. Diesendruck et al., “Digitization and Search: A Non-
Traditional Use of HPC”, in IEEE eScience Workshop on
Extending High Performance Computing Beyond its Traditional
User Communities, Chicago, IL, 2012.
[40] L. Diesendruck et al., “A Framework to Access Handwritten
Information within Large Digitized Paper Collections”, in IEEE
eScience, Chicago, IL, 2012.
[41] L. Diesendruck et al., “Using Lucene to Index and Search the
Digitized 1940 US Census”, in Extreme Science and
Engineering Discovery Environment, San Diego, CA, 2013.

 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Digital library-overview
Digital library-overviewDigital library-overview
Digital library-overview
 
Dp Geosc Info Presentation Final Version 2
Dp Geosc Info Presentation Final Version 2Dp Geosc Info Presentation Final Version 2
Dp Geosc Info Presentation Final Version 2
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
Digital library
Digital libraryDigital library
Digital library
 
An Introduction to Digital Preservation
An Introduction to Digital PreservationAn Introduction to Digital Preservation
An Introduction to Digital Preservation
 

ARTICLE_MEDICI

  • 1. Medici 2: A Scalable CMS for Cultural Heritage Datasets. Motivation, Capabilities, Future Directions
Constantinos Sophocleous, Computation-based Science and Technology Research Center, The Cyprus Institute, Nicosia, Cyprus
Luigi Marini, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Ropertos Georgiou, Science and Technology in Archaeology Research Center, The Cyprus Institute, Nicosia, Cyprus
Mohammed Elfarargy, International School of Information Science, Bibliotheca Alexandrina, Alexandria, Egypt
Abstract: Digitizing large collections of Cultural Heritage (CH) resources and providing tools for their management, analysis and visualization is critical to CH research. A key element in achieving the above goal is to provide user-friendly software offering an abstract interface for interaction with a variety of digital content types. To address these needs, the Medici content management system is being developed in a collaborative effort between the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, Bibliotheca Alexandrina (BA) in Egypt, and the Cyprus Institute (CyI). The project is pursued in the framework of European Project LinkSCEEM2 and supported by work funded through the U.S. National Science Foundation (NSF), the U.S. National Archives and Records Administration (NARA), the U.S. National Institutes of Health (NIH), the U.S. National Endowment for the Humanities (NEH), the U.S. Office of Naval Research (ONR), the U.S. Environmental Protection Agency (EPA) as well as other private sector efforts. Medici is a Web 2.0 environment integrating analysis tools for the auto-curation of un-curated digital data, allowing automatic processing of input (CH) datasets, and visualization of both data and collections.
It offers a simple user interface for dataset preprocessing, previewing, automatic metadata extraction, user input of metadata and provenance support, storage, archiving and management, representation and reproduction. Building on previous experience (Medici 1), NCSA and CyI are working towards improving the technical, performance and functionality aspects of the system. The current version of Medici (Medici 2) is the result of these efforts. It is a scalable, flexible, robust distributed framework with wide data format support (including 3D models and Reflectance Transformation Imaging, RTI) and metadata functionality. We provide an overview of Medici 2’s current features supported by representative use cases as well as a discussion of future development directions.
Kenton McHenry, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Index Terms: Medici, Archaeology, Cultural Heritage, Content Based Analysis, Un-curated Data, Auto-curation, Web 2D, Web 3D, Reflectance Transformation Imaging (RTI), RTI-to-3D.
I. MOTIVATION – INTRODUCTION
CH research is producing data in ever-increasing volume and at an ever-increasing rate, owing to the continuous development of more advanced and detailed data extraction technologies and methods. Such research efforts include the pilot projects undertaken by the Cyprus Institute during the establishment of an advanced imaging center for CH and archaeology as part of the LinkSCEEM-2 Cultural Heritage Work Package. The three pilot projects used cutting-edge visualization technologies such as RTI. They generated scientific digital data of archaeological objects and artifacts, such as cylinder seals, ceramics, jewelry and other museum collections, as well as wall paintings and icons from the world-renowned medieval churches of Cyprus.
In these contexts, the digital documentation, storage and management of the aforementioned data, together with easy access for researchers, are becoming ever more critical. Flexible database storage, processing, metadata extraction and presentation frameworks are needed to satisfy these requirements. The development of such database/data repository systems establishes a long-term solution for accessing and storing various kinds of CH data and metadata. These systems must be flexible enough to have support for additional data formats incorporated over time, so that they keep pace with the latest technological developments and retain the interest of the CH scientific communities. The developed framework will be complementary to repositories such as the Inscriptifact database and online library [1]. Having a specialized resource repository for the needs of
  • 2. specific datasets (such as Inscriptifact) must be coupled with a general digital storage and management framework also capable of handling types of data that are not the focus of Inscriptifact. This collaboration will immediately allow the scientific exploration of newly generated CH datasets by an already established user community. Medici has been, and continues to be, developed to address these needs. It is a Web 2.0-based general multimedia content management system capable of semantic content management and service/cloud-based workflows. It supports a broad range of throughput-intensive research techniques and community data management. Medici provides scalable storage and media processing, straightforward user interfaces, social annotation capabilities, preprocessing and previewing, metadata extension and manipulation, provenance support and citable persistent data references [2]. Through Medici, users can upload datasets in a variety of formats, including 3D and RTI formats widely used in cultural heritage research and collections [3][4]. Dataset metadata is obtained by extraction services run as data preprocessors and presented to the user along with metadata input by the community for each dataset. Using these extractors, Medici can derive semantically meaningful features for content-based comparison between datasets (e.g. textual data from handwritten text). Previews of large datasets in a variety of formats are also extracted, so that users can view them without downloading the whole content to their own system or finding the software needed to examine the contents of a file. Medici 2.0’s scalability/parallelization, flexibility and robustness, as well as its overall performance, are improved by decoupling the preprocessor software from the main server (extractors are allowed to run on different machines in a distributed architecture). A powerful and flexible RDF-enabled database management system is used for storing datasets.
Dataset files can be uploaded using a variety of interfaces. Use of the RDF standard makes the datasets’ metadata open and portable. Medici uses the latest Web technologies to display large images smoothly (Seadragon, zoom.it [5]), 3D graphics (X3DOM [6]), video (FFMPEG [7]) and RTI (SpiderGL [8]), along with a special feature added to the RTI web visualization, developed by Bibliotheca Alexandrina, that extracts 3D models from RTI files (RTI-to-3D) in order to enhance the possibilities of the technology. Different deployments of Medici can be hosted by different institutions and parties, thus allowing the hosting party better control of access to resources.
II. FUNCTIONS AND ARCHITECTURE
A diagram describing the dataset upload, preprocessing and previewing process is displayed in Fig. 1. The system is composed of the following parts:
A. Web server
a. Functions and Architecture
Datasets and files are uploaded to the web server using one of an array of methods. One of them is regular HTML forms (both when creating a new dataset to upload its first file and when viewing a dataset to add files to it). However, files can also be uploaded in other ways, including uploading individual files that do not belong to a dataset. The web server is responsible for sending a request to the database to store the file or dataset and for sending a message to the RabbitMQ message broker [9] defined in the server’s configuration. This message contains the information about the file or dataset that the RabbitMQ broker needs to distribute the preprocessing jobs to the available extractors/preprocessors, and that the extractors/preprocessors themselves need to decide how to process the file or dataset. Datasets contained as files in a zipped archive are inspected by the web server. These files are unzipped to identify the type of dataset whose preprocessing will be handled by extractors on the RabbitMQ bus.
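The job message sent on upload can be pictured with a minimal Python sketch. The paper does not list the message fields, so the field names (`id`, `kind`, `mime`, `host`) and the helper `build_job_message` are hypothetical; only the ordering of the routing-key fields (server host, file/dataset, MIME type and subtype) follows the paper's description of the job broker.

```python
import json


def build_job_message(host, kind, mime_type, resource_id):
    """Build a (routing_key, body) pair for a preprocessing job.

    Assumption: field names and layout are illustrative only, not
    Medici's actual wire format.
    """
    if kind not in ("file", "dataset"):
        raise ValueError("kind must be 'file' or 'dataset'")
    # "model/obj" becomes the two routing-key words "model.obj".
    major, _, minor = mime_type.partition("/")
    # Note: a host containing dots (e.g. an IP address) would add extra
    # words to the key; how that is handled is not stated in the paper.
    routing_key = ".".join([host, kind, major, minor])
    body = json.dumps(
        {"id": resource_id, "kind": kind, "mime": mime_type, "host": host},
        sort_keys=True,
    )
    return routing_key, body


key, body = build_job_message("localhost", "file", "model/obj", "abc123")
print(key)   # localhost.file.model.obj
print(body)
```

Keeping the routing key derivable from the same fields as the body means an extractor can decide how to process a job from the body alone, while the broker routes on the key alone.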
The web server then waits passively for any auto-extracted file or dataset metadata or previews (generated by the extractors/preprocessors that received a job for the file or dataset) to be uploaded to the server through a REST API [10]. When that happens, the web server calls the database to save the previews/metadata and associate them with the file or dataset. Extractors can be chained. This means that the file output by one extractor may be fed as input to another extractor, which allows subtasks common to the preprocessing of different file or dataset types to be shared. For this to happen, an extractor that outputs an intermediate result uploads it as an intermediate result for the file or dataset using the server’s REST API. The server then calls the database to save the intermediate result, and a new job description is sent to RabbitMQ, now containing the identifiers of both the original file or dataset (with which the final result will be associated) and the intermediate file (which the extractor(s) working on the new job need to download and process). The web server is also responsible for selecting and setting up the previewers that will display the contents of a file or dataset based on its type (i.e. file format). Other functions include working with custom metadata and searches: an interface for users to add metadata to a dataset based on a mapping dependent on the institution managing the Medici implementation, searching for datasets whose metadata satisfy a query, and text-based search. The server also allows social annotation of datasets (i.e. tagging, notes, comments, custom metadata).
b. Technologies used
The web server is built on the Play web application framework [11][12], which supports development in Scala and Java. As supported by the framework, the server uses the model-view-controller (MVC) architectural pattern [13].
It relies on a number of plugins for communicating with the RabbitMQ broker, communicating with the database and authenticating users. It uses dynamic HTML (HTML version
  • 3. 5) webpage generation (views) according to the results of the processing of the input (Scala-based controllers). The models (defined in Scala) are closely associated with collections in the database. Preprocessors and scripts (i.e. previewers) running on users’ browsers communicate with the server using a REST API.
B. Preprocessors / Extractors
Preprocessors are used both for extracting metadata from datasets and files and for generating previews for them. Each extractor listens to a RabbitMQ broker via its own named job queue, which is bound to particular routing keys. It is by these keys that RabbitMQ forwards the extraction job messages to the extractors’ queues. Each job message is handled separately. Whenever a job is received, the extractor first downloads the dataset or file having the identifier stated in the job description from the web server via the server’s REST API. What follows depends on the task each extractor is responsible for. Extractors in general use integrated libraries or external system calls to third-party software to process dataset files in order to generate the previews or extract the metadata. The third-party software includes, for example, 3D model and video processing programs that are installed in the extractor’s environment and are invoked through Java command-line calls. Finally, the extractor returns the result to the web server:
- If the result is an intermediate file for use in extractor chaining, the file is uploaded to the server as an intermediate result to be forwarded to other extractors through RabbitMQ to continue processing.
- If the result is a final preview, it is returned to the server as a preview and then a command is sent to the server to associate the preview with the original dataset or file.
- If the result is a set of dataset or file metadata, a command is sent to the server to associate the metadata with the original dataset or file.
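The binding of extractor queues to routing keys mentioned at the start of this section can be illustrated with a small Python sketch of AMQP topic matching, where `*` stands for exactly one dot-separated word and `#` for zero or more. The function names, queue names and the second and third binding patterns are assumptions made for illustration; in a real deployment the RabbitMQ broker performs this matching itself.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP topic binding pattern matches a routing key."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(pi, ki):
        if pi == len(p_words):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return ki == len(k_words)
        word = p_words[pi]
        if word == "#":
            # '#' may absorb any number of remaining words, including none.
            return any(match(pi + 1, j) for j in range(ki, len(k_words) + 1))
        if ki == len(k_words):
            return False
        if word == "*" or word == k_words[ki]:
            return match(pi + 1, ki + 1)
        return False

    return match(0, 0)


def queues_for(routing_key, bindings):
    """List the (hypothetical) extractor queues that would receive a job."""
    return sorted(
        queue
        for queue, patterns in bindings.items()
        if any(topic_matches(p, routing_key) for p in patterns)
    )


bindings = {
    "x3d_extractor": ["localhost.file.model.obj.#", "localhost.file.model.ply.#"],
    "image_extractor": ["localhost.file.image.#"],
}
print(queues_for("localhost.file.model.ply", bindings))  # ['x3d_extractor']
```

Because several extractors of the same type can share one queue name, adding capacity only requires starting another process that consumes from that queue; the binding patterns stay unchanged.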
Many extractors of the same type can be run simultaneously on the same or different machines, and the jobs sent by a Medici web server are distributed among them by RabbitMQ as long as they are registered with the same RabbitMQ routing exchange.
Fig. 1. Dataset upload, preprocessing and previewing process.
Also, an extractor may receive messages from more than one Medici web server instance as long as the web servers are registered with the same RabbitMQ
  • 4. routing exchange. This allows for great flexibility, scalability and robustness through a distributed architecture. Extractors can be implemented using Java or Python.
C. Previewers
Previewers are used for viewing the contents of a dataset or a file from within the user’s browser. When a dataset page is requested from the web server, the server selects which previewer to launch for each preview of each file in the dataset. The selection is done by comparing each preview’s file type with the file types accepted by each previewer. The latter information is held as JavaScript Object Notation (JSON) data in a public file for each previewer. Before the previewer scripts are sent to the browser along with the rest of the dataset or file page, certain variables are set dynamically by the server, mostly so that the user’s browser can find the files needed by the previewer on the server (e.g. the dataset files’ URLs). The previewers use dynamically added JavaScript scripts downloaded from the web server at runtime and added to the webpage’s DOM (Document Object Model) using jQuery [14]. The previews to be displayed, as well as any other needed files, are also downloaded at runtime using AJAX calls. Once the preview is ready to be displayed, the preview HTML is appended to the webpage’s HTML. The previewers use browser-side JavaScript and jQuery.
D. Database
The NoSQL MongoDB is used as the system’s database management system (DBMS). Selecting a NoSQL DBMS offers advantages important for Medici. Flexibility is a key attribute for Medici, as CH community requirements change constantly, as described in the Introduction. This gives value to the horizontal scalability of NoSQL. The simplicity of design of NoSQL databases allows the addition of complex features to Medici (such as community-generated dataset metadata) without too much difficulty.
E. Job broker
If the web server is the heart of Medici 2, the job broker is its veins.
The role of the RabbitMQ message broker is to take the preprocessing messages that the web server sends once a dataset or file is uploaded and distribute them to the extractors that can handle the jobs. Each extractor, once activated, registers one or more delivery queues on RabbitMQ, on which it listens. Each queue is associated with a particular routing key set, which defines which routing keys a job can have in order for it to be routed to that queue (and thus to the extractor listening to the queue). Each routing key set is defined by a five-field term like the following: localhost.file.model.obj.# In the above, the first field defines the URL or IP of the machine hosting the Medici web server implementation that sends job messages to the extractors. The second field defines whether the extractor takes jobs pertaining to preprocessing a file or a whole dataset. The third and fourth fields are for the MIME types of the files (or the types of datasets) that the extractor accepts, and the fifth is for the possible subtype of the datasets or files. The syntax of the set definitions uses the conventions specified in the RabbitMQ tutorials [9].
III. METADATA
Medici can accept, persist and process two kinds of metadata for each dataset or file:
- Automatically-generated. Generated by metadata extractors and associated with the dataset or file through a REST API call to the web server by the extractors. They usually pertain to the more technical aspects of datasets and files. Their subjects, predicates and objects can vary from one dataset or file type to another and from one Medici implementation to another, depending on what metadata are available for each dataset type and dataset, as well as on the type of the extractor that extracted the metadata. They cannot be modified by users.
- Community-generated. Generated by the users through completion of an HTML form for each dataset.
Their allowed subjects, predicates and objects are defined in an institutional schema that can differ for each Medici deployment. The schema is formulated by each deployer or institution at its discretion; however, it is recommended that the schema be compatible with the Dublin Core content description model [15] to facilitate metadata sharing between different repositories. The CIDOC Conceptual Reference Model is based on Dublin Core and is specialized for CH metadata and can also be used to facilitate metadata sharing among the CH scientific community [16]. The input metadata, whether coming from an extractor or from an HTML form, are converted to JSON structures by the extractor or a browser-side script respectively and sent to the web server, where they are parsed back. The web server, and the browser-side JavaScript that manipulates community-generated metadata, treat community-generated metadata as a single tree structure and auto-generated metadata as a list of tree structures. This distinction applies because different extractors may generate different sets of metadata for the same dataset or file with no contact points between them, so the reasonable option is to present them as a list. Community-generated metadata, on the other hand, are produced by users who modify the whole metadata structure of a dataset at once through an HTML form, meaning that they can be merged. Metadata are formulated as tree structures because each predicate of a metadata subject is either a property of the file or dataset or a subproperty of another property, all describing the same file or dataset. An example of a simple metadata tree (here for community-generated metadata) is presented in Fig. 2. The tree structures (one for each dataset) are stored on
  • 5. the web server’s application layer as Scala maps.
A. Automatically-generated metadata
There are currently metadata extractors for images and RTIs, with more to be added in the future. For a discussion of the metadata extractors for each file type, see the sections discussing Medici’s features for their respective types.
B. Community-generated metadata
The schema for the community-generated metadata is defined in two comma-separated value files.
- The first file stores the name of each metadata node along with whether the node is a leaf in the metadata tree or a higher-level node. Leaves consist of string data which are input by users and serve as the objects of the description of the dataset, answering the questions generated by the tree paths leading to them. Branches are subproperties of higher-level properties or top-level properties. In the example of Fig. 2, the leaf node "Title: Ancient Mari" is a leaf answering the question "what is a/the Name of a/the Project Location of a/the Project associated with this dataset?".
- The second file stores the possible relationships associating nodes with nodes or leaves, along with the cardinalities of each relationship.
The above are automatically enforced by the system when the user adds or modifies a dataset’s community-generated metadata.
C. Metadata search
Users can search for datasets that contain user-entered metadata satisfying formulated queries. The queries are formulated by filling out an HTML form similar to the HTML form for adding/modifying community-generated metadata. For each level of the metadata tree, the user can select which properties must be satisfied by selected nodes at that level. For example, a user can search for datasets having a Project having Cyprus as a Country and having a Project Location, with the Project Location having "Choirokitia" as a Name, and the Project also having a Responsible Institution having a Name of "Example Institution", as in the query shown in Fig. 3.
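A query of this kind can be checked against a dataset's metadata tree by recursive path matching. The following Python sketch is a simplified illustration, not the authors' Scala implementation: it assumes each tree is a nested dictionary mapping a property name to a list of values, where a value is either a leaf string or a sub-tree, and it handles only conjunctive (AND) queries. The function name `satisfies` and the sample data are hypothetical.

```python
def satisfies(metadata, query):
    """Return True if every path in the query tree exists in the metadata tree."""
    for prop, wanted_values in query.items():
        candidates = metadata.get(prop, [])
        for wanted in wanted_values:
            if isinstance(wanted, dict):
                # A sub-tree must be satisfied by a single metadata node.
                found = any(
                    isinstance(c, dict) and satisfies(c, wanted)
                    for c in candidates
                )
            else:
                # A leaf must match one of the stored string values.
                found = any(
                    not isinstance(c, dict) and c == wanted for c in candidates
                )
            if not found:
                return False
    return True


dataset = {
    "Project": [{
        "Country": ["Cyprus"],
        "Project Location": [{"Name": ["Choirokitia"]}],
        "Responsible Institution": [{"Name": ["Example Institution"]}],
    }]
}
query = {
    "Project": [{
        "Country": ["Cyprus"],
        "Project Location": [{"Name": ["Choirokitia"]}],
    }]
}
print(satisfies(dataset, query))  # True
```

Requiring a sub-tree to be satisfied by one node, rather than by several nodes jointly, keeps a query such as "a Project in Cyprus with a location named Choirokitia" from matching a dataset in which those two facts belong to different projects.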
Users will be able to use a logical expression-like query formulation to generate queries. For example, a user may wish to search for datasets satisfying the query above but having Cyprus as a Country OR having a Project Location for which the rest apply, or having Cyprus as a Country and NOT having any Project Locations for which the rest apply. The JSON-formulated query is sent to the web server, which parses it as a Scala map, compares it to the community-generated metadata maps of each existing dataset, and returns the list of datasets satisfying the query. The comparison is done using a Depth-First-Search-like algorithm [17] that recursively checks, for every path in the query tree, whether there is a path in the dataset’s community-generated metadata tree that satisfies it. The server formats the result list as HTML and returns it to the client through AJAX to be displayed.
Fig. 2. A simple metadata tree for community-generated metadata.
Fig. 3. Example community-generated metadata search.
IV. IMAGES
Support for multi-format and multi-resolution image storage and presentation is highly beneficial to CH communities because of advances in image acquisition instruments that satisfy the current demands of research analysis in archaeology. Medici currently supports the JPEG/JPG, PNG, TIFF, GIF and BMP image formats. A low-resolution image is displayed by embedding it directly in the image previewer’s webpage. For high-resolution images, however, a dedicated gigaimaging view extractor is used.
A. Deep Zoom
Viewing large images efficiently and without loss of quality is achieved by constructing an image pyramid using a technique similar to frustum culling. Specifically, when a large image is to be viewed in a limited screen space, only a part of it can be viewed in full quality; the rest of the image, which is of no interest to the user at that moment, is effectively occluded and need not be transmitted. Consequently, data transmission size is limited by tiling the
  • 6. image at various scales in a preprocessing step executed server-side so that only the tiles needed for the currently shown scale and position are transmitted. Deep Zoom, a technology developed by Microsoft as part of Microsoft Silverlight [18], works in exactly this way: the user initially sees a lower-resolution view of the entire image and can then zoom in to see portions of the image in more detail.
B. Zoom.it
Zoomit.js (footnote 1) is a JavaScript library also used by the Zoom.it web service [5]. It relies on the technology behind the Seadragon gigaimaging viewer, which utilizes Deep Zoom. This library is used by Medici 2 to view large images that have already been split up into various levels of Deep Zoom tiles (i.e. the image pyramid) by a dedicated image pyramid extractor. The pyramids are loaded on the web server together with an XML file (DZI) containing basic image metadata needed by zoom.it.
C. Gigaimage processing
When an image is uploaded on the web server, a message is sent to RabbitMQ for any extractors that can process images. If there are any active gigaimaging extractors registered with RabbitMQ, one of them will receive the job message for the image. The job message contains the image’s size. If the image’s size is below a threshold, which is supplied as an input parameter by whoever launches the extractor program, the job is rejected and neither the XML file for zoom.it nor the image pyramid tiles are generated. This means that when a user requests to view the dataset containing the image, no zoom.it preview will be found by the web server and therefore the original image will be shown, embedded with an HTML tag. If the size of the image exceeds the threshold, the zoom.it XML file is first generated and uploaded to the web server as a preview, and is given an identifier. That identifier will be used to associate the image tiles with the preview and thus with the original image.
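To make the image pyramid concrete, the following Python sketch computes the levels of a Deep Zoom-style pyramid and the tile grid at each level. It follows the usual Deep Zoom convention in which the highest level holds the full-resolution image and each lower level halves the dimensions down to a single pixel. The function name `pyramid_levels` and the tile size of 256 are assumptions (the paper does not state the tile size its extractor uses), and tile overlap is ignored because it does not change the number of tiles.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def pyramid_levels(width, height, tile_size=256):
    """Return (level, width, height, columns, rows) for every pyramid level."""
    # Highest level index is ceil(log2(longest side)), computed with integers.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = ceil_div(width, scale)
        h = ceil_div(height, scale)
        levels.append((level, w, h, ceil_div(w, tile_size), ceil_div(h, tile_size)))
    return levels


levels = pyramid_levels(4096, 3072)
total_tiles = sum(cols * rows for _, _, _, cols, rows in levels)
print(len(levels), levels[-1], total_tiles)
```

For a 4096 x 3072 image this gives 13 levels, with the full-resolution level split into a 16 x 12 grid, while a viewer showing a small window at that level needs only the handful of tiles that intersect it.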
Then the image tiles for all the image zoom levels are generated, concurrently uploaded back to the web server and associated with the preview. Zoomit.js accesses the image’s pyramid tiles via a virtual directory structure generated on the web server for each preview. This structure is implemented with Scala functions that take the parts of the virtual directory path as parameters and return the requested tile after querying the database where the tiles are stored. These parameters include the tile’s level and the 2D coordinates of its position on the level. An example of a zoom.it preview can be seen in Fig. 4 and Fig. 5, where it is used for visualizing ancient cylinder seals and Byzantine wall paintings from Cyprus.
Footnote 1: http://zoom.it/h.js
Fig. 4. A two-dimensional giga-image of a cylinder seal dating to the 13th century BC taken with a large format camera system and previewed in Medici 2.0’s web viewer. Courtesy of the Bank of Cyprus Cultural Foundation.
D. Image technical metadata
There are currently two different image metadata extractors, either of which can be used at the discretion of the deploying institution, each extracting different sets of metadata. The first uses the ImageMagick image processing suite [19] and the second uses the Java-based image metadata extractor library developed by Drew Noakes [20]. The ImageMagick extractor extracts many more details, but it requires ImageMagick to be installed on the machine running the extractor.
V. 3D MODELS
Complete digital documentation in archaeological and Cultural Heritage research is a multidimensional process. New opportunities and challenges for the development of web-based Virtual Reality (VR) applications in these research fields have been the direct consequence of advances in the field of three-dimensional representation and Internet-related technology [21].
High-accuracy models with multiple levels of detail (LOD) can be obtained by using various methods for 3D model creation such as laser scanning technology, photogrammetry and 3D
  • 7. modeling (architectural reconstruction techniques) [22].
Fig. 5. Giga-image of Byzantine wall paintings taken inside the monastery of St. John Lambadistis in Cyprus and previewed in the web viewer. Courtesy of the Department of Antiquities in Cyprus.
Nonetheless, these techniques may be fused during post-processing, leading to the creation of highly complex models with a diversity of information encapsulated within a single file. The development of web-based applications for user access to and interactive exploration of three-dimensional models has been studied and worked upon since 1995. The Web3D Consortium composes open standards for real-time 3D data and model exchange over the Internet. VRML became the first web 3D format [23]. The latest successor of VRML is X3D, which provides Medici with a flexible solution for real-time 3D representation and communication. The system architecture of the 3D processing pipeline consists of open source tools and free software, which provides full transparency regarding the adopted methodology and data processing methods, as well as a cost-effective solution for both server and client. The main feature of this web VR previewer is to provide the user with a completely new visit experience based on a free interactive exploration interface of the object (i.e., not constrained by any predefined pathway) and on the opportunity to get more detailed information on specific parts of interest. Medici currently supports 3D file formats that are widely used in CH such as Wavefront (obj), Polygon File Format (ply), X3D (x3d) and 3D models embedded in PDF (3DPDF). Models in the obj or ply format are converted to x3d and then translated to their center of rotation. "Virtual scenes" containing multiple objects can also be previewed in 3D fly-through and walk-through mode (the primary mode is "examine" mode).
If the model needs optimization for smooth transmission over the Internet, it is decimated to an adequate LOD before being converted to the X3DOM format [6] (that is, HTML5 files) and sent back to the server as a preview, which is added to the DOM of the 3D previewer and displayed upon request by the user. There are also two previewers for 3DPDF (also used for general PDF viewing). One was written by NCSA and uses Mozilla’s PDF.js library [24], while the other is a simple integration of Adobe Acrobat through the associated Acrobat browser plugin.

A. X3D extraction

The X3D extractor converts obj and ply models to X3D, decimates them if needed through external calls to MeshLab’s meshlabserver [25], and sets job identity flags on each model uploaded by the client. For the X3D extractor to work, MeshLab must be installed on the extractor’s machine. The extractor accepts jobs from RabbitMQ for obj, ply and X3D files. If the 3D model is not in X3D form, the extractor calls MeshLab’s meshlabserver command-line interface to generate the equivalent X3D model. If the model’s file size is above a bound that the client passes as an input parameter when activating the extractor program, a dynamically generated meshlabserver re-meshing/simplification script is used: a new call decimates the model to an adequate LOD according to the client’s preset bounds, reducing its complexity and thus its size. Finally, a flag is set on the job received by the extractor to indicate that the first phase of processing is complete, and the post-processed model is uploaded back to the web server as an intermediate result together with the model’s new job flag. The web server then transmits the intermediate result to the HTML5-X3DOM extractor, which prepares it for front-end display.

B. HTML5-X3DOM extraction

Converting X3D models to X3DOM/HTML5 facilitates and optimizes their integration with the HTML DOM structure of the X3D previewer. This is done by the X3DOM extractor, which uses aopt, a command-line tool bundled with the InstantReality framework [26]. For the HTML5-X3DOM extractor to work, InstantReality must be installed on the extractor’s machine.
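The decision logic of the two extractors described in subsections A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not Medici’s actual extractor code: the function names, the `_lod` file suffix and the `simplify.mlx` script name are hypothetical, and the sketch only builds the command lines rather than running them. The `-i`, `-o` and `-s` options of meshlabserver and the `-i` and `-N` options of aopt are the commonly documented ones and should be checked against the installed versions.

```python
import os

SUPPORTED = (".obj", ".ply", ".x3d")


def plan_x3d_extraction(model_path, size_bytes, size_bound,
                        script_path="simplify.mlx"):
    """Return (commands, final_x3d_path) for one X3D extraction job.

    commands is a list of meshlabserver argument lists; nothing is executed.
    script_path stands for the dynamically generated simplification script
    (hypothetical name).
    """
    stem, ext = os.path.splitext(model_path)
    ext = ext.lower()
    if ext not in SUPPORTED:
        raise ValueError("unsupported model format: " + ext)

    commands = []
    current = model_path
    if ext != ".x3d":
        # Phase 1: obj/ply models are converted to X3D.
        current = stem + ".x3d"
        commands.append(["meshlabserver", "-i", model_path, "-o", current])
    if size_bytes > size_bound:
        # Phase 2: models above the client's size bound are decimated.
        decimated = stem + "_lod.x3d"
        commands.append(["meshlabserver", "-i", current, "-o", decimated,
                         "-s", script_path])
        current = decimated
    return commands, current


def plan_x3dom_extraction(x3d_path):
    """Return the aopt argument list that writes an HTML5/X3DOM page."""
    html_path = os.path.splitext(x3d_path)[0] + ".html"
    return ["aopt", "-i", x3d_path, "-N", html_path]
```

In a real extractor each argument list would be passed to something like `subprocess.run`, with the job flag set and the result uploaded to the web server once the call returns successfully.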
C. Models with separate materials/texture files

In some cases, obj and X3D models need separate material/texture files in order to display their geometry with the correct color information. Medici 2.0 handles this case: such models can still be previewed correctly if they are uploaded in a ZIP file together with their corresponding materials/textures. The web server uses a utility function that unzips the file and applies rules about the presence of files of certain formats in the zipped dataset to resolve the type of model contained in the ZIP file (a zipped obj, a zipped X3D or something different). The type discovered is sent as part of the routing key to RabbitMQ when the file’s extraction job is sent. After the X3D extractor has post-processed the model with its textures, the HTML5-X3DOM extractor completes work on the X3D preview and sends back the HTML5 X3DOM page as the file’s final preview.

D. X3DOM previewer

The X3DOM previewer works by dynamically downloading the HTML5 file in which the X3D preview of the model is encapsulated and embedding it into an X3D HTML5 element. Events thrown by the new elements then activate functions in the X3DOM JavaScript library, which initialize the model’s preview through an X3D scene. Among other things, the preview allows rotating the model, zooming, panning, displaying model statistics, and changing the rendering mode from the default view to wireframe or vertex view, with or without texture and surface depth map simulation. X3dom.js (footnote 2) accesses any image textures via a virtual directory structure generated on the web server for each preview; for files uploaded as ZIP files, this structure simulates the internal directory structure of the original ZIP file. It is implemented with Scala functions that take the parts of the virtual directory path as parameters and return the requested image texture after querying the database where the image textures are stored.
These parameters include the (virtual) path to the image texture, as it was in the original ZIP file containing the textures of the original model. Examples of 3D model previews (with the depth maps and model statistics also displayed) are shown in Figs. 6 and 7.

2 http://www.x3dom.org/download/dev/x3dom.js

Fig. 6. Hellenistic-Roman theater, 300 BC-365 AD, previewed in the web 3D viewer. Courtesy Cyprus Dept. of Antiquities-University of Sydney.

Fig. 7. Ancient vessel from the Pyrgos area of Cyprus, previewed in the web viewer. Courtesy of the 3D-coform EU project.

VI. RTI

The application of Reflectance Transformation Imaging (RTI) technology has offered great possibilities for research as well as for the documentation and preservation of cultural heritage objects and works of art. Polynomial Texture Mapping (PTM) is a subset of the RTI method; it was developed by Hewlett-Packard Imaging Labs in the early 2000s [27][28] and enhanced by the West Semitic Project at the University of Southern California (USC) and Cultural Heritage Imaging (CHI) [4]. PTM addresses the challenges of photographing faded, damaged or badly preserved object surfaces, especially when they feature inscriptions, decorative patterns and designs. The surfaces of a wide range of archaeological objects and works of art, such as stone, clay, marble, plaques, coins, paintings, mosaics, sculpture, jewellery and other small objects (see Fig. 8), are optimal study cases for RTI technology.

Fig. 8. A replica of an ancient coin showing the capabilities of RTI photography. Self-shadows and interreflections are rendered in the upper-left corner, in contrast to the default mode in the lower-left, which has no such properties.

Fig. 9. Snapshot from the Inscriptifact desktop RTI viewer developed by the West Semitic Project at USC, in which the user can interact with the artifact using a “virtual torch” for dynamic illumination and surface enhancement.

A. RTI/PTM viewer

The PTM algorithm synthesizes the data from images taken under varying lighting directions to create a single image that can be examined in an RTI/PTM viewer with a “virtual torch”. The viewer allows the user to move the light angle intuitively in real time, so that the combination of light and shadow representing the relief features of the object’s surface can be freely altered. RTI also permits the enhancement of the subject’s surface shape, color and luminance attributes, which reveals surface detail that cannot otherwise be derived (see Fig. 9) [1]. Medici currently supports real-time web viewing of RTI through a Java applet developed by Clifford Lyon [29] (see Fig. 10). Three stand-alone desktop viewers can interpret RTI/PTM files [30][31][32], but on the web only the applet mentioned above is used. The current online RTI viewer is incompatible with current W3C standards. In the future, it will be replaced by a viewer based on the SpiderGL library [8], which is built on WebGL.
This will allow the user’s browser to execute the viewer directly, without a third-party plugin, improving cohesion between the user’s Graphics Processing Unit, the viewer and the browser, and also improving security. For the time being, Medici 2.0 will avoid the use of this viewer and will instead use an innovative method for extracting 3D models from RTI files for front-end previewing. From this preview the user can recognize the object and then download the original high-resolution RTI file to the desktop for actual interaction.

B. RTI metadata

As with traditional web images, RTI technical metadata is extracted by a standalone preprocessor. The preprocessor downloads the RTI file from the web server and reads its topmost lines for the types of metadata included in the most common RTI file formats, as defined in the PTM file format specification by Hewlett-Packard [28]. The metadata is converted to JSON and returned to the web server to be associated with the RTI file.

Fig. 10. Replica of an ancient coin from Petra, Jordan, viewed using the current PTM viewer.
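The header-reading step of the RTI metadata preprocessor can be illustrated with a short sketch. It assumes the uncompressed PTM header layout of the specification [28]: a version token, a format token, the image width and height, six scale values and six bias values, all as whitespace-separated ASCII ahead of the binary coefficients. The function name and the JSON key names are hypothetical, and JPEG-compressed PTM variants, which carry additional header fields, are not handled.

```python
import json


def parse_ptm_header(data):
    """Extract technical metadata from the first bytes of a PTM file.

    Assumed layout (uncompressed formats only): version, format, width,
    height, 6 scale values, 6 bias values, separated by whitespace.
    """
    # Split off only the 16 header tokens; the binary payload is left alone.
    raw_tokens = data.split(None, 16)[:16]
    if len(raw_tokens) < 16:
        raise ValueError("header is too short to be a PTM file")
    tokens = [t.decode("ascii") for t in raw_tokens]

    version, ptm_format = tokens[0], tokens[1]
    if not version.startswith("PTM_"):
        raise ValueError("missing PTM version tag")

    return {
        "version": version,
        "format": ptm_format,
        "width": int(tokens[2]),
        "height": int(tokens[3]),
        "scale": [float(t) for t in tokens[4:10]],
        "bias": [int(t) for t in tokens[10:16]],
    }


def ptm_metadata_json(data):
    """Serialize the parsed header as the JSON returned to the web server."""
    return json.dumps(parse_ptm_header(data), sort_keys=True)
```

Only the first few hundred bytes of the file are needed, so a preprocessor following this approach would not have to download the whole RTI file before returning the metadata.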
VII. EXTRACTING 3D MODELS FROM RTI

The ability to interactively change the light direction and apply various filters makes RTI one of the best techniques for examining archaeological artifacts. Human perception of highlights and shadows in a 2D image of a surface helps the viewer interpret the surface topology. However, this ability varies from person to person, and some surfaces with varying contrast and surface characteristics can be difficult to perceive even with dynamic lighting. Further studies on this issue can be found in [33].

RTIs capture both the color and the geometric properties of the object, which makes it possible to use them to reconstruct 3D approximations of the original surfaces. Doing so greatly improves the perception of the artifacts’ surface details. The approach used is similar to photometric stereo [34]: three 2D texture maps are extracted from the RTI and then used to reconstruct the final 3D surface, given known lighting directions. Luminance information is used to estimate a normal map, which is integrated to obtain a height map. The height map is used to build a 3D surface by shifting the vertices of a Delaunay mesh. Color information is used to extract a uniformly lit diffuse map that serves for surface texture mapping.

A. Maps used

a. Diffuse Map

Diffuse maps (a.k.a. albedo maps) define the main color of the surface. A good diffuse map should contain only color information, without any directional light effects, inter-reflections, specular highlights or self-shadowing. If any of these is present in a diffuse map, the object will respond incorrectly to virtual incident light, for example by casting shadows in the wrong directions or showing highlights where no direct incident light hits the object. PTM allows the extraction of accurate diffuse maps because the luminance and chromaticity information are stored separately. In the LRGB PTM format the color information is available out of the box.
For the RGB PTM format, similar results can be obtained by casting light perpendicular to the surface at each texel, so that all texels receive the same amount of light. This results in a uniformly lit texture that is ideal for use as a diffuse map.

b. Normal Map

A normal map is a texture map containing the surface normal at each texel, stored as RGB. Directional lighting information for each pixel is already stored in a PTM, which makes it possible to obtain a very good estimate of the surface normal at that pixel. Since luminance is maximal when the incident light is perpendicular to the surface (i.e., the incident light vector coincides with the surface normal), the surface normal at each texel can be estimated as the light direction that maximizes the light response.

c. Height Map

Height maps are grayscale texture maps in which each pixel’s white level corresponds to the height value of a vertex of a 3D grid mesh (usually in the Z direction). The height maps used here are generated by integrating the normal maps obtained as described in the previous section. The normal at every surface point is perpendicular to the surface described by the height map, which ties the X and Y components of the normal to the local height gradient. Integration is not guaranteed to yield precise results, since it is based on an estimated normal map, and information about surface discontinuities is lost in normal maps. Nevertheless, the 3D models generated are good approximations of the real surface.

Height map generation is implemented iteratively; each iteration improves the contrast between low and high points. Height map pixels are initialized to zero height. On each new iteration, each pixel’s height is slightly modified according to the slopes given by the surrounding pixels’ normals and their heights in the current iteration. The contributions from all eight surrounding pixels are averaged and added to the current height. Fig. 11 summarizes the contributions of the surrounding pixels and their signs, where Nx and Ny are the X and Y components, respectively, of the surface normal.
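The normal estimation and the iterative height map generation described above can be sketched as follows. This is one possible reading of the description, not the authors’ implementation. It assumes the standard biquadratic PTM luminance model of [27], L(lu, lv) = a0·lu² + a1·lv² + a2·lu·lv + a3·lu + a4·lv + a5, and it takes each neighbour’s contribution to be the difference between the height that neighbour predicts for the current pixel and the current height, so that adding the averaged contributions is the same as averaging the predictions. As in Fig. 11, Nx and Ny are used directly without dividing by Nz. The function names are hypothetical.

```python
import math


def normal_from_ptm(a0, a1, a2, a3, a4, a5=0.0):
    """Unit normal taken as the light direction maximizing the luminance."""
    det = 4.0 * a0 * a1 - a2 * a2
    if abs(det) < 1e-12:
        return (0.0, 0.0, 1.0)  # degenerate response: face the camera
    # Stationary point of the biquadratic (gradient set to zero).
    lu = (a2 * a4 - 2.0 * a1 * a3) / det
    lv = (a2 * a3 - 2.0 * a0 * a4) / det
    r2 = lu * lu + lv * lv
    if r2 > 1.0:  # clamp to the unit disc of valid light directions
        r = math.sqrt(r2)
        lu, lv, r2 = lu / r, lv / r, 1.0
    return (lu, lv, math.sqrt(1.0 - r2))


# (row offset, column offset, sign of Nx, sign of Ny) for each neighbour,
# following Fig. 11; row offset -1 is the row above the current pixel.
NEIGHBOURS = [(-1, -1, -1, +1), (-1, 0, 0, +1), (-1, 1, +1, +1),
              (0, -1, -1, 0), (0, 1, +1, 0),
              (1, -1, -1, -1), (1, 0, 0, -1), (1, 1, +1, -1)]


def integrate_normals(nx, ny, iterations):
    """Iteratively build a height map from per-pixel Nx and Ny grids."""
    rows, cols = len(nx), len(nx[0])
    height = [[0.0] * cols for _ in range(rows)]  # start from zero height
    for _ in range(iterations):
        updated = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                total, count = 0.0, 0
                for dr, dc, sx, sy in NEIGHBOURS:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        # Height this neighbour predicts for pixel (r, c).
                        total += height[rr][cc] + sx * nx[rr][cc] + sy * ny[rr][cc]
                        count += 1
                updated[r][c] = total / count if count else height[r][c]
        height = updated
    return height
```

Pixels on the image border simply average over the neighbours that exist. The result is defined only up to a constant offset, so a caller would typically shift it so that the lowest point sits at zero before applying the height modifier.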
-Nx +Ny    +Ny              +Nx +Ny
-Nx        Current Pixel    +Nx
-Nx -Ny    -Ny              +Nx -Ny

Fig. 11. Contributions of surrounding pixels to the current pixel height when generating the height map from the normal map.

B. Testing of algorithm

In testing done by BA, a 2D Delaunay triangulation was used to generate rectangular grids that were deformed using the resulting height maps and texture-mapped using the diffuse and normal maps. The quality of the generated models improved as the number of iterations increased; going beyond 100 000 iterations had no noticeable effect. Further testing of the algorithm took place at CyI. It showed that, for low-relief and mezzo-relief RTI, the algorithm generated 3D models of good quality with no more than 10 000 iterations. An example of testing the algorithm is shown in Fig. 12.

Fig. 12. Example of a PTM file and its 3D model generated by the RTI-to-3D algorithm.

C. Implementation

The algorithm will be implemented as a standalone command-line program that takes as input the desired number of height map generation iterations, the desired height modifier for the generated 3D model and the RTI file, and outputs the generated 3D model. This program will be called by a standard extractor that takes the above parameters as its input. The result of the extraction will be sent to the 3D extractors for further processing, as is done for all 3D models.

VIII. FUTURE DIRECTIONS

A. Immediate plans

The community-generated metadata search will be enhanced. The user will be able to formulate queries as logical expressions using the AND, OR and NOT operators. Technical metadata will also be searchable. Integration of more MeshLab features in Medici will be attempted (if supported by X3DOM). These may include shading modes, a measuring caliper, community-generated annotations on the model display and the ability to change the view’s light source. Support for more file formats and model types will be added, including printer-ready stereolithography (STL) diagrams, Virtual Reality spherical panoramas and videos.

B. Further designs

It will be possible to integrate Medici metadata with Inscriptifact [1] and other Cultural Heritage repositories. For 2D high-resolution imaging, the IIPImage framework [35] will serve as a mediator that adds flexibility in the presentation of more specialized CH imaging (e.g., support for multispectral imaging and real-time annotation). For 3D, technical metadata extractors may be added, as well as the capability to generate 3D models from image datasets produced by photogrammetry, similar to the ARC 3D web service [36]. Furthermore, the use of Medici’s interactive system could potentially be extended to more complex virtual exploration, such as a digitized archaeological site, so that it serves not just as a previewer but as an intuitive VR environment for the web: an interactive learning environment, a “virtual world” [37]. A SpiderGL-based viewer will replace the current Java-based RTI viewer once SpiderGL has been developed further and made stable by the digital CH communities.
Medici will also be able to extract semantically meaningful features for content-based comparison (e.g., textual data from handwritten text). This will be made possible by constructing descriptors for each file from semantically meaningful features derived from the file’s data [38][39][40][41]. The descriptors generated for each file can then be compared with descriptors of the same type generated from search query data.

IX. CONCLUSION

Though still in development, Medici 2 already supports a broad range of throughput-intensive research techniques and community data management. Many file and dataset formats can be uploaded, analyzed and visualized. These include the latest formats used by the CH scientific community, i.e., large, detailed 3D models and RTI, as well as the ever-present large images and audio/video. Medici 2 not only extracts technical metadata from files and datasets, it also allows metadata searching and social metadata generation according to each implementer’s user-input metadata schema. These capabilities, together with the many important additions scheduled for the future, make Medici 2 a CMS well worth considering for the CH scientific community’s dataset management, analysis and visualization needs.

ACKNOWLEDGMENTS

The authors thank the Cyprus Institute, the LinkSCEEM-2 project and its partners, the National Center for Supercomputing Applications (NCSA) at the University of Illinois and Bibliotheca Alexandrina as organizations for providing the means and guidelines that enabled their contributions to the development of Medici. They also thank every member of staff at CyI, NCSA and Bibliotheca Alexandrina who provided them with user requirements and technical assistance.

SOURCE CODE

The current Medici source code is available from the NCSA Medici repository: https://opensource.ncsa.illinois.edu/stash/projects/MED
REFERENCES

[1] L. Hunt, M. Lundberg, and B. Zuckerman, “InscriptiFact: A virtual archive of ancient inscriptions from the Near East”, International Journal on Digital Libraries, vol. 5, no. 3, pp. 153-166, May 2005.
[2] L. Marini et al., “Medici: A Scalable Multimedia Environment for Research”, white paper, presented at Microsoft eScience Workshop 2010, Berkeley, CA, Oct. 2010.
[3] The International Committee for Documentation of Cultural Heritage (CIPA). http://cipa.icomos.org
[4] M. Mudge et al., “Image-Based Empirical Information Acquisition, Scientific Reliability, and Long-Term Digital Preservation for the Natural Sciences and Cultural Heritage”, in Eurographics 2008, Hersonissos, Greece, 2008.
[5] Zoom.it - About. http://zoom.it/pages/about/
[6] J. Behr, P. Eschler, Y. Jung, and M. Zöllner, “X3DOM: a DOM-based HTML5/X3D integration model”, in Proc. of the 14th Int. Conf. on 3D Web Technology, Darmstadt, Germany, Jun. 2009, pp. 129-135.
[7] S. Tomar, “Converting video formats with FFmpeg”, Linux J., vol. 2006, no. 146, pp. 10, Jun. 2006.
[8] M. Di Benedetto, F. Ponchio, F. Ganovelli, and R. Scopigno, “SpiderGL: a JavaScript 3D graphics library for next-generation WWW”, in Proc. of the 15th Int. Conf. on Web 3D Technology, Los Angeles, CA, Jul. 2010, pp. 165-174.
[9] A. Videla and J. J. W. Williams, “RabbitMQ in action: distributed messaging for everyone”, Shelter Island, NY: Manning, 2012.
[10] R. T. Fielding, “Architectural Styles and the Design of Network-based Software Architectures”, Ph.D. dissertation, Dept. Comput. Sci., Univ. California Irvine, Irvine, CA, 2000.
[11] P. Hilton, E. Bakker, and F. Canedo, “Play for Scala (Early Access Edition)”, Manning, 2012.
[12] N. Leroux and S. De Kaper, “Play for Java (Early Access Edition)”, Manning, 2012.
[13] A. Leff and J. Watson, “Web-application development using the Model/View/Controller design pattern”, in Proc. of the Enterprise Distributed Object Comp. Conf. 2001, Seattle, WA, Sep. 2001, pp. 118-127.
[14] JQuery. http://jquery.com/
[15] S. Weibel, “The Dublin Core: A Simple Content Description Model for Electronic Resources”, Bulletin of the Amer. Soc. for Inform. Sci. and Technology, vol. 24, no. 1, pp. 9-11, Oct./Nov. 1997.
[16] M. Doerr, C. E. Ore, and S. Stead, “The CIDOC conceptual reference model: a new standard for knowledge sharing”, in Tutorials, posters, panels and industrial contributions at the 26th Int. Conf. on Conceptual Modeling, Auckland, New Zealand, Nov. 2007, pp. 51-56.
[17] R. Tarjan, “Depth-First Search and Linear Graph Algorithms”, SIAM J. Comput., vol. 1, no. 2, pp. 146-160, Jun. 1972.
[18] Deep Zoom-Features-Microsoft Silverlight. http://www.microsoft.com/silverlight/deep-zoom/
[19] M. Still, “The Definitive Guide to ImageMagick”, New York City, NY: Apress, 2005.
[20] Drewnoakes.com - jpeg exif / iptc metadata extraction in java. http://drewnoakes.com/code/exif/
[21] A. Guarnieri, F. Pirotti, and A. Vettore, “Cultural heritage interactive 3D models on the Web: An approach using open source and free software”, J. of Cultural Heritage, vol. 11, no. 3, pp. 350-353, Jul.-Sept. 2010.
[22] F. Bernardini and H. Rushmeier, “The 3D Model Acquisition Pipeline”, Comput. Graph. Forum, vol. 21, no. 2, pp. 149-172, Jun. 2002.
[23] Web3D Consortium: Open Standards for Real-Time 3D Communication. http://www.web3d.org/realtime-3d/
[24] pdf.js by andreasgal. http://mozilla.github.io/pdf.js/
[25] P. Cignoni et al., “MeshLab: an Open-Source Mesh Processing Tool”, presented at Eurographics Italian Chapter Conf. 2008, Salerno, Italy, Jul. 2008, pp. 129-136.
[26] J. Behr, U. Bockholt, and D. Fellner, “Instantreality - A Framework for Industrial Augmented and Virtual Reality Applications”, in Virtual Reality & Augmented Reality in Industry. Berlin, Germany: Heidelberg, 2011, ch. 5, pp. 91-99.
[27] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial texture maps”, in SIGGRAPH ’01 Proc. of the 28th Annu. Conf. on Comput. Graph. and Interactive Techniques, Los Angeles, CA, Aug. 2001, pp. 519-528.
[28] T. Malzbender, D. Gelb, and H. Wolters, “Polynomial Texture Map (.ptm) File Format”, Client and Media Systems Laboratory, HP Laboratories, Palo Alto, CA, Tech. Rep. HPL-2001-104, Apr. 2001.
[29] PTM Web Viewer by Clifford Lyon. http://materialobjects.com/ptm/
[30] Hewlett-Packard labs PTM Desktop viewer. http://www.hpl.hp.com/research/ptm/downloads/download.html
[31] Cultural Heritage Imaging (CHI) RTI Desktop viewer. http://culturalheritageimaging.org/Technologies/RTI/
[32] Inscriptifact Desktop Viewer. http://www.inscriptifact.com/instructions/
[33] J. T. Todd, “The visual perception of 3D shape”, Trends in Cognitive Sci., vol. 8, no. 3, pp. 115-121, Mar. 2004.
[34] R. J. Woodham, “Photometric method for determining surface orientation from multiple images”, Optical Eng., vol. 19, no. 1, pp. 139-144, Feb. 1980.
[35] IIPImage framework. http://iipimage.sourceforge.net/
[36] ARC 3D Web service. http://homes.esat.kuleuven.be/~visit3d/webservice/v2/index.php
[37] B. Harper, J. G. Hedberg, and R. Wright, “Who benefits from virtuality?”, Comput. & Educ., vol. 34, no. 3/4, pp. 163-176, Apr./May 2000.
[38] L. Marini et al., “Versus: A Framework for General Content-Based Comparisons”, in IEEE eScience, Chicago, IL, 2012.
[39] L. Diesendruck et al., “Digitization and Search: A Non-Traditional Use of HPC”, in IEEE eScience Workshop on Extending High Performance Computing Beyond its Traditional User Communities, Chicago, IL, 2012.
[40] L. Diesendruck et al., “A Framework to Access Handwritten Information within Large Digitized Paper Collections”, in IEEE eScience, Chicago, IL, 2012.
[41] L. Diesendruck et al., “Using Lucene to Index and Search the Digitized 1940 US Census”, in Extreme Science and Engineering Discovery Environment, San Diego, CA, 2013.