A brief introduction to the Venice Time Machine
Giovanni Colavizza EPFL
Who am I?
Giovanni Colavizza
PhD student in Management of Technology, Chair of Digital Humanities, EPFL
Previously: Computer Science, History, Archival and Library Sciences; two start-ups and several positions in IT and research.
Today
Venice Time Machine
1- Vision (where to go)
2- Pipeline (how) and Projects (what)
3- Methods and DH in context (or why, and how again)
VTM Vision
Preservation (from analog to digital)
Access (from browsing to searching)
Valorisation by use
Preservation
Digitisation and replication as a preservation strategy.
This is complicated, because of:
1- metadata (digital provenance)
2- replication protocols: IT infrastructure (centralised vs distributed)
3- rights and partners’ needs (open access for public heritage remains a distant goal)
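To illustrate metadata and replication together: a common preservation practice is to store a checksum (a "fixity" value) for every digitised image as part of its provenance metadata, then verify each replica against it. The sketch below assumes SHA-256 and a hypothetical `ProvenanceRecord` with made-up fields; it is not the VTM's actual metadata schema or replication protocol.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


def checksum(data: bytes) -> str:
    """SHA-256 fixity value of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large scans do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical minimal provenance fields, for illustration only.
    image_id: str
    sha256: str
    capture_device: str
    captured_on: str


def check_replicas(record: ProvenanceRecord, replicas: dict[str, bytes]) -> dict[str, bool]:
    """Map each replica site to True if its copy still matches the recorded checksum."""
    return {site: checksum(data) == record.sha256 for site, data in replicas.items()}
```

In a distributed setup, each site would more likely run `file_checksum` on its own copy and report only the hash; `check_replicas` takes raw bytes here only to keep the sketch short.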
Access
An information system that reaches down to the contents of the documents:
Valorisation
1- research
2- teaching
3- digital reconstruction and outreach
4- technology transfer
5- methodology transfer
Pipeline illustrated by projects
1. Digitisation (Tomography)
2. Image processing (Pre-processing Suite)
3. Content extraction (Automatic transcription, READ Project)
4. Information modelling (Garzoni Project)
5. Building an information system (Document Viewer)
6. Content enrichment and network effects (Linked Books Project)
7. Valorisation and use (GIS, digital experiences, …)
Tomography
Fauzia Albertin EPFL
Image pre-processing suite
Andrea Mazzei ODOMA
Video pt. 1
Semi-automatic transcription
or the Big Data quest for script family resemblances
READ (Horizon 2020 project): €8.2 million, 7 partners, top score from the peer reviewers.
Opt. 1: Alignment
ello stara en carcere domentre chel fara
queste chose opagera. Et e5am deo
stagando collui encarcere se sauera la
che sia dellauer de collui lodoxe
comandera chello sia entromesso
edara sse allo so credetor. Et e5am
deo selo creditor uora enues5r
lapprietade del debitor enquella fia
da alcreditor sera data en ues5xon.
Mosella femena che none maritata
sera 9depnata segon do che desoura
edito tuto se fara segondo che nui
auemo soura dito delomo remetuda
questa cho sa chello stara enlo
teratorio de san çacharia e
Fouad Slimane EPFL
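The slides show a transcription being aligned with a manuscript page, but not the algorithm. A minimal sketch of the idea, assuming a handwriting-recognition pass has already produced (possibly wrong) words with bounding boxes: align the transcription's word sequence with the recognised sequence using Python's `difflib.SequenceMatcher`, so that each transcribed word inherits the box of its counterpart. The name `align_words` and the `(x, y, width, height)` box format are hypothetical; this is not the project's actual method.

```python
from difflib import SequenceMatcher

Box = tuple[int, int, int, int]  # hypothetical (x, y, width, height) in pixels


def align_words(transcript: list[str], recognised: list[tuple[str, Box]]) -> dict[int, Box]:
    """Map the index of each transcribed word to the box of its matched recognised word.

    Exact matches are aligned directly; a substituted run of equal length on both
    sides (e.g. 'en' misread as 'cn') is aligned one-to-one. Other runs stay unaligned.
    """
    hypothesis = [word for word, _ in recognised]
    matcher = SequenceMatcher(a=transcript, b=hypothesis, autojunk=False)
    mapping: dict[int, Box] = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1):
            for offset in range(i2 - i1):
                mapping[i1 + offset] = recognised[j1 + offset][1]
    return mapping
```

Treating equal-length substitutions as one-to-one is a simplifying assumption; words the recogniser missed or merged are left for manual correction.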
Opt. 2: Word spotting and Neural Networks
Andrea Mazzei ODOMA
Video pt. 2
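Word spotting finds occurrences of a word on page images without transcribing them first. A minimal query-by-example sketch, assuming a neural network (not shown) has already mapped each word image to a fixed-length feature vector: rank candidate word images by cosine similarity to the query vector. The function names and vectors are illustrative assumptions, not the system shown in the video.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def spot(query: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank word images in `index` by similarity to the query image's feature vector."""
    scored = [(word_id, cosine(query, vec)) for word_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

A linear scan is fine for a sketch; over millions of word images one would expect an approximate nearest-neighbour index instead.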
Information modelling
Garzoni Project
Lille University and EPFL
ANR+FNS funded
Valentina Sapienza Lille
Maud Ehrmann EPFL
Information system
Fabio Bortoluzzi EPFL
Not all documents connect to each other in the same way:
Fiscal declarations (for taxation)
Personal acts (contracts, testaments, etc.)
State machinery (office holding)
How did Venetians index this information?
Real estate surveys
Fiscal declarations
Testaments
Entities
Indexes
Documents
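The Entities / Indexes / Documents layering can be illustrated with a small inverted index: each document lists the entities it mentions, and the index inverts that relation, so that one entity (a person, a parish, a property) leads to every document mentioning it, and from there to related documents of other kinds. A minimal sketch with hypothetical identifiers, assuming entity recognition and disambiguation have already been done.

```python
from collections import defaultdict


def build_index(documents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert document -> entities into entity -> documents."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, entities in documents.items():
        for entity in entities:
            index[entity].add(doc_id)
    return dict(index)


def linked_documents(doc_id: str, documents: dict[str, set[str]], index: dict[str, set[str]]) -> set[str]:
    """Documents sharing at least one entity with `doc_id` (excluding itself)."""
    related: set[str] = set()
    for entity in documents[doc_id]:
        related |= index.get(entity, set())
    related.discard(doc_id)
    return related
```

For example, a testament and a fiscal declaration naming the same person become linked through that person's index entry, even though neither document cites the other.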
Information system
Orlin Topalov EPFL
Content enrichment
Linked Books Project
EPFL, Ca’ Foscari, Marciana
FNS funded
Approximately half of the citations in the humanities are to primary sources [Wiberley (2009)].
Yet their use has hardly ever been studied with citation-analysis methods.
Network effects: link scholarship directly to primary sources.
Content enrichment
• Primary and secondary sources
• Citation history (e.g. Google Scholar)
• Citation semantics
• Algorithmic History of the History of Venice
Network-based models. With both primary and secondary sources in play, how many graphs can we build?
Bibliographic coupling and co-citation
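Both measures can be computed from a map of citing works to the sources they cite. Bibliographic coupling weights a pair of citing works by the number of references they share; co-citation weights a pair of cited sources by the number of works that cite both. A minimal sketch with made-up identifiers; running it on primary-source references only, secondary-source references only, or both is one way to obtain the different graphs asked about above.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(citations: dict[str, set[str]]) -> Counter:
    """Weight each pair of citing works by the number of references they share."""
    weights: Counter = Counter()
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def co_citation(citations: dict[str, set[str]]) -> Counter:
    """Weight each pair of cited sources by the number of works citing both."""
    weights: Counter = Counter()
    for refs in citations.values():
        for x, y in combinations(sorted(refs), 2):
            weights[(x, y)] += 1
    return weights
```

Sorting before pairing gives each undirected edge a single canonical key, so the resulting counters can be loaded directly as weighted edge lists.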
Content enrichment: multiple perspectives
Valorisation: some examples
Immersive reality
GIS and 3D virtual reconstructions
Teaching and interdisciplinary collaborations
Replication and transfer
VTM in the context of DH
1- The Big vs Small Data debate, or a proposal for reframing
2- The quest for evidence of value, or overcoming the DH drudgery conundrum
3- Humanities in the digital era, or why we need historians more than ever ;)
The Big vs Small Data debate, or a proposal for reframing
Big Data (for Humanities):
1- a matter of dimensions (in TB or PB)
2- networked and relational vs well-bounded (Kaplan 2015)
3- telescope vs microscope
“Data” are not big or small per se; they are big or small relative to the observer. Do I want to aggregate or disaggregate? Are my questions “larger” or “smaller”?
Macro / Meso / Micro
The quest for evidence of value, or overcoming the DH drudgery conundrum
Tool-building is not an end in itself.
Developing tools to answer old questions should lead to new questions and perspectives. The great quest in DH now is for new arguments.
Humanities in the digital era, or why we need historians more than ever ;)
“historians are fundamentally in the business of taking complex, incomplete sources that are full of biases and errors, and interpreting them critically to develop an argument that answers a research question. Digital sources do not change this.”
Ian Gregory
“Data of different kinds must be understood in their historical relationship.”
Historians as critical arbiters of information, trained to work with time (in jargon, “comparative modelling of multiple variables over time”).
Thank you
“Computers are incredibly fast, accurate and stupid;
humans are incredibly slow, inaccurate and brilliant;
together they are powerful beyond imagination.”
Attributed to Albert Einstein (or was it someone else?)
