Mapping the News Networks in XVII Italy
1. News networks in XVII century Italy
Giovanni Colavizza (EPFL), Mario Infelise (Ca' Foscari)
2. Subject: the European news flow
Hypothesis: a single system of news exchange across Europe.
Rise in demand during the Thirty Years' War; regular postal service.
Key traits of this information system:
• multi-media (handwritten: long and short range, more flexible, on demand; printed: short range and broader public)
• adaptive "hub and spoke" network
• multi-language
3. Our questions and general aims
How to:
1. prove the existence and extent of the flow
2. reconstruct its fine-grained dynamic cartography
3. study the problem of information supply and exchange: media interactions
Basic approach: detect text reuse.
We start by developing robust methods to this end.
5. Sources (year 1648)
[Map(s) of news provenance. Place labels in extraction order: Asti, Cartagena, Francia, Catalogna, Provenza, Livorno, Alicante, Casale, Parma, Bruxelles, Avignone, Colonia, Palermo, Riviera di Ponente, Madrid, Marsiglia, Inghilterra, Lione, Torino, Napoli, Lisbona, Roma, Londra, Germania, Milano, Genova, Barcellona, Parigi, Venezia, Bologna, Francia, Svezia, Augusta, Palatinato, Costantinopoli, Monaco, Erfurt, Norimberga, Londra, Franconia, Cassel, Venezia, Vienna, Svevia, Munster, Ratisbona, Amburgo, Francoforte, Praga, Colonia]
Printed gazettes: Turin and Genoa.
Handwritten: from the Vatican Archives, Segreteria di Stato, Avvisi.
7. Results: editorial policies (printed gazettes)
Most frequent sequence order of printed news in each issue:
• Genoa: Genoa, Rome/Naples/Marseille, Milan, Lisbon, Barcelona, Paris, London, Germany and Venice.
• Turin: (i1) Turin, Barcelona, Paris, London, Germany; (i2) Milan, Genoa, Naples, Rome and Venice.

Statistic                          Genoa    Turin
Total character count              281206   579381
Total number of paragraphs         263      1221
Average characters per paragraph   1069     474
8. Results: editorial policies (printed gazettes)
[Chart: Average text per issue, Turin vs Genoa. x-axis: Month (1 to 6); y-axis: Charcount (0 to 16000)]
[Chart: Average text per item, Turin vs Genoa. x-axis: Month (1 to 6); y-axis: Charcount (0 to 1200)]
9. Methods: matching algorithms - printed
Strategy: compare paragraphs (units of formatting/reading but also of meaning)
Global match: substring kernels (similarity of sequences of non-contiguous characters)
Local alignment: Smith-Waterman (finds local matching passages)
Threshold filtering and manual evaluation of the 2 highest-scoring matches
10. Results: the flow (printed gazettes)
[Network diagram of the flow. Node labels: Turin, Paris, Barcelona, Lisbon, Milan, Venice, London, Naples, Rome, Genoa, Germany]
11. Results: comparisons (printed gazettes)
Categories:
1. verbatim copy of a whole paragraph or parts of it
2. paraphrasing or translations of the same source
3. same news from different sources
4. same topic but different news
Results:
Categories 1 and 3: <1%
Category 2: circa 30%
Category 4: circa 43%
Evaluation:
precision: by hand
recall: "intractable"
12. Methods: data preparation - handwritten
Plenipotentiario di Spagna (keyword)
Re di Spagna (name_of_person)
Conte d'Avò (name_of_person)
spagnoli (quantity)
Ambasciatore di Portogallo (keyword)
Perera (name_of_person)
Hassi (keyword)
Cassel (name_of_place)
Plenipotentiario di Franza (keyword)
Sua Maestà Cesarea (name_of_person)
Landgraviessa d'Assia (name_of_person)
Osnapruch (name_of_place)
trattato dell'Imperio (keyword)
Lantgravio di Darmstat (name_of_person)
Amnistia nello stati hereditarij (keyword)
anni (quantity)
Pinorada (name_of_person)
Svedesi (keyword)
Provincia d'Utrecht (name_of_place)
pace (keyword)
Spagna (name_of_place)
Olanda (name_of_place)
Zelanda (name_of_place)
Provinzie Basse (name_of_place)
Francia (name_of_place)
13. Methods: matching algorithms - handwritten
Strategy: compare paragraphs
Typed canonicalisation: similar words are grouped into
typed categories (Jaro-Winkler distance)
Paragraph comparison: TF-IDF vectors, cosine distance
Manual evaluation of the 2 highest-scoring matches
The corpus is too limited and skewed for now.
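The two steps above (typed canonicalisation, then paragraph comparison) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the greedy grouping strategy, the 0.9 threshold, the smoothed idf formula, and all function names are hypothetical, and the sample paragraphs are invented spelling variants built from entity types shown on the previous slide.

```python
import math
from collections import Counter


def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_m = [c for c, h in zip(s, s_hit) if h]
    t_m = [c for c, h in zip(t, t_hit) if h]
    transpositions = sum(x != y for x, y in zip(s_m, t_m)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for x, y in zip(s[:4], t[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def canonicalise(entities, canon, threshold=0.9):
    """Map (text, type) pairs to 'type:canonical' tokens, grouping within a type.

    Greedy first-match grouping and the threshold are assumptions.
    """
    out = []
    for text, etype in entities:
        forms = canon.setdefault(etype, [])
        word = text.lower()
        found = next((c for c in forms if jaro_winkler(word, c) >= threshold), None)
        if found is None:
            forms.append(word)
            found = word
        out.append(f"{etype}:{found}")
    return out


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with a smoothed idf (an assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, c in Counter(doc).items()} for doc in docs]


def cosine_distance(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 if nu == 0 or nv == 0 else 1.0 - dot / (nu * nv)


# Invented paragraphs, each given as a list of typed entities.
paragraphs = [
    [("Francia", "name_of_place"), ("Spagna", "name_of_place"), ("pace", "keyword")],
    [("Franza", "name_of_place"), ("Spagna", "name_of_place"), ("pace", "keyword")],
    [("Cassel", "name_of_place"), ("Svedesi", "keyword")],
]
canon = {}  # shared across paragraphs so variants collapse to one form
docs = [canonicalise(p, canon) for p in paragraphs]
vectors = tfidf_vectors(docs)
print(cosine_distance(vectors[0], vectors[1]), cosine_distance(vectors[0], vectors[2]))
```

Grouping within a type keeps, for example, a place name from being merged with a similarly spelled keyword; whether the original work made the same choice is not stated on the slide.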
15. Open questions
1. How to evaluate results effectively? The open question of scalable recall and precision.
2. How to get a larger corpus (e.g. at least 2 years, to study seasonality)? Obstacles: 1) lack of data; 2) cost of data preparation.
3. How to compare printed and handwritten news? Ongoing work.
4. What to focus on? Variations are as interesting as verbatim copies for studying the interaction of different media and types of gazettes.
16. News networks in XVII century Italy
Thanks
Giovanni Colavizza (EPFL), Mario Infelise (Ca' Foscari)