Crawling Tripadvisor Attraction Reviews - Italiano
1. How to extract information from
public Trip Advisor reviews
Andrea Gigli
https://about.me/andrea.gigli
2. Goal
• Build a (relative) similarity indicator between cities, based on
the reviews published on Tripadvisor
• Reviews refer only to each city's attractions, not
to hotels or restaurants
• "Closeness" measures used
– Jaccard
– Cosine Similarity on TF and TF-IDF
• …applied to
– Tokens
– Entities
7. What we analyzed
For each city:
1. We collected the attraction reviews
that could be retrieved with httrack
(r=4)
2. We merged all the reviews into a single
corpus
3. We normalized, filtered, stemmed
and de-stemmed the tokens
4. We obtained the entities present in the
corpus from tagme
8. Trip_cloud.py
# city_list and the helper functions are defined elsewhere in the script.
for city in city_list:
    print("Generating tokens and entities for {0}".format(city))
    # The slide passed city_list here; the current city is presumably intended.
    tokens, entities = get_web_data(city)
    print("Normalizing {0} entry...".format(city))
    normalized = normalize_words(tokens)
    print("Filtering {0} entry...".format(city))
    filtered_t = filter_words(normalized)
    print("Stemming {0} entry...".format(city))
    stemmed = stem_words(filtered_t)
    # Remember which original words produced each stem, for de-stemming.
    stem_mapping = get_stem_mapping(filtered_t)
    print("Destemming {0} entry...".format(city))
    destemmed = destem_words(stemmed, stem_mapping)
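The helper functions called in Trip_cloud.py are not shown on the slides. The sketch below is one possible reading of what they might do, using only the standard library. The stop-word list, the suffix list and the rule "map each stem back to its most frequent original word" are all assumptions for illustration, and the crude suffix stripper stands in for a real stemmer such as Snowball.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the filter actually used is not shown.
STOP_WORDS = {"il", "la", "di", "e", "un", "una", "molto", "ci", "che"}

# Crude suffix list standing in for a real stemmer (assumption).
SUFFIXES = ("mente", "zione", "zioni", "are", "ere", "ire", "i", "e", "a", "o")


def normalize_words(tokens):
    """Lowercase each token and strip everything that is not a letter."""
    cleaned = (re.sub(r"[^a-zàèéìòù]", "", t.lower()) for t in tokens)
    return [t for t in cleaned if t]


def filter_words(tokens):
    """Drop stop words and tokens shorter than three characters."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_words(tokens):
    """Stem every token."""
    return [stem(t) for t in tokens]


def get_stem_mapping(tokens):
    """Map each stem to the most frequent original word that produced it."""
    forms = defaultdict(Counter)
    for t in tokens:
        forms[stem(t)][t] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in forms.items()}


def destem_words(stems, stem_mapping):
    """Replace each stem with its representative original word."""
    return [stem_mapping.get(s, s) for s in stems]


tokens = ["Torre", "torri", "Torre,", "bella", "di", "Pisa"]
filtered = filter_words(normalize_words(tokens))
destemmed = destem_words(stem_words(filtered), get_stem_mapping(filtered))
print(destemmed)
```

De-stemming in this form collapses inflected variants ("torre", "torri") onto a single readable word, which keeps the resulting token counts interpretable.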
9. Some data
City           Tokens  Entities
Livorno         7,225       697
Massa           2,147       138
Grosseto        4,234       473
Florence       13,129     1,492
Siena           9,994     1,105
Arezzo          8,234     1,022
Lucca           9,851     1,137
Pisa            9,056       952
Prato           6,375       833
Pistoia         9,258     1,035
New York       13,476     1,282
London         11,754     1,219
Antananarivo    1,146       115
Total         105,879    11,500
Unique items   13,178     2,370
12. How we measured "similarity"
between cities
For each pair of cities:
1. We computed the Jaccard Distance and the
Cosine Similarity (TF) on the tokens of each
corpus
2. We computed the Jaccard Distance and the
Cosine Similarity (TF) on the entities of each
corpus
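As a minimal sketch of the two measures, assuming each city's corpus is available as a flat list of tokens (or entities): the Jaccard index compares the sets of distinct items, the Jaccard distance is one minus that index, and the TF cosine similarity compares raw term-frequency vectors. The function names and the toy corpora are illustrative, not taken from the original script.

```python
import math
from collections import Counter


def jaccard_index(items_a, items_b):
    """Share of distinct items that the two corpora have in common."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def cosine_similarity_tf(items_a, items_b):
    """Cosine of the angle between the raw term-frequency vectors."""
    tf_a, tf_b = Counter(items_a), Counter(items_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpora for illustration only, not real review data.
pisa = ["torre", "piazza", "torre", "duomo"]
lucca = ["mura", "piazza", "duomo"]

index = jaccard_index(pisa, lucca)
distance = 1 - index
cosine = cosine_similarity_tf(pisa, lucca)
print(index, distance, cosine)
```

Jaccard ignores how often an item occurs, while the TF cosine rewards items that are frequent in both corpora, which is why the two measures can rank the same pair of cities differently.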
13. Jaccard Index – Tokens
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno         49       71       80       78       76       85       80       76       87       66       66           13
Massa            -       46       36       32       40       39       41       35       40       21       23           26
Grosseto         -        -       63       69       69       65       68       65       65       52       49           10
Florence         -        -        -       97       91       92       83       66       82       86       82            0
Siena            -        -        -        -       90       94       93       75       84       68       78           10
Arezzo           -        -        -        -        -       96       88       82       84       71       73            5
Lucca            -        -        -        -        -        -       94       78       92       76       76            9
Pisa             -        -        -        -        -        -        -       89       91       74       80           12
Prato            -        -        -        -        -        -        -        -       89       56       59           10
Pistoia          -        -        -        -        -        -        -        -        -       67       64           12
New York         -        -        -        -        -        -        -        -        -        -      100            0
London           -        -        -        -        -        -        -        -        -        -        -            2
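The slides do not say how the scores were brought onto the 0-100 scale. One reading, consistent with this table containing both a 0 and a 100, is a min-max rescaling over all city pairs. The sketch below assumes exactly that; the function name and the raw scores are hypothetical.

```python
def rescale_0_100(scores):
    """Min-max rescale pairwise scores to integers between 0 and 100.

    scores maps a (city_a, city_b) pair to its raw similarity.
    """
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All pairs are equally similar; there is nothing to spread out.
        return {pair: 0 for pair in scores}
    return {
        pair: round(100 * (value - lowest) / (highest - lowest))
        for pair, value in scores.items()
    }


# Made-up raw scores, for illustration only.
raw = {("A", "B"): 0.6, ("A", "C"): 0.2, ("B", "C"): 0.4}
scaled = rescale_0_100(raw)
print(scaled)
```

Under this assumption the values are relative: a 0 marks the least similar pair in the sample, not a pair with nothing in common, and adding or removing a city can shift every other score.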
14. Jaccard Index – Entities
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno         20       58       46       47       44       41       41       45       53       30       24           13
Massa            -       34        8       16       20       21       18       18       18       10        7           19
Grosseto         -        -       47       62       66       61       60       57       70       33       25            7
Florence         -        -        -       82       82       72       72       71       74       40       45            0
Siena            -        -        -        -       91       65       71       68       86       43       42            4
Arezzo           -        -        -        -        -       92       88       91      100       44       47            6
Lucca            -        -        -        -        -        -       77       74       95       45       35            4
Pisa             -        -        -        -        -        -        -       75       92       42       35            3
Prato            -        -        -        -        -        -        -        -       97       39       33            6
Pistoia          -        -        -        -        -        -        -        -        -       39       32            1
New York         -        -        -        -        -        -        -        -        -        -       59            6
London           -        -        -        -        -        -        -        -        -        -        -            8
15. Cosine Similarity TF – Tokens
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno         41       63       48       58       57       67       56       56       66       67       60           13
Massa            -       48       35       28       19       20       32       28       41       35       34           17
Grosseto         -        -       57       58       49       56       59       54       60       58       47            6
Florence         -        -        -       88       77       71       77       67       65       76       64           11
Siena            -        -        -        -       89       90       82       77       74       67       73           11
Arezzo           -        -        -        -        -       82       74       77       78       59       66            0
Lucca            -        -        -        -        -        -       84       78       77       66       69           13
Pisa             -        -        -        -        -        -        -       78       78       70       71           18
Prato            -        -        -        -        -        -        -        -       84       60       71           15
Pistoia          -        -        -        -        -        -        -        -        -       59       54           29
New York         -        -        -        -        -        -        -        -        -        -      100           22
London           -        -        -        -        -        -        -        -        -        -        -           24
16. Cosine Similarity TF – Entities
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno         76       38        9       23       14       22       37       25       27       44       22           24
Massa            -       97       44       63       53       70       83       60       78      100       65           77
Grosseto         -        -       25       42       40       43       51       39       47       56       30           34
Florence         -        -        -       19       10        9       30       38       16       17        2            0
Siena            -        -        -        -       30       26       44       36       32       29       16           11
Arezzo           -        -        -        -        -       22       33       30       32       24        7            5
Lucca            -        -        -        -        -        -       37       25       35       34       15           16
Pisa             -        -        -        -        -        -        -       48       60       46       25           25
Prato            -        -        -        -        -        -        -        -       42       36       21           12
Pistoia          -        -        -        -        -        -        -        -        -       40       17           22
New York         -        -        -        -        -        -        -        -        -        -       46           43
London           -        -        -        -        -        -        -        -        -        -        -           21
17. The weight of very frequent tokens and
entities distorts the measurement
Phrases like «Molto bella. Ci torneremo
sicuramente» ("Very nice. We will definitely come back") are frequent but uninformative.
We therefore weighted the frequency of tokens and
entities by log(N/N(t)), where:
• N is the number of cities
• N(t) is the number of cities in which token/entity t appears
Still to be tested: what changes if
• N(t) = number of reviews in which the token (or entity) appears
• N = total number of reviews.
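A minimal sketch of this weighting, assuming each city's corpus is a flat list of tokens (or entities) and that each city counts as one document for the IDF. The weight is written in the usual form log(N / N(t)), so an item that appears in every city gets weight zero. The function names and the toy corpora are illustrative only.

```python
import math
from collections import Counter


def idf_weights(corpora):
    """IDF per item, where N is the number of cities and N(t) the number
    of cities whose corpus contains item t."""
    n_cities = len(corpora)
    city_counts = Counter()
    for items in corpora.values():
        city_counts.update(set(items))
    return {t: math.log(n_cities / n_t) for t, n_t in city_counts.items()}


def tfidf_vector(items, idf):
    """Term frequency of each item multiplied by its IDF weight."""
    return {t: count * idf[t] for t, count in Counter(items).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpora for illustration only, not real review data.
corpora = {
    "Pisa": ["bella", "torre", "duomo"],
    "Lucca": ["bella", "mura", "duomo"],
    "Massa": ["bella", "castello"],
}
idf = idf_weights(corpora)
vectors = {city: tfidf_vector(items, idf) for city, items in corpora.items()}
pisa_lucca = cosine(vectors["Pisa"], vectors["Lucca"])
pisa_massa = cosine(vectors["Pisa"], vectors["Massa"])
print(pisa_lucca, pisa_massa)
```

In the toy data "bella" appears in every city, so it no longer contributes to any similarity, and Pisa and Massa, which share only that word, end up with a similarity of zero. Switching to the review-level variant would only change what `idf_weights` counts as a document.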
18. Cosine Similarity TF-IDF – Tokens
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno         57       66       28       37       20       32       28       23       35       25       18            2
Massa            -       45       44       24       10       28       33       32       33       10       16            1
Grosseto         -        -       43       31       22       23       27       36       23       20       34            2
Florence         -        -        -       94       56       56       66       36       41       50       31            5
Siena            -        -        -        -       72      100       51       56       51       30       40            6
Arezzo           -        -        -        -        -       48       39       34       34       19       21            0
Lucca            -        -        -        -        -        -       56       40       32       35       43           11
Pisa             -        -        -        -        -        -        -       43       61       42       35           14
Prato            -        -        -        -        -        -        -        -       73       20       27            1
Pistoia          -        -        -        -        -        -        -        -        -       31       25            8
New York         -        -        -        -        -        -        -        -        -        -       67            3
London           -        -        -        -        -        -        -        -        -        -        -            5
19. Cosine Similarity TF-IDF – Entities
(normalized to a 0-100 scale)
City         Massa Grosseto Florence    Siena   Arezzo    Lucca     Pisa    Prato  Pistoia New York   London Antananarivo
Livorno          3        4        7       11        3        3       28        5        8        4        3            2
Massa            -        7        2        6        2        6        4        6        2        2        3            0
Grosseto         -        -       10       20        7       17        8       11        6        4        2            0
Florence         -        -        -       38       14       15       55       51       18        3        5            0
Siena            -        -        -        -       20       18       34       45       21        3       10            2
Arezzo           -        -        -        -        -       15       11       16       12        2        2            0
Lucca            -        -        -        -        -        -       18       15       23        4        3            1
Pisa             -        -        -        -        -        -        -       49       43        3        8            0
Prato            -        -        -        -        -        -        -        -      100        8        7            2
Pistoia          -        -        -        -        -        -        -        -        -        3        1            0
New York         -        -        -        -        -        -        -        -        -        -       31            1
London           -        -        -        -        -        -        -        -        -        -        -            3
20. Then we got hooked
• We included other Italian cities: Milan,
Bologna, Bolzano, Bari and Palermo
• Adding more cities should make the
TF-IDF Cosine Similarity a more effective
measure