(Open) data hacking
Pierpaolo Basile
Alumni Mathematica
Hello!
Pierpaolo Basile
Computer science researcher (artificial
intelligence, natural language
processing)
Member of Alumni Mathematica
pierpaolo.basile@gmail.com
(OPEN) DATA
data freely accessible to everyone,
generally released by public
administrations
HACKING
"Hacking in the sense of
deconstructing an idea, hardware,
anything and getting it to do
something it wasn’t intended or to
better understand how something
works."
Three stories of data hacking
…and of disobedience
a free online archive of more than
51,000,000 scientific articles
Alexandra Elbakyan
Aaron Swartz
4.8 million scientific
articles from the JSTOR
academic database
Julian Assange
WikiLeaks
Publication of documents protected by
state secrecy
Data Hacking Process
Data acquisition
Information extraction
and knowledge creation
Data publication
Data acquisition
Web Scraping
Extracting data from a website by means of programs
Web APIs or protocols
Extracting data through programming interfaces or specific protocols
Digital acquisition
Scanning paper documents with OCR (Optical Character Recognition)
Web Scraping
Resolution No. 675
Date: 18-05-2016
Subject: Follow-up to DGR 2421/2015. POR Puglia…
Unit: Tourism Department, …
(download of the PDF file)
Scraping
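As a rough illustration of the scraping step, the sketch below pulls the fields of a resolution out of an HTML fragment using only the Python standard library. The markup, the class names, the `data-name` attribute and the PDF path are all hypothetical, since the slide does not show the layout of the real page; a real scraper would first download the page and adapt the parser to its actual structure.

```python
from html.parser import HTMLParser

# Hypothetical markup: the structure of the real page is not shown on the slide.
SAMPLE_HTML = """
<div class="resolution">
  <span class="field" data-name="number">675</span>
  <span class="field" data-name="date">18-05-2016</span>
  <a class="pdf" href="/docs/resolution-675.pdf">PDF</a>
</div>
"""


class ResolutionParser(HTMLParser):
    """Collects the labelled fields and the PDF link of one resolution."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # name of the field whose text we are reading

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == "field":
            self._current = attrs.get("data-name")
        elif tag == "a" and attrs.get("class") == "pdf":
            self.record["pdf_url"] = attrs.get("href")

    def handle_data(self, data):
        # Only keep text that appears inside a labelled field.
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def scrape_resolution(html):
    parser = ResolutionParser()
    parser.feed(html)
    return parser.record


record = scrape_resolution(SAMPLE_HTML)
print(record)
```

The resulting dictionary holds the number, the date and the link to the PDF, which can then be downloaded in a separate step.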
Web APIs or protocols
Twitter API
user: BBCBreaking
text: France to declare state…
date: …
img: …
user: BBCBreaking
text: UK department store chain…
date: …
img: …
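A minimal sketch of what acquisition through an API can look like once a response has been received: structured records are parsed directly, with no need to interpret page markup. The JSON payload below is a hypothetical stand-in built from the fields on the slide (user, text, date, img); it is not the actual Twitter API response format, and the elided values are left as they appear.

```python
import json
from dataclasses import dataclass

# Hypothetical payload mirroring the fields on the slide,
# not the real Twitter API response format.
SAMPLE_RESPONSE = """
[
  {"user": "BBCBreaking", "text": "France to declare state…", "date": "…", "img": "…"},
  {"user": "BBCBreaking", "text": "UK department store chain…", "date": "…", "img": "…"}
]
"""


@dataclass
class Post:
    user: str
    text: str
    date: str
    img: str


def parse_posts(raw):
    """Turn the JSON text of a response into a list of Post records."""
    return [Post(**item) for item in json.loads(raw)]


posts = parse_posts(SAMPLE_RESPONSE)
for post in posts:
    print(post.user, "|", post.text)
```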
Digital acquisition
Scanner
OCR
Digital
document
«…rock, and picking up this…»
Textual content
Information extraction
and knowledge creation
60 seconds online…
80% of information is
unstructured…
…mostly text
Can machines
understand
text?
Sentiment Analysis
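«@andreaiannone29 stavi indiavolato... Bravo peccato per il rettilineo ma meglio di così non potevi fare!!! #Motomondiale»
(You were on fire... Well done, shame about the straight, but you could not have done better!!!)
«che schifo le castagne»
(chestnuts are disgusting)
«Bologna-Palermo: ko Aguirregaray, c'è Della Rocca»
(Bologna-Palermo: Aguirregaray is out, Della Rocca is in)
neutral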
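To make the idea of sentiment analysis concrete, here is a toy lexicon-based classifier applied to the original Italian wording of the three example tweets. The four-word lexicon and its scores are invented for this illustration, and the approach is far simpler than the systems used in practice (it ignores negation, irony and context); it is only a sketch of the task, not the method behind the slide.

```python
import re

# Toy polarity lexicon, invented for this illustration.
LEXICON = {"bravo": 1, "meglio": 1, "peccato": -1, "schifo": -1}

TWEETS = [
    "@andreaiannone29 stavi indiavolato... Bravo peccato per il rettilineo "
    "ma meglio di così non potevi fare!!! #Motomondiale",
    "che schifo le castagne",
    "Bologna-Palermo: ko Aguirregaray, c'è Della Rocca",
]


def classify(text):
    """Sum the polarity of known words and map the total to a label."""
    words = re.findall(r"[a-zàèéìòù]+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


labels = [classify(tweet) for tweet in TWEETS]
print(labels)
```

The third tweet contains no word from the lexicon, so it falls through to the neutral label.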
Entity Linking
Automatic text understanding
«Apple ha acquistato Beats» (Apple has acquired Beats)
Apple: subject (linking)
ha acquistato: verb group (ha:aux, acquistato:VBpp)
Beats: object (linking)
Information Extraction
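«Apple ha acquistato Beats per 2,5 miliardi di dollari» (Apple acquired Beats for 2.5 billion dollars)
«Apple ha comprato LearnSprout» (Apple bought LearnSprout)
«Oracle ha acquisito Sun per 7,4 miliardi di dollari.» (Oracle acquired Sun for 7.4 billion dollars.)
Acquisitions
Subject   Object        Value (billions of dollars)
Apple     Beats         2.5
Apple     LearnSprout   ?
Oracle    Sun           7.4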
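The acquisitions table can be approximated with a simple pattern-based extractor, sketched below on the three original Italian sentences. The regular expression is an assumption made for this example and only covers the verb forms that appear in them; real information extraction systems rely on the linguistic analysis shown in the previous slides rather than on a single hand-written pattern.

```python
import re

# Hand-written pattern covering only the three verb forms in the examples.
PATTERN = re.compile(
    r"(?P<subject>\w+) ha (?:acquistato|comprato|acquisito) (?P<object>\w+)"
    r"(?: per (?P<value>\d+(?:,\d+)?) miliardi di dollari)?"
)

SENTENCES = [
    "Apple ha acquistato Beats per 2,5 miliardi di dollari",
    "Apple ha comprato LearnSprout",
    "Oracle ha acquisito Sun per 7,4 miliardi di dollari.",
]


def extract_acquisition(sentence):
    """Return subject, object and value (billions of dollars) or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    value = match.group("value")
    return {
        "subject": match.group("subject"),
        "object": match.group("object"),
        # Italian decimal comma to float; None when no price is stated.
        "value": float(value.replace(",", ".")) if value else None,
    }


acquisitions = [extract_acquisition(sentence) for sentence in SENTENCES]
for row in acquisitions:
    print(row)
```

The missing value for LearnSprout stays empty (`None`), matching the question mark in the table.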
Data publication
◦ Check the copyright of the acquired
data (if you do not want to
disobey)
◦ Prefer open formats: XML,
JSON, CSV, RDF
◦ Data license: Creative
Commons, (Italian) Open Data
License, …
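As a small example of the recommendation to prefer open formats, the sketch below serializes the acquisitions table to CSV and JSON with the Python standard library. The field names and the in-memory buffer are choices made for this illustration; in practice the output would be written to files and published together with a license statement.

```python
import csv
import io
import json

# Records taken from the acquisitions example; field names are illustrative.
records = [
    {"subject": "Apple", "object": "Beats", "value": 2.5},
    {"subject": "Apple", "object": "LearnSprout", "value": None},
    {"subject": "Oracle", "object": "Sun", "value": 7.4},
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["subject", "object", "value"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)  # None is written as an empty cell
    return buffer.getvalue()


def to_json(rows):
    """Serialize the rows as indented JSON text."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)
print(csv_text)
print(json_text)
```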
Thank you for your attention!
QUESTIONS?
pierpaolo.basile@gmail.com
www.alumnimathematica.org
