EVENT DETECTION
5th BDE Hangout “Big Data in Secure societies”, 13/12/2017
George Giannakopoulos and
Nikiforos Pittaras,
NCSR "Demokritos"
Pilot Architecture
18-Dec-17 www.big-data-europe.eu
Event Detection Workflow
[Figure: workflow diagram showing the News & Twitter Crawler, …, the Event Detector and the Lookup Service]
ED Workflow: News Crawler
• Runs periodically
• Stores parsed content and metadata in Cassandra
• RSS feeds:
  o Crawler complies with privacy regulations
  o Default RSS feed list covers Reuters' generic categories
• Direct links to published articles:
  o Best-effort parsing
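The RSS side of the crawler can be sketched with Python's standard library. This is a minimal illustration, assuming RSS 2.0 feeds whose `<item>` elements carry `title`, `link`, `pubDate` and `description`; the function name `parse_rss` and the output field names are hypothetical. Missing fields fall back to empty strings in the spirit of best-effort parsing, and storage in Cassandra is left out.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Best-effort parse of an RSS 2.0 document into a list of item dicts.

    Missing child elements yield empty strings instead of raising.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "summary": item.findtext("description", default="").strip(),
        })
    return items

SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Earthquake hits the coast of Chile</title>
<link>http://example.com/quake</link>
<pubDate>Wed, 13 Dec 2017 10:00:00 GMT</pubDate></item>
</channel></rss>"""

for entry in parse_rss(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```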
ED Workflow: Twitter Crawler
• Runs periodically
• Stores parsed content and metadata in Cassandra
• Multiple operation modes:
  o Query specified Twitter accounts
  o Monitor all Twitter posts in a specified language
  o Keyword-based search
  o Parse individually specified posts
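The four operation modes amount to four different selection criteria. The sketch below applies them to an in-memory list purely for illustration; the real crawler presumably issues a Twitter API query per mode instead, and the `Tweet` fields and mode names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    id: str
    user: str
    lang: str
    text: str

def select_tweets(tweets, mode, value):
    """Filter tweets according to one of the (hypothetical) crawler mode names."""
    if mode == "accounts":   # query specified accounts: value is a set of user names
        return [t for t in tweets if t.user in value]
    if mode == "language":   # monitor one language: value is a language code
        return [t for t in tweets if t.lang == value]
    if mode == "keywords":   # keyword search: value is a list of keywords (case-insensitive)
        return [t for t in tweets
                if any(k.lower() in t.text.lower() for k in value)]
    if mode == "ids":        # individually specified posts: value is a set of tweet ids
        return [t for t in tweets if t.id in value]
    raise ValueError(f"unknown mode: {mode}")

TWEETS = [
    Tweet("1", "reuters", "en", "Earthquake reported off the coast of Chile"),
    Tweet("2", "someone", "es", "Hola desde Madrid"),
    Tweet("3", "someone", "en", "Great coffee this morning"),
]

print([t.id for t in select_tweets(TWEETS, "keywords", ["earthquake"])])
```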
ED Workflow: Cassandra
• Scalable, distributed NoSQL database
• I/O scenarios:
  1. News & tweets storage:
     o Individual items (news articles or tweets) from the crawlers
  2. Event storage:
     o Event objects & metadata, as identified by the Event Detector
  3. Frontend queries:
     o Queries from Sextant about the stored news items and events
ED Workflow: Event Detector
• Runs periodically
• Distributed execution based on Apache Spark
• Two algorithm steps:
  1. Discover related news items and cluster them into events
  2. Augment the produced events with useful metadata: date, locations, images and specified named entities
• Detector algorithm based on
ED Workflow: ED Algorithm
1) Identify events:
  o Gather all unique article pairs
  o Compute the similarity of the two articles in each pair using graph-based text representations
    ▪ If similarity > threshold → related pair
  o Form clusters from the related pairs
    ▪ If cluster support > threshold → event
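Step 1 can be sketched in a few dozen lines. This is a simplified, single-machine illustration, not the project's implementation. It assumes a character n-gram graph (edges join n-grams that co-occur within a small window, weighted by count) compared with a value-similarity score, treats clusters as connected components of the related-pair graph, and uses cluster size as "support". All names, `n`, `window` and both thresholds are assumptions.

```python
from itertools import combinations

def ngram_graph(text, n=3, window=3):
    """Character n-gram graph: undirected edge -> number of co-occurrences within `window`."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    graph = {}
    for i, g in enumerate(grams):
        for h in grams[i + 1:i + 1 + window]:
            edge = tuple(sorted((g, h)))
            graph[edge] = graph.get(edge, 0) + 1
    return graph

def value_similarity(g1, g2):
    """Sum of weight ratios over shared edges, normalised by the larger graph (0..1)."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))

def detect_events(articles, sim_threshold=0.3, support_threshold=1):
    """Return events as sorted lists of article indices."""
    graphs = [ngram_graph(a) for a in articles]
    parent = list(range(len(articles)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Gather all unique pairs; link those whose similarity exceeds the threshold
    for i, j in combinations(range(len(articles)), 2):
        if value_similarity(graphs[i], graphs[j]) > sim_threshold:
            parent[find(i)] = find(j)

    # Clusters = connected components of the related-pair graph
    clusters = {}
    for i in range(len(articles)):
        clusters.setdefault(find(i), []).append(i)
    # A cluster becomes an event if its support (member count) exceeds the threshold
    return [sorted(c) for c in clusters.values() if len(c) > support_threshold]

ARTICLES = [
    "Earthquake hits the coast of Chile",
    "Earthquake hits the coast of Chile today",
    "Stock markets rally after rate cut",
]
print(detect_events(ARTICLES))
```

Union-find keeps the clustering step near-linear in the number of related pairs; the quadratic cost lies in the pairwise comparison loop, which is the part worth distributing.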
ED Workflow: ED Algorithm
2) Enrich events:
  o Assign individual social media items to events
    ▪ Convert each item to a graph-based representation and classify it by similarity
    ▪ If similarity > threshold → attach to event
  o Augment events with external metadata extracted from their member articles and tweets:
    ▪ Location names and geocoordinates (GADM)
    ▪ Named entities (famous people)
    ▪ Photographs (Flickr)
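The tweet-assignment part of step 2 can be sketched in the same spirit. Assumptions: each event is represented by the merged n-gram graph of its member articles, a tweet is attached only to its single most similar event, and, because tweets are much shorter than events, the score is a containment-style overlap normalised by the smaller graph. Names, parameters and the 0.5 threshold are hypothetical.

```python
from collections import Counter

def ngram_graph(text, n=3, window=3):
    """Character n-gram graph as a Counter of undirected edges."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return Counter(tuple(sorted((g, h)))
                   for i, g in enumerate(grams)
                   for h in grams[i + 1:i + 1 + window])

def containment_similarity(g1, g2):
    """Share of the smaller graph's edges that also occur in the other graph."""
    if not g1 or not g2:
        return 0.0
    return len(g1.keys() & g2.keys()) / min(len(g1), len(g2))

def assign_tweets(events, tweets, threshold=0.5):
    """events: {event_id: [member article texts]} -> {event_id: [attached tweets]}."""
    # Represent each event by the merged graph of its member articles
    event_graphs = {eid: sum((ngram_graph(t) for t in texts), Counter())
                    for eid, texts in events.items()}
    attached = {eid: [] for eid in events}
    for tweet in tweets:
        tg = ngram_graph(tweet)
        scores = {eid: containment_similarity(tg, eg) for eid, eg in event_graphs.items()}
        if not scores:
            continue
        best = max(scores, key=scores.get)  # most similar event
        if scores[best] > threshold:        # attach only above the threshold
            attached[best].append(tweet)
    return attached

EVENTS = {
    "quake": ["Earthquake hits the coast of Chile"],
    "markets": ["Stock markets rally after rate cut"],
}
TWEETS = ["Strong earthquake hits coast of Chile", "Lunch was great"]
print(assign_tweets(EVENTS, TWEETS))
```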
ED Workflow: Location Extraction
• Based on Apache Lucene for fuzzy queries
• Based on the GADM dataset
  o More than 180,000 location names & geometries
• Input: clean text
• Output: location name(s) with their corresponding geocoordinates
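Lucene's fuzzy queries match terms within a bounded edit distance (at most 2 edits by default). A minimal stand-in is sketched below. It assumes a toy in-memory gazetteer instead of GADM and plain Levenshtein distance (no transpositions) instead of Lucene's automaton-based matching. The place list, its approximate coordinates and the edit budget (1 edit for tokens up to 5 characters, otherwise 2) are illustrative assumptions.

```python
import re

# Toy gazetteer (hypothetical stand-in for GADM): name -> (lat, lon), approximate values
GAZETTEER = {
    "Athens": (37.98, 23.73),
    "Thessaloniki": (40.64, 22.94),
    "Patras": (38.25, 21.73),
}

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def extract_locations(text, gazetteer=GAZETTEER):
    """Return [(name, (lat, lon)), ...] for tokens fuzzily matching a gazetteer entry."""
    found = {}
    for token in re.findall(r"[A-Za-z]+", text):
        max_edits = 1 if len(token) <= 5 else 2
        dist, name = min((levenshtein(token.lower(), n.lower()), n) for n in gazetteer)
        if dist <= max_edits and name not in found:
            found[name] = gazetteer[name]
    return list(found.items())

print(extract_locations("Protests in Athnes and Thessaloniki"))
```

Only single-word names are handled here; multi-word place names would need token n-grams, which is one of the things a real Lucene index takes care of.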
ED Workflow: Entity extraction
Incorporation of semantic metadata extraction
• Augment events by extracting generic named entities
  o Grounded to a unique entity URI
  o Highly extensible: entity metadata easily queryable from additional RESTful APIs, if needed
• APIs & thesauri by the Semantic Web Company
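Grounding means that different surface forms of the same entity resolve to one URI, so duplicates collapse naturally. In the sketch below a tiny hypothetical thesaurus plays the role of the Semantic Web Company's extractor. Only the Marlon Brando URI comes from the example in this deck; the alias "brando" is an assumption, and real extraction would go through the RESTful APIs rather than a dictionary lookup.

```python
# Hypothetical thesaurus: lower-cased surface form -> grounded entity URI
THESAURUS = {
    "marlon brando": "http://bde.poolparty.biz/People/688722",
    "brando": "http://bde.poolparty.biz/People/688722",  # assumed alias
}

def ground_entities(text, thesaurus=THESAURUS):
    """Return the sorted set of entity URIs whose surface forms occur in the text.

    Uses naive substring matching (no word boundaries), so it is only a sketch.
    """
    lowered = text.lower()
    return sorted({uri for form, uri in thesaurus.items() if form in lowered})

print(ground_entities("Marlon Brando starred in The Godfather; Brando won an Oscar."))
```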
ED Workflow: Entity extraction
• Example: famous people thesaurus:
  o Input text: https://en.wikipedia.org/wiki/The_Godfather#Cast
  o Extractor API → Entities:
      http://bde.poolparty.biz/People/20
      http://bde.poolparty.biz/People/446473
      http://bde.poolparty.biz/People/688722
      ....
  o Metadata API → Entity metadata:
      name: Marlon Brando
      uri: http://bde.poolparty.biz/People/688722
      grounding: http://dbpedia.org/resource/Marlon_Brando
      broaders: http://bde.poolparty.biz/People/2
      properties: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      ...
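The metadata record returned for an entity is a flat set of key/value fields. Assuming it arrives as the plain `key: value` lines shown in the example (the actual Metadata API response format is not given in these slides), it can be turned into a dictionary, and the DBpedia grounding can be used to derive a readable label. The function name and record layout are hypothetical.

```python
def parse_metadata(record):
    """Parse 'key: value' lines into a dict; split on the first ': ' only, so URIs stay intact."""
    meta = {}
    for line in record.strip().splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a 'key: value' separator (e.g. '...')
            meta[key.strip()] = value.strip()
    return meta

RECORD = """
name: Marlon Brando
uri: http://bde.poolparty.biz/People/688722
grounding: http://dbpedia.org/resource/Marlon_Brando
broaders: http://bde.poolparty.biz/People/2
...
"""

meta = parse_metadata(RECORD)
# The last path segment of the DBpedia resource URI doubles as a label
dbpedia_label = meta["grounding"].rsplit("/", 1)[-1].replace("_", " ")
print(meta["uri"], dbpedia_label)
```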
ED Workflow: Detector Scaling
Study of how event detection performance scales
• Distributed execution in Apache Spark
• Further experiments on two datasets from two different domains
  o News articles (Reuters-21578)
  o Biomedical scientific publications (BioASQ)
• Up to 10K articles in total (~5 million pairs)
• Technical report draft available upon request
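Because step 1 compares every unique article pair, the workload grows quadratically: n articles give n(n-1)/2 pairs. The hypothetical helpers below compute that count and an even split of the pair workload across a given number of partitions, the kind of balancing a distributed engine such as Spark has to achieve across workers.

```python
def pair_count(n):
    """Number of unique unordered pairs among n items: n choose 2."""
    return n * (n - 1) // 2

def pairs_per_worker(n, workers):
    """Split the pair workload as evenly as possible across `workers` partitions."""
    total = pair_count(n)
    base, extra = divmod(total, workers)
    return [base + (1 if w < extra else 0) for w in range(workers)]

for n in (1_000, 2_000, 4_000):
    print(n, "articles ->", pair_count(n), "pairs")
```

Doubling the number of articles roughly quadruples the number of comparisons (499,500 pairs for 1,000 articles versus 1,999,000 for 2,000), which is why the distributed version pays off only once workloads are large.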
ED Workflow: Detector Scaling
• Preliminary results on Reuters-21578
• Parallel vs. distributed execution time (lower is better)
• Substantial speedup for sufficiently large workloads (> 8K articles)
ED Workflow: Image extraction
• Enrichment of extracted locations with photographs
  o Considers a circular area around the centroid of the location's geometry
  o Queries the Flickr API for public, user-uploaded photographs within that area
  o Filters the results to a time window around the date of the event in question
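The query described above can be sketched as building the parameters for Flickr's `flickr.photos.search` method, which accepts `lat`, `lon`, `radius`, `radius_units` and `min_taken_date`/`max_taken_date`. This is a sketch under stated assumptions: the centroid treats longitude/latitude as planar (acceptable only for small areas), the 5 km radius and ±3-day window are illustrative choices, `YOUR_API_KEY` is a placeholder, and no request is actually sent.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def polygon_centroid(vertices):
    """Area-weighted (shoelace) centroid of a simple polygon given as [(lon, lat), ...].

    Treats lon/lat as planar coordinates: a rough approximation for small areas.
    """
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross                # twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)  # (lon, lat)

def flickr_search_params(vertices, event_date, api_key, radius_km=5, window_days=3):
    """Parameters for photos taken near the location's centroid around the event date."""
    lon, lat = polygon_centroid(vertices)
    fmt = "%Y-%m-%d %H:%M:%S"  # MySQL datetime format accepted by the API
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": round(lat, 6),
        "lon": round(lon, 6),
        "radius": min(radius_km, 32),  # Flickr caps the geo-query radius at 32 km
        "radius_units": "km",
        "min_taken_date": (event_date - timedelta(days=window_days)).strftime(fmt),
        "max_taken_date": (event_date + timedelta(days=window_days)).strftime(fmt),
        "format": "json",
        "nojsoncallback": 1,
    }

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
params = flickr_search_params(square, datetime(2017, 12, 13), api_key="YOUR_API_KEY")
url = FLICKR_REST + "?" + urlencode(params)
print(url)
```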
ED Workflow: Connectivity
Workflow interconnections
• Automatic triggering of the CD (change detection) workflow
  o Event support calculated during detection
  o Triggers if the support exceeds a specified threshold
• Twitter Crawler source injection
  o Targeted consumption of specified posts
• Asynchronous, non-blocking operations
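The threshold-based trigger and the non-blocking behaviour can be illustrated with `asyncio`: detection keeps iterating while triggers are scheduled as background tasks. `trigger_change_detection` is a hypothetical placeholder for whatever call actually starts the CD workflow, and representing support as a single number per event is an assumption.

```python
import asyncio

async def trigger_change_detection(event_id, log):
    """Hypothetical stand-in for a non-blocking request that starts the CD workflow."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    log.append(event_id)

async def process_events(events, support_threshold):
    """events: iterable of (event_id, support); returns the ids that triggered CD."""
    triggered, pending = [], []
    for event_id, support in events:
        if support > support_threshold:
            # Schedule the trigger without awaiting it; the loop continues immediately
            pending.append(asyncio.create_task(trigger_change_detection(event_id, triggered)))
    await asyncio.gather(*pending)  # let outstanding triggers finish before returning
    return triggered

print(asyncio.run(process_events([("e1", 5), ("e2", 1), ("e3", 3)], support_threshold=2)))
```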
Thank you!
Questions?
Links
• Strabon: http://strabon.di.uoa.gr
• GeoTriples: https://github.com/LinkedEOData/GeoTriples
• Event Detection: https://github.com/big-data-europe/docker-event-detection
Understanding the Laravel MVC Architecture
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"

1. EVENT DETECTION
5th BDE Hang-out “Big Data in Secure societies”, 13/12/2017
George Giannakopoulos and Nikiforos Pittaras, NCSR "Demokritos"

3. Event Detection Workflow
Diagram: News & Twitter Crawler, …, Event Detector, Lookup Service

4. ED Workflow: News Crawler
- Runs periodically
- Stores parsed content and metadata in Cassandra
- RSS feeds:
  - The crawler conforms with privacy regulations
  - The default RSS feed list is set to Reuters' generic categories
- Direct links to published articles:
  - Best-effort parsing

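The RSS part of the crawler amounts to fetching a feed and pulling out its items. The sketch below shows only the parsing step on an in-memory RSS 2.0 document, using Python's standard `xml.etree.ElementTree`; the `NewsItem` record and its fields are hypothetical, and fetching the feed and writing to Cassandra are deliberately left out. Missing elements become empty strings, in the spirit of best-effort parsing.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class NewsItem:
    """Hypothetical record for one crawled item (fields are illustrative)."""
    title: str
    link: str
    published: str
    description: str


def parse_rss(xml_text: str) -> list[NewsItem]:
    """Extract all <item> entries from an RSS 2.0 document.

    findtext() returns None for a missing element, so each field falls back
    to an empty string instead of failing (best-effort parsing).
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            NewsItem(
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                published=(item.findtext("pubDate") or "").strip(),
                description=(item.findtext("description") or "").strip(),
            )
        )
    return items


# Usage on a tiny hand-written feed (no <description> element on purpose).
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Storm hits coast</title><link>https://example.org/a</link>
<pubDate>Wed, 13 Dec 2017 10:00:00 GMT</pubDate></item>
</channel></rss>"""
items = parse_rss(SAMPLE)
```

In a periodic crawler, a function like this would run once per feed on each cycle, with the resulting records handed to the storage layer.
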
5. ED Workflow: Twitter Crawler
- Runs periodically
- Stores parsed content and metadata in Cassandra
- Multiple operation modes:
  - Query specified Twitter accounts
  - Monitor all Twitter posts in a specified language
  - Keyword-based search
  - Parse specific individual posts

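To make the four operation modes concrete, here is a minimal sketch that applies each mode as a filter over posts that are assumed to have been fetched already. The `Mode` enum, the `Post` record and `select_posts` are hypothetical names; the actual crawler presumably maps each mode to a Twitter API request, which is not shown here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    ACCOUNTS = auto()  # query specified accounts
    LANGUAGE = auto()  # monitor all posts in a specified language
    KEYWORDS = auto()  # keyword-based search
    POST_IDS = auto()  # parse specific individual posts


@dataclass
class Post:
    post_id: str
    author: str
    lang: str
    text: str


def select_posts(posts, mode, targets):
    """Return the posts matching `targets` under the chosen mode."""
    folded = {t.lower() for t in targets}  # case-insensitive matching
    ids = set(targets)                     # post IDs are matched exactly
    if mode is Mode.ACCOUNTS:
        return [p for p in posts if p.author.lower() in folded]
    if mode is Mode.LANGUAGE:
        return [p for p in posts if p.lang.lower() in folded]
    if mode is Mode.KEYWORDS:
        return [p for p in posts if any(k in p.text.lower() for k in folded)]
    if mode is Mode.POST_IDS:
        return [p for p in posts if p.post_id in ids]
    raise ValueError(f"unknown mode: {mode}")
```
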
6. ED Workflow: Cassandra
- Scalable, distributed NoSQL database
- I/O scenarios:
  1. News & tweet storage: individual items (news articles or tweets) from the crawlers
  2. Event storage: event objects and metadata, as identified by the Event Detector
  3. Frontend queries: queries from Sextant about the stored news items and events

7. ED Workflow: Event Detector
- Runs periodically
- Distributed execution based on Apache Spark
- Two algorithm steps:
  1. Discover related news items and cluster them into events
  2. Augment the produced events with useful metadata: date, locations, images and specified named entities
- Detector algorithm based on

8. ED Workflow: ED Algorithm
1) Identify events:
  - Gather all unique article pairs
  - Compute the similarity between the two members of each pair using graph representation methods
    - If similarity > threshold → related pair
  - Form clusters from the related pairs
    - If a cluster's support > threshold → event

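The event identification step can be sketched as follows, under several stated assumptions. The slides use graph-based text representations for similarity; this sketch swaps in a much simpler stand-in (Jaccard similarity over character trigram sets). Forming clusters as connected components of the related-pair graph (via union-find) and measuring support as the number of member articles are also assumptions, not details given in the slides. Note that comparing all unique pairs of n articles costs n(n-1)/2 similarity computations.

```python
from itertools import combinations


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-gram set; a simple stand-in for graph representations."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two n-gram sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_events(articles, sim_threshold=0.3, support_threshold=1):
    """Return clusters (lists of article indices) whose support exceeds the threshold."""
    grams = [char_ngrams(a) for a in articles]
    parent = list(range(len(articles)))  # union-find forest over article indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every unique pair once; merge the clusters of related pairs.
    for i, j in combinations(range(len(articles)), 2):
        if similarity(grams[i], grams[j]) > sim_threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(articles)):
        clusters.setdefault(find(i), []).append(i)
    # Assumed support measure: number of articles in the cluster.
    return [c for c in clusters.values() if len(c) > support_threshold]
```

For example, with the articles "Earthquake hits northern Greece", "Earthquake hits northern Greece today" and "Stock markets rally after rate decision", a similarity threshold of 0.5 links only the first two, giving one event with two members.
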
9. ED Workflow: ED Algorithm
2) Enrich events:
  - Assign individual social media items to events
    - Convert items to a graph-based representation; classify them by similarity
    - If similarity > threshold → attach to event
  - Augment events with external metadata extractable from their member articles and tweets:
    - Location names and geocoordinates (GADM)
    - Named entities (famous people)
    - Photographs (Flickr)

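The assignment of social media items to events can be read as nearest-event classification with a rejection threshold. The sketch below assumes the same stand-in similarity as a simple substitute for the graph-based one (Jaccard over character trigrams), and assumes that an item's similarity to an event is its best match among the event's member articles; both choices, and the `assign_items` name, are hypothetical.

```python
def ngrams(text, n=3):
    """Character n-gram set (stand-in for the graph-based representation)."""
    t = " ".join(text.lower().split())
    return {t[i:i + n] for i in range(max(len(t) - n + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_items(events, items, threshold=0.3):
    """Attach each item to its most similar event if similarity > threshold.

    events: dict mapping event_id -> list of member article texts
    items:  list of social media texts
    Returns a dict mapping event_id -> list of attached item indices.
    """
    event_grams = {eid: [ngrams(t) for t in texts] for eid, texts in events.items()}
    attached = {eid: [] for eid in events}
    for k, item in enumerate(items):
        g = ngrams(item)
        best_id, best_sim = None, threshold
        for eid, member_grams in event_grams.items():
            # Assumed event similarity: best match among the member articles.
            sim = max((jaccard(g, m) for m in member_grams), default=0.0)
            if sim > best_sim:
                best_id, best_sim = eid, sim
        if best_id is not None:  # items below the threshold stay unassigned
            attached[best_id].append(k)
    return attached
```

Items that do not exceed the threshold for any event are simply left unattached, which matches the "if similarity > threshold → attach to event" rule on the slide.
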
10. ED Workflow: Location Extraction
- Based on Apache Lucene for fuzzy queries
- Based on the GADM dataset
  - More than 180,000 location names & geometries
- Input: clean text
- Output: location name(s) with their corresponding geocoordinates

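Lucene's fuzzy queries match terms within a small edit distance of the query term. As a rough, self-contained illustration of that idea (not of the Lucene-based implementation itself), the sketch below matches text tokens against a tiny hypothetical gazetteer using Levenshtein distance. The three entries and their approximate point coordinates stand in for GADM, which stores full geometries.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


# Hypothetical mini-gazetteer: name -> approximate (lat, lon) point.
GAZETTEER = {
    "athens": (37.98, 23.73),
    "thessaloniki": (40.64, 22.94),
    "patras": (38.25, 21.73),
}


def extract_locations(text, gazetteer=GAZETTEER, max_edits=2):
    """Return (token, matched name, coordinates) for tokens near a gazetteer name."""
    found = []
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        if len(token) < 4:  # skip short tokens to limit spurious fuzzy matches
            continue
        best = min(gazetteer, key=lambda name: levenshtein(token, name))
        if levenshtein(token, best) <= max_edits:
            found.append((token, best, gazetteer[best]))
    return found
```

With this setup, "Heavy rain in Thesaloniki and Athens." yields matches for the misspelled "thesaloniki" (one edit away from "thessaloniki") and for "athens".
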
11. ED Workflow: Entity Extraction
- Incorporation of semantic metadata extraction
- Augment events by extracting generic named entities
  - Grounded to a unique entity URI
  - Highly extensible: entity metadata easily queryable from additional RESTful APIs, if needed
- APIs & thesauri by the Semantic Web Company

12. ED Workflow: Entity Extraction
- Example: famous people thesaurus
  - Text (https://en.wikipedia.org/wiki/The_Godfather#Cast) → Extractor API → Entities:
    http://bde.poolparty.biz/People/20
    http://bde.poolparty.biz/People/446473
    http://bde.poolparty.biz/People/688722
    ....
  - Entities → Metadata API → Entity metadata, e.g.:
    name: Marlon Brando
    uri: http://bde.poolparty.biz/People/688722
    grounding: http://dbpedia.org/resource/Marlon_Brando
    broaders: http://bde.poolparty.biz/People/2
    properties: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    ...

13. ED Workflow: Detector Scaling
- Study of how event detection performance scales
- Distributed execution in Apache Spark
- Further experiments on two datasets from two different domains:
  - News articles (Reuters-21578)
  - Biomedical scientific publications (BioASQ)
- Up to 10K articles in total (~5 million pairs)
- Technical report draft available upon request

14. ED Workflow: Detector Scaling
- Preliminary results on Reuters-21578
- Parallel vs. distributed execution time (lower is better)
- Substantial speedup for sufficiently large workloads (> 8K articles)

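Because the detector compares all unique article pairs, the work grows quadratically: n articles give n(n-1)/2 comparisons. That quadratic growth is one plausible reason why distribution only pays off beyond a certain workload size. The sketch below is an illustration, not the project's Spark code: it counts the pairs and splits them into nearly equal partitions that independent workers could process. The function names are hypothetical.

```python
from itertools import combinations, islice


def num_pairs(n: int) -> int:
    """Number of unique unordered pairs among n articles: n(n-1)/2."""
    return n * (n - 1) // 2


def partition_pairs(n: int, num_partitions: int):
    """Split all unique index pairs into num_partitions nearly equal chunks."""
    base, extra = divmod(num_pairs(n), num_partitions)
    pairs = combinations(range(n), 2)  # lazily generated, consumed in order
    parts = []
    for p in range(num_partitions):
        size = base + (1 if p < extra else 0)  # first `extra` chunks get one more
        parts.append(list(islice(pairs, size)))
    return parts
```

For 10 articles there are 45 pairs, and four partitions receive 12, 11, 11 and 11 pairs respectively.
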
15. ED Workflow: Image Extraction
- Enrichment of extracted locations with photographs
  - Considers a radial area around the centroid of a location geometry's geocoordinates
  - Queries the Flickr API for public user-uploaded photographs within that area
  - Filters the results to a time window around the date of the event in question

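The spatial and temporal filtering described above can be sketched as follows. This is a local approximation under stated assumptions: the centroid is taken as the plain mean of the geometry's vertices (adequate for small, compact areas), distances use the haversine formula, and photos are filtered from an already-fetched list rather than through the Flickr API. The `Photo` record, radius and window defaults are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def centroid(points):
    """Mean of (lat, lon) vertices; a simple approximation of a polygon centroid."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


@dataclass
class Photo:
    photo_id: str
    lat: float
    lon: float
    taken: date


def select_photos(geometry, photos, event_date, radius_km=5.0, window_days=3):
    """Keep photos within radius_km of the geometry's centroid and taken
    within window_days of event_date."""
    center = centroid(geometry)
    window = timedelta(days=window_days)
    return [ph for ph in photos
            if haversine_km(center, (ph.lat, ph.lon)) <= radius_km
            and abs(ph.taken - event_date) <= window]
```
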
16. ED Workflow: Connectivity
- Workflow interconnections
- Automatic triggering of the CD workflow
  - Event support is calculated during detection
  - Triggers if support exceeds a specified threshold
- Twitter Crawler source injection
  - Targeted consumption of specified posts
- Asynchronous, non-blocking operations

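The threshold-based, non-blocking triggering can be illustrated with Python's `asyncio`. In this sketch, `trigger_cd_workflow` is a hypothetical placeholder that only simulates a remote call; the real mechanism for starting the CD workflow is not described in the slides. Triggers for all qualifying events are started as concurrent tasks instead of being awaited one by one.

```python
import asyncio


async def trigger_cd_workflow(event_id: str) -> str:
    """Hypothetical placeholder for starting the CD workflow for one event."""
    await asyncio.sleep(0.01)  # simulate a non-blocking remote call
    return f"CD workflow triggered for {event_id}"


async def process_events(events, support_threshold):
    """Trigger the CD workflow for every event whose support exceeds the threshold.

    events: dict mapping event_id -> support computed during detection
    """
    tasks = [asyncio.create_task(trigger_cd_workflow(eid))
             for eid, support in events.items() if support > support_threshold]
    # The tasks run concurrently; gather returns results in task order.
    return await asyncio.gather(*tasks)


results = asyncio.run(process_events({"e1": 5, "e2": 1, "e3": 3}, support_threshold=2))
```

Here only events "e1" and "e3" exceed the threshold, so only their triggers are started.
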
17. Thank you! Questions?
Links
- Strabon: http://strabon.di.uoa.gr
- GeoTriples: https://github.com/LinkedEOData/GeoTriples
- Event Detection: https://github.com/big-data-europe/docker-event-detection