Flux of MEME - description of work, 1st semester

            project: Flux of Meme
            author: Thomas M. Alisi - thomasalisi@gmail.com
            client: Telecom Italia
            review: deliverable 11.3.11






Wednesday, March 9, 2011
even though geo-tagging is growing,
            it still accounts for less than 1% of all user-generated content




What makes a trend a Trend?
     Twitter users now send more than 95 million Tweets a day, on just about every topic imaginable. We track the
     volume of terms mentioned on Twitter on an ongoing basis. Topics break into the Trends list when the volume of
     Tweets about that topic at a given moment dramatically increases.

     (from the Twitter blog, December 2010)


project overview




            1. fetch data from real-time social networks

            2. create clusters of geo-located information

            3. extract topics

            4. analyze stats, creating timeline predictions




prologue - struggling with hardware and algorithms




fetching data: the Twitter streaming API

            • data is fetched using the Twitter Streaming API

            • issues:

                 • access to data is limited: a basic “Spritzer”
                   account receives only 1% of total tweets

                 • geo-localized tweets are still a small
                   fraction of the total: around 1%

                 • “good” data (tweets that carry geo-localized
                   information) is therefore around:
                   90M (total tweets/day) * 1% * 1%
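The estimate above can be worked out as a small calculation. This is only a back-of-the-envelope sketch: the three constants are the figures quoted on the slide, while the function and variable names are made up for illustration.

```python
# Back-of-the-envelope estimate of usable ("good") tweets per day.
TOTAL_TWEETS_PER_DAY = 90_000_000  # figure quoted on the slide
SPRITZER_SHARE = 0.01              # share of the stream visible to a basic account
GEO_SHARE = 0.01                   # approximate share of tweets with geo information


def good_tweets_per_day(total: int, sample_share: float, geo_share: float) -> float:
    """Expected number of sampled tweets that also carry geo information."""
    return total * sample_share * geo_share


estimate = good_tweets_per_day(TOTAL_TWEETS_PER_DAY, SPRITZER_SHARE, GEO_SHARE)
print(f"{estimate:,.0f} geo-localized tweets/day")  # prints: 9,000 geo-localized tweets/day
```

With these figures, only about one tweet in ten thousand is directly usable for geo-clustering.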




problems

            1. how to increase the amount of geo-localized data?

            2. how to increase the amount and quality of text used for topic extraction?




approximating geo-information




            • geo information is extracted as text from the Twitter profile
              and searched on the GeoNames database, after having indexed
              its content (cities with population > 5,000)
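A minimal sketch of this lookup, assuming a tiny in-memory gazetteer in place of the indexed GeoNames content. The entries, their values and all function names are hypothetical and only illustrate the idea of matching free-text profile locations against known cities.

```python
import re
from typing import Optional

# Hypothetical miniature gazetteer: normalized city name -> (lat, lon, population).
# The values are rough and illustrative; the project uses an index of GeoNames.
GAZETTEER = {
    "florence": (43.77, 11.25, 350_000),
    "turin": (45.07, 7.69, 870_000),
    "london": (51.51, -0.13, 7_500_000),
}
MIN_POPULATION = 5_000  # threshold mentioned on the slide


def normalize(text: str) -> list[str]:
    """Lowercase the free-text location and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def approximate_location(profile_location: str) -> Optional[tuple[float, float]]:
    """Return (lat, lon) of the first token that matches a large enough city."""
    for token in normalize(profile_location):
        entry = GAZETTEER.get(token)
        if entry and entry[2] > MIN_POPULATION:
            return entry[0], entry[1]
    return None


print(approximate_location("Florence, Italy"))
print(approximate_location("somewhere over the rainbow"))
```

The sketch matches single-word, ASCII city names only; multi-word names, alternate spellings and ambiguous names would need the richer matching that a real index provides.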




enriching information

            [figure legend: geo information present / fetched through GeoNames / not present]




            • extra information carried by individual tweets is
              used to enrich the data sets for topic extraction

            • linked data is filtered through a blacklist, so that
              only what is actually relevant for clustering
              purposes is crawled and fetched
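The blacklist step could look roughly like the sketch below. It assumes the blacklist is a set of host names; the hosts listed and the function names are hypothetical, since the slides do not say what the blacklist contains.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of hosts whose pages add little text for topic extraction.
BLACKLISTED_HOSTS = {"twitpic.com", "yfrog.com", "youtube.com"}


def host_of(url: str) -> str:
    """Extract the host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def links_to_crawl(urls: list[str]) -> list[str]:
    """Keep only the links whose host is not blacklisted."""
    return [u for u in urls if host_of(u) not in BLACKLISTED_HOSTS]


links = ["http://www.youtube.com/watch?v=1", "http://example.org/news/1"]
print(links_to_crawl(links))  # prints: ['http://example.org/news/1']
```

Filtering before crawling keeps the crawler from spending requests on pages that would not contribute text to the clusters.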

entity-relationship (E-R) model, focusing on posts / links / queries / clusters




application lifecycle

            • while the Twitter API connection fetches a
              continuous stream of data, the clustering
              algorithm is executed asynchronously

            1. fetch data and store it in a continuous timeline

            2. cut time into relevant slices

            3. create geo-localized clusters of information
               using HAC (Hierarchical Agglomerative
               Clustering)

            4. extract topics from geo-clusters using LDA
               (Latent Dirichlet Allocation)

            [figure: timeline T cut into time slices (yesterday, today, tomorrow?)]
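Steps 2 and 3 can be sketched with the standard library alone. The sketch rests on several assumptions not stated in the slides: one slice per calendar day, centroid linkage, and plain Euclidean distance in degrees. The sample posts are made up, and topic extraction with LDA (step 4) is left out because it needs a dedicated library.

```python
from datetime import datetime
from math import hypot

# A post is (timestamp, latitude, longitude); the values below are made up.
Post = tuple[datetime, float, float]


def slice_by_day(posts: list[Post]) -> dict[str, list[Post]]:
    """Step 2: cut the timeline into one slice per calendar day."""
    slices: dict[str, list[Post]] = {}
    for post in posts:
        slices.setdefault(post[0].date().isoformat(), []).append(post)
    return slices


def centroid(cluster: list[Post]) -> tuple[float, float]:
    """Mean latitude and longitude of the posts in a cluster."""
    lat = sum(p[1] for p in cluster) / len(cluster)
    lon = sum(p[2] for p in cluster) / len(cluster)
    return lat, lon


def hac(posts: list[Post], max_distance: float) -> list[list[Post]]:
    """Step 3: agglomerative clustering with centroid linkage.

    Repeatedly merge the two closest clusters until the closest pair is
    farther apart than max_distance (plain degrees, not kilometres).
    """
    clusters = [[p] for p in posts]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (la1, lo1), (la2, lo2) = centroid(clusters[i]), centroid(clusters[j])
                d = hypot(la1 - la2, lo1 - lo2)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_distance:
            break
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


posts = [
    (datetime(2011, 3, 8, 10), 43.77, 11.25),
    (datetime(2011, 3, 8, 11), 43.78, 11.26),
    (datetime(2011, 3, 8, 12), 45.07, 7.69),
    (datetime(2011, 3, 9, 9), 45.06, 7.68),
]
for day, day_posts in slice_by_day(posts).items():
    print(day, [len(c) for c in hac(day_posts, max_distance=0.5)])
```

The pairwise search makes each merge quadratic in the number of clusters, which is acceptable for small slices but is one reason to run clustering asynchronously from the fetching.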




software architecture




web interface

            • first prototype of web interface,
              showing geo-localized clusters

            • radius of clusters indicates standard
              deviation

            • opacity indicates density (number of
              posts)

            • for each cluster, its corresponding
              metadata is shown, including:

                 • list of topics

                 • list of posts

                 • related links
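One way to derive the drawing parameters of a cluster is sketched below. It assumes that "standard deviation" means the root-mean-square distance of posts from the cluster centroid, and that opacity is the post count relative to the largest cluster; both are interpretations, and the function name and return format are hypothetical.

```python
from math import hypot, sqrt

Point = tuple[float, float]  # (latitude, longitude)


def cluster_marker(points: list[Point], max_posts: int) -> dict[str, float]:
    """Compute centre, radius and opacity for drawing one cluster."""
    n = len(points)
    lat = sum(p[0] for p in points) / n
    lon = sum(p[1] for p in points) / n
    # Radius: root-mean-square distance from the centroid (assumed meaning
    # of "standard deviation" for two-dimensional positions).
    radius = sqrt(sum(hypot(p[0] - lat, p[1] - lon) ** 2 for p in points) / n)
    # Opacity: number of posts relative to the largest cluster, capped at 1.
    opacity = min(1.0, n / max_posts)
    return {"lat": lat, "lon": lon, "radius": radius, "opacity": opacity}


marker = cluster_marker([(0.0, 0.0), (0.0, 2.0)], max_posts=4)
print(marker)
```

The radius comes out in degrees here; a map widget would normally need it converted to metres or pixels.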


what’s next?

            • refinement of the LDA topic extraction algorithm (using different sources, and determining better
              ground-truth datasets for building the statistical model)

            • Twitter Streaming API tweaks:

                 • location boxes

                 • use of keywords and keyword expansion for context-specific searches

            • implementation of search masks with a content indexing system (e.g. Apache Solr)

            • timeline representation of clusters / topics
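The "location boxes" item can be illustrated with a small bounding-box helper. This is a sketch under assumptions: the class, its methods and the sample coordinates are hypothetical, and the serialization assumes a longitude-first, south-west-corner-first convention that should be checked against the API documentation before use.

```python
from typing import NamedTuple


class LocationBox(NamedTuple):
    """Bounding box given by its south-west and north-east corners."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north

    def as_parameter(self) -> str:
        """Comma-separated 'west,south,east,north' string (longitude first)."""
        return ",".join(f"{v:g}" for v in self)


# Rough, illustrative box around Italy; the coordinates are an assumption.
ITALY = LocationBox(west=6.6, south=36.6, east=18.5, north=47.1)

print(ITALY.as_parameter())          # prints: 6.6,36.6,18.5,47.1
print(ITALY.contains(11.25, 43.77))  # prints: True
```

The same check can be applied locally to posts whose position was approximated from profile text, so that both kinds of geo information pass through one filter. Boxes that cross the antimeridian are not handled.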




            http://a.parsons.edu/~drumb588/tweetcatcha/
            http://truthy.indiana.edu/
            http://www.janwillemtulp.com/worldeconomicforum/
            http://moritz.stefaner.eu/projects/map%20your%20moves/
thanks!


                            Thomas M. Alisi, PhD      Giuseppe Serra, PhD     Marco Bertini, PhD
                           thomasalisi@gmail.com   giuseppe.serra@gmail.com   bertini@dsi.unifi.it




