DATA MINING + DATA VISUALIZATION

Cédric Warny

Social data
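The editor's notes describe one of the social-data projects as a treemap of Twitter followers in which the area of each profile picture is proportional to that person's follower count. A minimal sketch of one simple layout, slice-and-dice, could look like the following. The function name `slice_and_dice`, the handles, and the follower counts are hypothetical, and the original visualization may well have used a different layout algorithm.

```python
def slice_and_dice(weights, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to weights.

    weights maps a name to a positive number (here, a follower count).
    Returns {name: (x, y, w, h)} with the largest weight placed first.
    """
    total = sum(weights.values())
    rects = {}
    offset = 0.0
    for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        share = weight / total
        if vertical:
            # Side-by-side strips: each takes a share of the width.
            rects[name] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            # Stacked strips: each takes a share of the height.
            rects[name] = (x, y + offset, width, height * share)
            offset += height * share
    return rects


# Hypothetical follower counts, for illustration only.
followers = {"alice": 600, "bob": 300, "carol": 100}
layout = slice_and_dice(followers, 0, 0, 100, 50)
for name, (rx, ry, rw, rh) in layout.items():
    print(name, round(rw * rh))
```

Each strip has a different aspect ratio, which is why a square profile picture has to be stretched to fill it, the "deformation" mentioned in the notes.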
Text data
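[Figure: text-message dashboard. One chart plots word frequency against word rank, with a rank slider and a word search box. A second chart plots word use frequency for the most frequent users of the selected word: thierry, kristof, xi, lionel, christina, priya, philippe, monique, laurent, gilles, tanguy, stéphane, manon, suzanne.]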
[Figure: the same dashboard with the word search in use. Suggested words: bienvenue, bien, bisous, bienvenu, bientot, bitume, bibli, biltiau, bienbizz, bill4friends. Charts: word frequency against word rank (with rank slider), and word use frequency for the most frequent users: maman, stéphanie, philippe, marielle, sophie, stéphane, suzanne, manon, christina, priya, xi, tanguy, amaury, gilles.]
[Figure: people search view. Typing "ch" suggests christina, charlene, charlotte, chantal. One chart plots person frequency against person rank, with a rank slider; a second chart shows the most used words.]
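The dashboard above ranks words by how often they are used and, for a selected word, shows who uses it most. A minimal sketch of that bookkeeping with `collections.Counter` follows. The `messages` list, its (sender, text) layout, and the helper names are assumptions made for illustration, not the original data or code.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (sender, text) pairs standing in for a year of text messages.
messages = [
    ("christina", "bien bien bisous"),
    ("philippe", "bien bienvenue"),
    ("manon", "bisous bien"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


word_counts = Counter()                # word -> total number of uses
users_by_word = defaultdict(Counter)   # word -> Counter of senders
for sender, text in messages:
    for word in tokenize(text):
        word_counts[word] += 1
        users_by_word[word][sender] += 1

# Words sorted from most to least used; list index 0 is rank 1.
ranked = word_counts.most_common()


def word_at_rank(rank):
    """What the rank slider selects: the (word, count) pair at a 1-based rank."""
    return ranked[rank - 1]


def most_frequent_users(word):
    """Senders of a word, ordered from most to least frequent user."""
    return users_by_word[word].most_common()


print(word_at_rank(1))
print(most_frequent_users("bien"))
```

On a real corpus, plotting the counts in `ranked` against their rank on log-log axes is the usual way to check for the roughly straight line that Zipf's law predicts.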
Automatically categorize words
[Figure: hidden Markov chain for the sentence "the dog barks". The hidden states are "tags" or categories y1, y2, y3, linked by transitions (* → y1 → y2 → y3 → STOP). Each tag emits one word of the sentence: y1 → the, y2 → dog, y3 → barks.]
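The tagging idea in this slide (pick the tags that are both likely to follow the previous tag and likely to emit the observed word) is commonly decoded with the Viterbi algorithm. The sketch below assumes a first-order model, where each tag depends only on the one before it; the tag names and all probabilities are made up for illustration, and the original Python implementation may differ.

```python
def viterbi(words, tags, trans, emit):
    """Most likely tag sequence under a first-order hidden Markov model.

    trans[(prev_tag, tag)] and emit[(tag, word)] are probabilities;
    missing entries count as 0. "*" marks the start, "STOP" the end.
    """
    # best[i][tag] = (probability of the best path ending in tag at word i,
    #                 previous tag on that path)
    best = [{}]
    for tag in tags:
        p = trans.get(("*", tag), 0.0) * emit.get((tag, words[0]), 0.0)
        best[0][tag] = (p, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            e = emit.get((tag, words[i]), 0.0)
            best[i][tag] = max(
                (best[i - 1][prev][0] * trans.get((prev, tag), 0.0) * e, prev)
                for prev in tags
            )
    # Close the sentence with the transition to STOP.
    prob, last = max(
        (best[-1][tag][0] * trans.get((tag, "STOP"), 0.0), tag) for tag in tags
    )
    # Follow the back-pointers from the last word to the first.
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path)), prob


# Made-up probabilities, for illustration only.
tags = ["DET", "NOUN", "VERB"]
trans = {
    ("*", "DET"): 0.8, ("*", "NOUN"): 0.2,
    ("DET", "NOUN"): 1.0,
    ("NOUN", "VERB"): 0.7, ("NOUN", "STOP"): 0.3,
    ("VERB", "STOP"): 1.0,
}
emit = {
    ("DET", "the"): 1.0,
    ("NOUN", "dog"): 0.6, ("NOUN", "barks"): 0.4,
    ("VERB", "barks"): 0.7, ("VERB", "dog"): 0.3,
}
best_tags, best_prob = viterbi(["the", "dog", "barks"], tags, trans, emit)
print(best_tags, best_prob)
```

For long sentences it is safer to sum log-probabilities instead of multiplying raw probabilities, which can underflow.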
Spatial data
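The editor's notes describe a clustering test for the dengue fever project: count the events found within a certain distance, repeat the count on many random point patterns, and call the result significant if it falls outside the "envelope" of simulated values. The original analysis was done in ArcGIS; the sketch below only illustrates the idea in plain Python. The unit-square study area, the pair-count statistic, and all names are assumptions.

```python
import random
from itertools import combinations
from math import dist


def pairs_within(points, d):
    """Number of point pairs separated by at most distance d."""
    return sum(1 for p, q in combinations(points, 2) if dist(p, q) <= d)


def envelope_test(points, d, n_sim=999, seed=0):
    """Compare the observed pair count with an envelope of random patterns.

    Each simulation drops the same number of points uniformly at random
    in the unit square (an assumed study area).
    """
    rng = random.Random(seed)
    observed = pairs_within(points, d)
    simulated = []
    for _ in range(n_sim):
        sample = [(rng.random(), rng.random()) for _ in points]
        simulated.append(pairs_within(sample, d))
    low, high = min(simulated), max(simulated)
    if observed > high:
        verdict = "clustered"
    elif observed < low:
        verdict = "dispersed"
    else:
        verdict = "not significant"
    return observed, (low, high), verdict


# Hypothetical cases packed into a small patch of the study area.
case_rng = random.Random(1)
cases = [
    (0.5 + case_rng.uniform(-0.01, 0.01), 0.5 + case_rng.uniform(-0.01, 0.01))
    for _ in range(30)
]
observed, (low, high), verdict = envelope_test(cases, d=0.05, n_sim=199)
print(observed, (low, high), verdict)
```

A fuller analysis would repeat the comparison over a range of distances rather than a single `d`.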
Event data
Predicting whether and when something happens
[Figure: probability (0 to 1) plotted against time, with a cutoff line and the events marked.]


Editor's Notes

  1. Student at the Institute for Advanced Analytics at NC State. I present my work by emphasizing what I believe is my strength, i.e. the combination of data mining and data visualization skills. I’ve tried to select projects that I believe relate to the work being done at the lab. The presentation is structured by data type: social data, text data, geo data, and finally event data.
  2. This project is a dynamic visualization of a network of relationships. In this case, “relationship” is defined as joint citation of a person in any New York Times article. A seed is chosen (here, Qaddafi) and the program searches through the NYT’s public APIs for all the people mentioned with Qaddafi in the same article. The NYT API enables you to specify “types” of words to look for (key categories like people or organizations are automatically tagged in the articles).
  3. Clicking on a name spawns all the connections to that name in turn, so you can interactively explore the network of relationships.
  4. This is another type of social data visualization, one that emphasizes social influence. It is a treemap of my Twitter followers in which the size of each profile picture is proportional to the number of followers that person has. It’s visually appealing because it uses profile pictures, and its meaning is straightforward: size = measure of social influence. The only issue with this visualization is that some pictures have to be “deformed” so that the treemap as a whole fills a rectangle. Such a treemap could be made interactive, so that clicking on a profile picture updates the mosaic with that person’s followers. This could be a really nice way of navigating the Twitter graph.
  5. Text messages over a year. Dashboard where you can search either by word or by person. The slider enables you to select a word rank (i.e. the first, second, third, etc. most used word). And for a selected word you can see who’s the most frequent user of that word. The graph plotting word frequency by word rank illustrates the famous “Zipf’s law”, according to which the frequency of a word as a function of its rank follows a power law, meaning that just a few words make up the bulk of our vocabulary use.
  6. You can also search for a word by typing it, with real-time word suggestions.
  7. And you can also do a search by person. In that case, you see a ranking of the most exchanged words with that person. A fun application for such data is, for instance, to calculate the vocabulary size of your friends and see who’s the “most learned”. That’s what I did, but when I posted the results on the Facebook wall of my friend with the smallest vocabulary size, she stopped talking to me for 2 days. So I don’t recommend that course of action.
  8. This slide just illustrates an algorithm I’ve been implementing in Python to automatically categorize words in a sentence. Basically, the model assumes that the category of previous words in a sequence is predictive of the category of the next word. And so you would choose the category or tag that is both most likely to follow a given sequence of tags and most likely to be associated with the word. Such algorithms can be really useful in sentiment analysis. You could adapt the model to tag the sentiment associated with a word. This can be applied to assess the mood of citizens based on online, real-time text data (typically Twitter).
  9. Spatial data is key in analyzing the life of cities. Here I present a series of experimental visualizations of spatial data.
  10. Population density. The idea here is that of deformed maps, i.e. maps that you deform to represent a more abstract reality. I wanted to visualize population density, so my first thought was: when something is dense, it is heavier; if blown by the wind, the light flies off further than the dense. Hence this 3D visualization, except that here denser countries fly off higher than less dense countries.
  11. This project illustrates the application of geostatistics using the ArcGIS software. It represents the spread of dengue fever in the village of Pennathur, India (in 2001). More particularly, the goal was to determine whether there was a clustering phenomenon. The color of the dots reflects the significance of the clustering phenomenon: the darker, the more clustered. The idea is simply to compare the actual distribution of the points to a random distribution: if the number of events found within a certain distance is greater than what we would expect under a random distribution, then the distribution is clustered. A statistical test checks whether the departure from randomness is significant or not. Being able to determine the significance of a spatial clustering phenomenon can have many applications for analyzing the life of cities: do certain types of people or certain events (like crimes) occur in a clustered manner? Why? Thousands of random distributions are generated and, for each, a measure of spatial distribution is calculated. From all these simulations, one takes the highest and lowest values; these are the “envelope”. If the observed value falls beyond that envelope, the clustering is significant.
  12. In this project, I used the app OpenPaths to track my location through my phone every 3 minutes. After a few months, I downloaded the data and used the Processing library UnfoldingMaps (by one of your former researchers) to visualize my spatial patterns over time using tile-based maps.
  13. Such projects can be useful to create more “personalized maps”: one could think of deforming maps, i.e. changing distances between points to reflect a new spatial relationship between these points, like the time to get there or the frequency of travelling from one point to the other. Such projects can also be useful to build predictive models of where someone will be in the future. Defining new, more natural “borders”: clustering observations according to real-world data instead of arbitrary political boundaries. If we notice that a lot of people tend to overuse certain areas of the city, while other people focus on other areas, we could create boundaries that really separate areas based on people’s use of these areas. We could also use the spatial patterns to measure the degree of clustering: highly spatially clustered individuals vs. highly spatially dispersed individuals.
  14. Users can switch between various visualization modes to get different perspectives on the same data set: altitude vs (lat, lon)
  15. Spatial data is key in analyzing the life of cities. Here I present a series of experimental visualizations of spatial data.