Media Monitoring

    Media Monitoring - Presentation Transcript

    1. Media Monitoring
       Arnau Gavalda and Carlos Ortega
       Escola Tècnica Superior d'Enginyeria, Rovira i Virgili University
       PCP, October 2009
       Sections: Outline, Definition, Methodology, Business, Governments, Live example
    2. Outline of topics
       • Definition
       • Methodology: overview, data gathering, data preprocessing, data analysis, data anonymization
       • Business
       • Governments
       • Live example
    3. What is media monitoring?
       "Media monitoring is the activity of monitoring the output of the print, online and broadcast media." (Wikipedia)
       It is also referred to as "media intelligence".
    4. Why is media monitoring important?
       Executives of enterprises, corporations, governments, etc. need the most reliable information in order to make the best decisions.
    5. How do we do media monitoring?
       1. Input: data gathering, either by sniffing communications or by being the center (the medium itself)
       2. Data processing
          • Preprocessing: feature extraction and selection, removal of correlated variables, ...
          • Analysis and prediction: multivariate statistics, clustering, machine learning, ...
       3. Output: information, knowledge and predictions
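As an illustration of the preprocessing step "removal of correlated variables", here is a minimal sketch assuming NumPy and a numeric feature matrix (one row per item, one column per feature). The greedy rule (keep a feature only if its absolute Pearson correlation with every feature kept so far is below a threshold), the function name `drop_correlated` and the 0.95 threshold are illustrative assumptions, not something the slides specify.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedily keep columns of X whose |Pearson r| with every kept column is < threshold.

    X: array of shape (N, D), N samples and D features (hypothetical helper).
    Returns the indices of the kept columns, in their original order.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))  # (D, D) absolute correlation matrix
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not strongly correlated with any kept feature.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is almost a copy of column 0; column 2 is independent noise.
X = np.column_stack([a, a + 0.01 * rng.normal(size=200), b])
print(drop_correlated(X))  # [0, 2]
```

The greedy order means the first of two redundant features survives; other criteria (e.g. keeping the feature with higher variance) are equally valid choices.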
    6. How can we get data?
       • Spying on or sniffing communications, e.g. Echelon; "FBI Newly Declassified Files Detail Massive FBI Data-Mining Project": http://www.wired.com/threatlevel/2009/09/fbi-nsac/
       • Becoming the center/medium, e.g. Google, Facebook, ...
    7. Which types of data can we gather?
       • Text (email, SMS, messaging, ...)
       • Audio (telephone, voice conversations, ...)
       • Images (photos, ...)
       • Video (webcam, video conferences, personal video, ...)
       • Position (geographical location)
       • Events and preferences
       • Real documents
       • Compromising emanations (Tempest attacks)
    8. How can we get the most relevant variables?
       It depends on the type of data (numeric, text, image, video, location, ...).
       Feature extraction (I): extract the relevant information from redundant data
       • Text: natural language processing, e.g. http://www.wolframalpha.com/input/?i=when+bill+clinton+was+born
       • Image: optical character recognition; detection of edges, blobs, ridges, corners, shapes, ...; active contours; e.g. http://similar-images.googlelabs.com/
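Edge detection, one of the image features listed above, can be sketched with the classic 3x3 Sobel operator. This is a minimal NumPy example on a toy image; the function name `sobel_magnitude` is hypothetical, and a real system would normally use an optimized image-processing library rather than explicit Python loops.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image using 3x3 Sobel kernels.

    Only 'valid' positions are computed, so the output is 2 pixels smaller in each
    dimension. The kernels are applied as a cross-correlation; flipping them would
    only change the gradient sign, not the magnitude.
    """
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # responds to horizontal intensity changes
    ky = kx.T                                  # responds to vertical intensity changes
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    return np.hypot(gx, gy)

# Toy image: dark left half, bright right half, i.e. one vertical edge.
img = np.zeros((6, 8))
img[:, 4:] = 1.0
edges = sobel_magnitude(img)
print(edges[0])  # [0. 0. 4. 4. 0. 0.]
```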
    9. How can we get the most relevant variables?
       Feature extraction (II)
       • Audio: e.g. http://labs.google.com/gaudi; Fast Fourier Transform with a Hamming window; speech recognition (hidden Markov models or dynamic time warping)
       • Location: sequencing
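The FFT step for audio can be illustrated by applying a Hamming window to a signal and picking the strongest frequency in its spectrum. A minimal NumPy sketch, assuming a 1-second synthetic signal sampled at 8000 Hz; `dominant_frequency` is a hypothetical helper, and speech recognition builds on such spectra but goes much further.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest peak in the windowed magnitude spectrum."""
    window = np.hamming(len(signal))           # tapering the edges reduces spectral leakage
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

sample_rate = 8000                         # samples per second
t = np.arange(sample_rate) / sample_rate   # 1 second of audio, so bins are 1 Hz apart
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)
print(dominant_frequency(tone, sample_rate))  # close to 440 Hz
```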
    10. How can we get the most relevant variables?
       Feature selection/reduction: a "special" dimensionality reduction that decreases the data size (length, dimensionality, ...) while preserving information
       • Methods: principal or independent component analysis (with or without kernel); support vector regression; neural networks (MLP topology for compression, SOM, ...); partial least squares
       • Software: Weka, RapidMiner, packages in R and Matlab
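Of the reduction methods listed, PCA is the simplest to show. A minimal sketch, assuming NumPy, that computes PCA through the singular value decomposition of the centered data; the function name `pca` and the toy data (three noisy copies of one hidden factor) are made up so that a single component should explain almost all of the variance.

```python
import numpy as np

def pca(X, k):
    """Project X (N samples x D features) onto its first k principal components.

    Returns the projected data (N x k) and the fraction of variance each kept component explains.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)            # variance ratio per component
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(1)
z = rng.normal(size=(500, 1))
# Three observed features that are all noisy copies of one hidden factor z.
X = np.hstack([z, 2 * z, -z]) + 0.01 * rng.normal(size=(500, 3))
Z, ratio = pca(X, k=1)
print(Z.shape, ratio)  # shape (500, 1); ratio very close to 1
```

Going from 3 columns to 1 here loses almost nothing, which is the "decrease size, preserve information" goal of the slide.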
    11. How can we get information from the data?
       • Similarity and distance measures
       • Core methods of statistics, machine learning and data mining
       • How to compute them efficiently? Fast methods
       • More information
    12. Similarity and distance measures
       Most methods need at least a similarity measure between any two points, or assume a distance in the space.
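Two common choices are the Euclidean distance and the cosine similarity. A minimal pure-Python sketch; the function names and the term-count vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Hypothetical term-count vectors for two short documents.
doc1 = [3, 0, 1]
doc2 = [6, 0, 2]
print(euclidean(doc1, doc2))          # 3.1622776601683795 (sqrt(10))
print(cosine_similarity(doc1, doc2))  # 1.0 (same direction, different length)
```

The two measures disagree on purpose here: doc2 is far from doc1 in Euclidean terms but identical in direction, which is why cosine similarity is popular for text, where document length should not matter.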
    13. Core methods of statistics, machine learning & mining I
       N is the number of data points, D the number of features (dimensionality) and M the number of models.
       • Querying: nearest neighbor O(N), spherical range search O(N), orthogonal range search O(N), contingency table
       • Density estimation: kernel density estimation O(N^2), mixture of Gaussians O(N)
       • Regression: linear regression O(D^3), kernel regression O(N^2), Gaussian process regression O(N^3)
       • Classification: nearest-neighbor classifier O(N^2), nonparametric Bayes classifier O(N^2), support vector machine (mainly depends on the kernel), multilayer perceptrons (depend on the topology and the quality of the solution)
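The O(N) query cost of nearest-neighbor search comes from comparing the query with every stored point; classifying all N points this way costs O(N^2). A minimal sketch of exhaustive 1-NN classification (Python 3.8+ for `math.dist`); the helper name, points and topic labels are made up.

```python
import math

def nearest_neighbor_label(query, points, labels):
    """1-NN classification by exhaustive search: N distance evaluations per query."""
    best_i = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[best_i]

# Hypothetical 2-D feature vectors with topic labels.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = ["sports", "sports", "politics", "politics"]
print(nearest_neighbor_label((4.8, 5.1), points, labels))  # politics
```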
    14. Core methods of statistics, machine learning & mining II
       • Clustering: k-means O(N), hierarchical clustering O(N^3), by dimension reduction, self-organizing maps O(ND)
       • Dimension reduction: principal component analysis O(D^3), non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3)
       • Outlier detection: by robust L2 estimation, by density estimation, by dimension reduction
       • Time series analysis: Kalman filter O(D^3), hidden Markov model, trajectory tracking
       • 2-sample testing: n-point correlation O(N^n)
       • Cross-match: bipartite matching O(N^3)
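k-means is usually computed with Lloyd's algorithm, which alternates an assignment step and an update step. A minimal NumPy sketch; each iteration costs O(N k D), so the O(N) figure above presumably treats k, D and the number of iterations as constants. The function name, the random initialization and the stopping rule are simple assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm on X (N x D). Returns (centers, assignment of each point)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init: k distinct data points
    for _ in range(iters):
        # Assignment step: index of the closest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign

rng = np.random.default_rng(2)
# Two well-separated blobs around (0, 0) and (5, 5).
X = np.vstack([rng.normal(0, 0.1, size=(50, 2)), rng.normal(5, 0.1, size=(50, 2))])
centers, assign = kmeans(X, k=2)
print(np.sort(centers[:, 0]))  # approximately [0, 5]
```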
    15. Core methods are "inefficient": how can we compute them efficiently?
       • Multi-scale decompositions
       • Generalized N-body algorithms (multiple trees) for distance/similarity-based computations
       • Hierarchical series expansions for kernel summations
       • Multi-scale Monte Carlo for linear algebra and summations
       • Stochastic process approximations for time series [2009]
       • Monte Carlo optimization: online, progressive [2009]
       • Parallel computing
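The idea behind Monte Carlo for summations can be illustrated with plain (not multi-scale) Monte Carlo: estimate a Gaussian kernel sum over N points from a random sample of m points, rescaled by N/m. This is a simplified sketch under that assumption, not the multi-scale method the slide refers to; the helper names, the data, the bandwidth `h` and the sample size `m` are arbitrary.

```python
import numpy as np

def kernel_sum_exact(q, X, h):
    """Exact Gaussian kernel sum over all N points: O(N) per query, O(N^2) for N queries."""
    return np.exp(-((X - q) ** 2) / (2 * h * h)).sum()

def kernel_sum_monte_carlo(q, X, h, m, rng):
    """Unbiased estimate from m points sampled with replacement, rescaled by N/m: O(m) per query."""
    idx = rng.integers(0, len(X), size=m)
    return len(X) / m * np.exp(-((X[idx] - q) ** 2) / (2 * h * h)).sum()

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=100_000)   # 1-D data for simplicity
exact = kernel_sum_exact(0.5, X, h=0.2)
approx = kernel_sum_monte_carlo(0.5, X, h=0.2, m=2_000, rng=rng)
print(exact, approx, abs(approx - exact) / exact)
```

The error shrinks roughly like 1/sqrt(m) independently of N, which is what makes sampling attractive for very large datasets; the multi-scale variants add error control on top of this.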
    16. Fast methods I
       • Querying: nearest neighbor O(log N), spherical range search O(log N), orthogonal range search O(log N), contingency table
       • Density estimation: kernel density estimation O(N) or O(1), mixture of Gaussians O(log N)
       • Regression: linear regression O(D) or O(1), kernel regression O(N) or O(1), Gaussian process regression O(N) or O(1)
       • Classification: nearest-neighbor classifier O(N), nonparametric Bayes classifier O(N), support vector machine O(N)
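The O(log N) nearest-neighbor figure relies on space-partitioning trees. A minimal pure-Python kd-tree sketch (Python 3.8+ for `math.dist`); the logarithmic cost is an average-case behavior for low-dimensional data and degrades as the dimension grows. The node layout and the function names `build_kdtree` and `nearest` are hypothetical.

```python
import math
import random

def build_kdtree(points, depth=0):
    """Recursively build a kd-tree; each node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                     # median split keeps the tree balanced
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Branch-and-bound nearest-neighbor search; returns the closest stored point."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best

random.seed(4)
pts = [(random.random(), random.random()) for _ in range(1000)]
tree = build_kdtree(pts)
q = (0.3, 0.7)
brute = min(pts, key=lambda p: math.dist(q, p))   # O(N) exhaustive answer for comparison
print(nearest(tree, q) == brute)  # True
```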
    17. Fast methods II
       • Clustering: k-means O(log N), hierarchical clustering O(N log N), by dimension reduction
       • Dimension reduction: principal component analysis O(D) or O(1), non-negative matrix factorization, kernel PCA O(N) or O(1), maximum variance unfolding O(N)
       • Outlier detection: by robust L2 estimation, by density estimation, by dimension reduction
       • Time series analysis: Kalman filter O(D) or O(1), hidden Markov model, trajectory tracking
       • 2-sample testing: n-point correlation O(N log N)
       • Cross-match: bipartite matching O(N) or O(1)
    18. More information on data analysis
       • Pattern Recognition and Machine Learning: http://research.microsoft.com/en-us/um/people/cmbishop/prml/
       • Presentation on Machine Learning on Massive Datasets: http://www.astro.caltech.edu/AIworkshop/Gray.pdf
       • Electronic Statistics Textbook: http://www.statsoft.com/textbook/stathome.html
       • "Survey of Clustering Algorithms", Rui Xu, 2005
       • Neural Networks Tutorial with Java Applets: http://lcn.epfl.ch/tutorial/english/
    19. Can we protect ourselves from Big Brother?
       The need for privacy
       • Several services exist for staying anonymous and keeping one's privacy.
       • But for individuals:
          • Is data privacy/security a real concern?
          • Do anonymization and security services add too much overhead?
          • Is there a usability problem in anonymization and security services?
       To improve individual privacy (a little), we propose these practical steps:
       • Use encrypted connections (e.g. HTTPS)
       • Use an anonymous Internet connection: http://www.torproject.org/
    20. How much money does media monitoring move?
       • Directly: billions of dollars
         http://www.google.com/finance?q=Media+monitoring
         http://www.google.com/finance?q=Media+intelligence
       • Indirectly: unknown
    21. Which software is available?
       • SnapStream: Beyond TV, starting at $2000 (http://www.snapstream.com/)
       • Clip in One: http://www.clipinone.com/
       • http://peoplebrowsr.com/
    22. Media monitoring in governments
       • Echelon: UK-USA, Australia, Canada and New Zealand
          • The European Union recognized it in an official document.
          • It covers radio, satellite, microwave, cellular and fiber-optic communications.
       • INDECT: European Union
          • Automatic detection, recognition and intelligent processing of all information about abnormal behavior or violence.
       • Others: France; Onyx (Switzerland)
    23. Do-it-yourself media monitoring
       • http://www.google.com/alerts
       • http://www.google.com/trends
       And here the dream team goes!
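In the same do-it-yourself spirit, a keyword alert can be approximated locally by counting watched words in a stream of dated headlines. A minimal pure-Python sketch; the headlines, the `keyword_alerts` helper and the `min_mentions` threshold are hypothetical, and it does not use any Google service.

```python
import re
from collections import Counter

def keyword_alerts(items, keywords, min_mentions=2):
    """Count keyword mentions per day and report the days reaching min_mentions.

    items: iterable of (day, text) pairs; keywords: words to watch (case-insensitive).
    """
    watched = {k.lower() for k in keywords}
    per_day = Counter()
    for day, text in items:
        words = re.findall(r"[a-z']+", text.lower())   # crude tokenization
        per_day[day] += sum(1 for w in words if w in watched)
    return {day: n for day, n in per_day.items() if n >= min_mentions}

# Hypothetical headlines; in practice they could come from a news feed.
items = [
    ("2009-10-01", "URV students present media monitoring work"),
    ("2009-10-01", "Media intelligence market keeps growing"),
    ("2009-10-02", "Weather forecast for the weekend"),
]
print(keyword_alerts(items, ["media", "monitoring"]))  # {'2009-10-01': 3}
```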
    24. References
       • "Data Clustering: A Review", 1999
       • Wikipedia
       • Google Finance
       • TED talks: http://www.ted.com/talks/bruce_bueno_de_mesquita_predicts_iran_s_future.html
       • Online review of basic and multivariate statistics, data clustering, machine learning and more: http://www.statsoft.com/textbook/stathome.html
    25. Acknowledgments
       We would like to thank our friends, girlfriends and families for their unconditional help and patience.
    26. Thank you!!
       arnau.gavalda@gmail.com
       carlos.ortegah@estudiants.urv.cat
