Evolving social data mining and affective analysis

  • 489 views
Uploaded on

Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data …

Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data
Social associations and all kinds of graphs
Evolving social data mining
Emotion-aware social data analysis
Frameworks and Applications

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
489
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
0
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Starting from a k -clique, nodes are attached as long as they are reachable through clique adjacency (two k -cliques are adjacent if they share k -1nodes)
  • To sentiment analysis kai opinion mining ekfrazoun to idio. Me vasi to wikipedia Sentiment analysis or opinion mining refers to the application of natural language processing , computational linguistics , and text analytics to identify and extract subjective information in source materials. Apo dw kai katw krataw ton oro sentiment analysis
  • Ta relational networks anaparistoun to poia entities anaferontai mazi (nodes), to poso suxna anaferontai mazi (mikos akmwn), kai to poso prosfata xrhsimopoihthikan mazi (entasi tou edge)

Transcript

  • 1. EVOLVING SOCIAL DATA MINING ANDAFFECTIVE ANALYSISMETHODOLOGIES, FRAMEWORKANDAPPLICATIONSProf. Athena VakaliIDEAS 2012, Prague, August 8th2012
  • 2. Presentation Outline Web 2.0 facts and social data Social associations and all kinds of graphs Evolving social data mining Emotion-aware social data analysis Frameworks and Applications
  • 3.  Web 2.0 facts and social data evolution & characteristics is there hidden information there? motivation for social (evolving) data mining Social associations and all kinds of graphs Emotion-aware social data analysis Frameworks and Applications
  • 4.  Web 2.0 has become a source of vast amounts of social dataevolving at fast rates Users participate massively in Web 2.0 applications such as: social networking sites (e.g. ) blogs, microblogs (e.g. ) social bookmarking/tagging systems (e.g.) Social data refer to different types of entities users content handled separately or together metadataassociated via various types of interactions /relationshipsWeb 2.0 facts & social data
  • 5. SOCIAL DATA SOURCES: is there hiddeninformation here? Web 2.0 users act explicitly by declaring their associations Relationships may be multipartite (e.g. user u1 making acomment on the post p of user u2) but are usuallysimplified in bipartite associations between some involvedentities However, explicit associations generate implicit threads ofrelationships which: are triggered by users’ common activities may hold among the non-user entities (resources,user-user“is frie nd with”user-user“is frie nd with”user-resource“like s”user-resource“like s”user-metadata“cre ate s tag ”user-metadata“cre ate s tag ”user-group“be lo ng s to ”user-group“be lo ng s to ”
  • 6. Web 2.0 relations initiated by users’ onlinebehaviorIRinferencemechanism
  • 7. Implicit relations uncovered for various types ofentities Numerous relations might unfold … however only some of them areselected based on analysis’ focus or application’s context user-user IRs such as: “like sam e re so urce ”, “use sam e tag ” can beleveraged for studying behavioral patterns (e.g. in applications likeFlickr) tag-tag IRs such as: “are assig ne d o n sam e re so urce ” can beleveraged for tag clustering… g e ne ratio n m e chanism s
  • 8. Motivation forSocial DataMining The availability of massive sizes of data gave new impetus todata mining. e.g. more than active 500 million active Facebookusers,sharing on average more than 30 billion pieces of contenteach month [Facebook Statistics 2011] Mining social web data can act as a barometer of the users’opinion. Non-obvious results may emerge. Collaboration and contribution of many individualsformation of collectiveintelligence Wisdomof thecrowd: more accurate, unbiased source ofinformation. Social data mining results can be useful for applications suchas recommender systems, automatic event detectors, etc
  • 9. Static vs evolving data mining Social data interactions are constantly updating in fastrates however, a data mining approach could be static or evolvingstatic mining approaches: aggregateall social interactions over a specificperiod and deal with them as a uniquedatasetevolving mining approaches:track and exploit more fine-grained & “richer” informationdynamic data mining:emphasis placed on dataevolution and not onaggregation data modeling with a giventime granularity which affectsthe amount of detailscontained in the datasetstreaming data mining:new user activity data received in astreaming fashiontime-aware data approximatingmodel incrementally created andmaintained, subject to time andspace constraintsmodel readapted on arrival of eithersingle update or batch of updates
  • 10. Motivation forEvolving Social DataMining Identifying over time the events that affect socialinteractions tracking posts in a micro-blogging website to identifyfloods, fires, riots, or other events and inform the public Highlighting trends in users’ opinions, preferences, etc companies can track customers’ opinions and complaintsin a timely fashion to make strategic decisions Tracking the evolution of groups (co m m unitie s) ofusers or resources, finding changes in time andcorrelations develop better personalized recommender systems toimprove user experience
  • 11.  Social data in the Web 2.0 Social associations and all kinds of graphs structures for static social data evolving data representation structures Evolving social data mining Emotion-aware social data analysis Frameworks and Applications
  • 12. The ne two rk m o de las an o bvio us cho ice …Social data are interconnected through associationsforming a networkor graph G(V, E), where Vis the set ofnodes and E is the set of edges. nodes represent entities/objects and edgesrepresent relations different types of nodes and edges weighted/unweighted directed/undirectedSocial associations and all kinds ofgraphs
  • 13. … the multi-graph structuresA hypergraphexample
  • 14. Structures forstatic social data Hypergraph: generalization of a graph where anedge (hype re dg e ) connects more than two nodes[Brinkmeier07] Folksonomy: lightweight knowledge representationemerging from the use of a shared vocabulary tocharacterize resources – tripartite hype rg raph[Hotho06, Mika05] Projection on simple graphs to lower complexity [AuYeung09] further simplifications in bipartite & unipartite g raphs e.g. tag-tag network where two tags are connected ifassigned to the same resource Simple graphs’ structure can be encoded in antripartite graph
  • 15. Folksonomy projections on simplegraphs
  • 16. Evolving data representationstructures Need for modeling the different data states in successive time-steps, often determined by the data’s sampling ratet1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕGgraphsnapshotstimelinesegmentsgraph stream
  • 17. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe snapshot layerGadjacency/similaritymatricesfolksonomiesgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕ
  • 18. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe segment layerGgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕPre-processing techniqueIdentify graph segments consisting ofsimilar snapshots and compute asmooth graph approximation for eachsegment [Yang09]adjacency/similarity matricesmore coarse-grainedtime-aggregated matrices areused for emphasizing on mostrecent edges [Tong08]tensors: generalization of matrices (> 2dimensions) [Sun06]
  • 19. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe streamlayerGgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕ tensors“multi-graphs” with edgesencoding relations as well astemporal information [Zhao07]time aggregate adjacencymatrices [Tong08]
  • 20. References[Giatsoglou12] M. Giatsoglou, A. Vakali. Capturing Social Data Evolution via Graph Clustering. InIEEE Internet Computing, IEEE computer Society Digital Library. IEEE Computer Society. DOI:10.1109/MIC.2012.24, Feb. 2012.Au Yeung09] Au Yeung, C.M., Gibbins, N., and Shadbolt, N. 2009. Contextualising Tags inCollaborative Tagging Systems. In Proceedings of 20th ACM Conference on Hypertext andHypermedia, pp. 251-260.[Brinkmeier07] Brinkmeier, M., Werner, J., and Recknagel, S. 2007. Communities in graphs andhypergraphs. CIKM07. ACM, 869-872.[Hotho06] Hotho, A., Robert, J., Christoph, S., and Gerd, S. 2006. Emergent Semantics inBibSonomy. GI Jahrestagung Vol. P-94, 305–312. Gesellschaft fr Informatik.[Mika05] Mika, P. 2005. Ontologies Are Us: A Unified Model of Social Networks and Semantics. InProceedings of the 4th international SemanticWeb Conference. ISWC’05. Springer Berlin/Heidelberg, pp. 522-536.[Sun07] Sun, J., Faloutsos, C., Papadimitriou, S., and Yu, P. S. 2007. GraphScope: parameter-freemining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD internationalConference on Knowledge Discovery and Data Mining . KDD 07. ACM, 687-696.[Tong08] Tong, H., Papadimitriou, S., Yu, P. S., and Faloutsos, C. 2008. Proximity tracking on time-evolving bipartite graphs. In Proceedings of the 9th SIAM international Conference on Data Mining .
  • 21. Presentation Outline Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining the clustering approach Applications Emotion-aware social data analysis Frameworks and applications
  • 22. Web 2.0 social data mining : The genericworkflownext, focus on clustering and community detection
  • 23. Evolving social data clustering/community detectionapproachesPreliminary efforts followed the CommunityMapping -CM- approach:1.application of traditional community detection algorithmson individual static graph snapshots2.identification of correlations and mapping betweensuccessive snapshots’ communities using specialsimilarity measures & temporal smoothing techniquesLimitation: tend to find community structures withhigh temporal variation (due to real worldambiguous & noisy data)
  • 24. Evolving social data clustering/community detectionapproachesEvolutionary community identification approaches: utilizecommunity structure’s history to maximize temporalsmoothness and lead to smoother community evolutiontraditional clustering revisited foran evolutionary setting (TCRapproach): modification of existing static clustering algorithms to havememory of previous states of the data;spectral clustering (SC approach): uses the spectrum of the graph’ssimilarity matrix to perform dimensionality reduction for clustering in fewerdimensions;non-negative matrix/tensorfactorization (NFC approach): apply NFC fordiscovering communities jointly maximizing the fit to the observed data &the temporal evolution;graph streamsegment identification & community structure detection
  • 25. CM approachPalla2008undirected graphsnapshotsI. co-authorshipnetworkII. mobilephone callnetworkI. 142 monthsII. 52 weeks(26tim e slo ts)I. {30K authors}{“are co -autho rs ” }II. {4M users} {“calls ”}community detectionbased on cliq uepe rco latio n;successivecommunities mappingby relative nodes’overlap;Lin2007directedgraphsnapshotsdata crawledfrom 407 blogs63 weeks {275K bloggers}{149K “re plie s to po sto f”}hypothesis: use rs’mutualawareness (e . g .blo g g e rs co m m e ntingo n e ach o the r’s blo g )drive s co m m unityfo rm atio n;mutual awareness’expansion as arandom walk processleads to communitydetectionmethod structuredatasetoutcome/scopesource time-period entities / relations
  • 26. TCRapproachChakrabarti2006(time-aware)similaritymatrixphotosharingservice68 weeks {5K tags}{“are applie d o n sam ere so urce ”}jointoptimization of2 criteria:snapshotqualityhistory quality;genericframework with2 instantiationsproposed:agglomerativehierarchicalalgorithmK-meansmethod structuredatasetoutcome/scopesource time-period entities / relationsSimultaneous optimization of two potentially conflicting criteria:(i) snapshot quality sq, and (ii) history quality hqAt each time-step the framework finds a clustering based on the new similarity matrix Mtand the so far historyEvolutionary clustering in an online setting
  • 27. SC approachTang2008series ofnetworksnapshots;interactionmatrix for eachsnapshotI. mailexchangenetworkII. co-authorshipnetworkI. 12monthsII. 25 yearsI. {2.4K users}{emails}{36.7K words}{“se ndse m ail”}{“re ce ive se m ail”}{“co ntainste rm ”}II. {492K papers}{347K authors}{2.8K venues}{9.5K titleterms} {“write s”}{“participate sin”} {“ispublishe d in”}communityevolution indynamic networksof multiple socialentities;iterativeapproximation ofcommunityevolution usingeigenvectorcalculation andK-means clusteringmethodstructure ormodeldatasetoutcome/scopesource period entities / relationsSpectral clustering uses the spectrum of thegraph’s similarity matrix toperform dimensionality reduction for clusteringin fewer dimensionsEvolutionary spectral clustering appliedin multi-mode networks
  • 28. NFC approachLin2009metagraph:hypergraph withnodesrepresentingface ts and edgesmultipartiteinteractions;tensorssocialbookmarking servicewith votingcapabilities27 days(9tim e slo ts){users} {posts} {keywords}{topics}{152K “po st-ke ywo rd-to pic”}{56.4K “is frie nd with”}{44K“uplo ads ”}{1.2M “vo te s o n”}{242K “use r-po st-co m m e nt”}{94.6K “re plie s with”}communityextraction via time-stamped tensorfactorization;on-line methodhandling time-varying relationsthrough incrementalmetagraphfactorization;communities derivedby jointly leveragingall types ofmultipartite relationsmethod structure or modeldatasetoutcome/scopesource period entities / relations
  • 29. SGC approachSun2007unweightedundirectedbipartitegraphs;graphsegmentsI. mailexchangenetworkII. mobile phonecall networkIII. mobiledeviceproximityrecordsI. 165weeksII. 46 weeksIII. 46 weeksI. {34.3K senders}{34.3K recipients}{15K“se nd m ail” / week}II. {96 callers} {3.8Kcallees}{ 430”calls”/ week}III. {96 users } {96 users}{689 “is lo cate dne ar”/week}unparametricmethod basedon MinimumDescriptionLength;eachsegment’ssource anddestinationnodes arepartitionedseparately todecrease cost;compressedgraph in < 4%than originalspacemethod structuredatasetoutcome/scopesource period entities / relations
  • 30. References (1/2)[Chakrabarti06] Chakrabarti, D., Kumar, R., and Tomkins, A. 2006. Evolutionary clustering. In Proc. ofKDD 06. ACM, New York, NY, 554-560.[Tang08] Tang, L., Liu, H., Zhang, J., and Nazeri, Z. 2008. Community evolution in dynamic multi-modenetworks. In Proc.. KDD 08. ACM, New York, NY, 677-685.[Palla05] Palla, G., Derény, I., Farkas, I., and Vicsek, T. 2005. Uncovering the overlapping communitystructure of complex networks in nature and society. Nature 435, 814–818.[Palla07] Palla, G., Barabási, A.-L., and Vicsek, T. 2007. Quantifying social group evolution. Nature446, 664-667.[Lin07] Lin, Y., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. L. 2007. Blog Community Discoveryand Evolution Based on Mutual Awareness Expansion. In Proceedings of the IEEE/WIC/ACMinternational Conference on Web intelligence. Web Intelligence. IEEE Computer Society, Washington,DC, 48-56.[Lin09] Lin, Y., Sun, J., Castro, P., Konuru, R., Sundaram, H., and Kelliher, A. 2009. MetaFac:community discovery via relational hypergraph factorization. KDD09. ACM, 527-536.[Sun07] Sun, J., Faloutsos, C., Papadimitriou, S., and Yu, P. S. 2007. GraphScope: parameter-freemining of large time-evolving graphs. In Proc. of KDD 07. ACM,, 687-696.[Quack08] Quack, T., Leibe, B., and Van Gool, L. 2008. World-scale mining of objects and events fromcommunity photo collections. In Proc. of CIVR 08. ACM, 47-56.
  • 31. References (2/2)[Zhao07] Zhao, Q., Mitra, P., and Chen, B. 2007. Temporal and information flow based event detectionfrom social text streams. In Proc. of the 22nd National Conference on Artificial intelligence - Volume 2.Aaai Conference On Artificial Intelligence. AAAI Press, 1501-1506.[Duan09] Duan, D., Li, Y., Jin, Y., and Lu, Z. 2009. Community mining on weighted directed graphs.In Proc. of CNIKM 09. ACM, New York, NY, 11-18.[Chi06] Chi, Y., Tseng, B. L., and Tatemura, J. 2006. Eigen-trend: trend analysis in the blogospherebased on singular value decompositions. In Proc of CIKM 06. ACM, 68-77.[Papadopoulos11] Papadopoulos, S., Zigkolis, C., Kompatsiaris, Y., and Vakali A. 2011. Cluster-BasedLandmark and Event Detection for Tagged Photo Collections. IEEE Multimedia, pp. 52-63.[Java07] Java, A., Song, X., Finin, T., and Tseng, B. 2007. Why we twitter: understanding microbloggingusage and communities. WebKDD/SNA-KDD07. ACM, 56-65.[Jansen,09] Jansen, B. J., Zhang, M., Sobel, K., and Chowdury, A. 2009. Twitter power: Tweets aselectronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60, 11, 2169-2188.[Sankaranarayanan 09] Sankaranarayanan, J., Samet, H., Teitler, B.E., Lieberman, M.D., and Sperling,J. 2009. TwitterStand: news in tweets. GIS09. ACM, 42-51.
  • 32. Why focusing on time as acriterion ? typical analysis involves “static”views (users-tags) events, trends affect user interests users Tagging Behavior changesover time Time is a fundamental dimensionin analysis of users and tags in asocial tagging systeme.g. : prediction of first weekend box-office revenues using tweets
  • 33. Many times, a user’s targeted interest ishidden in the general tagging activity….Accessories,bags,fashion,Cars, football,holidays, horses,sea, turkey, fashionNew York, hat,trousers,fashion, Guccianimals, elephants,nature sea, turkey,bagshats,Guccifashion,jeans, NYUser 1 User 2 User 3fashionweek,fashion, silk,wool
  • 34. Time-aware user/tagclustering
  • 35. Related Approaches Sun and colleagues [Sun08] use the χ2statistical model, todetermine whether the appearance of tag t in a tim e fram e iis sig nificant and, thus, to disco ve r tag s that co nstitute“topics of interest" at particular time frames. Wetzker and colleagues [Wetzker08] claim that a tagsignifies a trend, if it attracts significantly more new users ina currently monitored time frame than in past time frames. A trend detection measure is introduced in [Hotho06], whichcaptures topic-specific trends at each time frame and isbased on the weight-spreading ranking of the PageRankalgorithm
  • 36. Use Cases Capturing trends, interests, periodic activitiesof users in specific time periods Community-based tag recommendation Personalization (time-aware user profiles) Fighting spam on social web sites (bydiscriminating regular and occasional users)
  • 37.  Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining In & out zooming on time aware user/tag clusters Emotion-aware social data analysis User generated data The idea for emotion aware clustering Application on micro-blogging sources Frameworks and applications
  • 38. The idea for emotion-awareanalysisClustering methods embed various criteria such as:semantics, tags, time, geo-information ... etcbutsince social sources are driven and managed byhumansemotion/sentimentMUST also be considered …Clustering methods embed various criteria such as:semantics, tags, time, geo-information ... etcbutsince social sources are driven and managed byhumansemotion/sentimentMUST also be considered …
  • 39. The nature of evolving socialdata Two main types of textual information. Facts and Opinions Note: factual statements can imply opinions too. Most current text information processingmethods (e.g., web search, text mining) workwith factual information. Sentiment analysis or opinion mining computational study of opinions, sentiments andemotions expressed in text. Why sentiment analysis now? Mainly becauseof the Web; huge volumes of opinionated text.
  • 40. User-generated data (I)Opinions are important because whenever we need to makea decision, we’re influenced by others’ opinions. According to [Horrigan09] of more than 2000 American adults: 81% of Internet users (or 60% of Americans) have done onlineresearch on a product at least once; among readers of online reviews of restaurants, hotels, andvarious services (e.g., travel agencies or doctors), between 73%and 87% report that reviews had a significant influence on theirpurchase; 32% have provided a rating on a product, service, or person viaan online ratings system, and 30% have posted an onlinecomment or review regarding a product or service
  • 41. user-generated data (II) Word-of-mouth on the Web User-generated media: One can express opinions on anything inreviews, forums, discussion groups, blogs ... Opinions of global scale: No longer limited to: Individuals: one’s circle of friends Businesses: Small scale surveys, tiny focus groups, etc. Why affect/sentiment analysis? Customers: need peer opinions to make purchase decisions Business providers: need customers’ opinions to improve product need to track opinions to make marketing decisions Social researchers: want to know people’s reactions aboutsocial events Government: wants to know people’s reactions to a new policy Psychology, education, etc.
  • 42. More Applications Product review mining: What features of the iPhonedo customers like and which do they dislike? Review classification: Is a review positive or negativetoward the iPhone? Tracking sentiments toward topics overtime: Is angerratcheting up or cooling down? Prediction (election outcomes, market trends): WillClinton or Obama win?
  • 43. Why Extracting sentiments from Web 2.0data sources? Web 2.0 data features Easy to collect: huge amount, clean format Broadly distributed: demographics Topic diversified: free discussion about any topic/product/event Opinion rich: highly personalized Distributed over time, user generated content Motivation Sentiment is a very natural expression of a human being. Sentiment Analysis aims at getting sentiment-related knowledgeespecially from the huge amount of information on the internet Can be generally used to understand opinion in a set ofdocuments or user generated content
  • 44. Challenges Contrasts with Standard Fact-Based Textual Analysis typically, text categorization seeks to classify documents bytopic BUT nature, strength of feelings, degree of positivity, etcimposes a tailored sentiment categorization Key Factors that Make Sentiment Analysis challenging choosing the right set of keywords might be less trivial thanone might initially think; Sentiment and subjectivity are quite context-sensitive, and, ata coarser granularity, quite domain dependent e.g., “go read the book” most likely indicates positive sentiment forbook reviews, but negative sentiment for movie reviews. Web users postings are of a challenging nature, since there isno code in expressions
  • 45. so far … Lexical Resources Se ntiWo rdne t Built on the top of WordNet synsets Attaches sentiment-related information with synsets SentiWordNet assigns to each synset of WordNet threesentiment scores: positivity, negativity, objectivity Ge ne ralInq uire r Included are manually-classified terms labeled with varioustypes of positive or negative semantic orientation, andwords having to do with agreement or disagreement. O pinio nFinde r’s Subje ctivity Le xico n OpinionFinder is a system that performs subje ctivityanalysis , automatically identifying when opinions,sentiments, speculations, and other private state s arepresent in text.
  • 46. so far … Lydia System Lydia [Lioyd05] news analysis system does adaily analysis of over 1000+ online Englishnewspapers, Blogs, RSS feeds, and other newssources. It identifies who is being talked about, by whom,when and where? Applications of Lydia heatmap generation (pos/negfor a topic); relational networks
  • 47. so far … integration of news &blogs Bautin, Vijayarenu and Skiena [Bautin08]presented an approach for the internationalanalysis for news and blogs … still on thepositive/negative side … Cross-language analysis across news streamsPolarity score of London in Arabic,German, Italian and Spanish over the May1-10, 2007 period.
  • 48. so far … sentiment-awaresearching Sentiment Analysis forSemantic Enrichment ofWeb Search Results[Demartini10] the first few results arepresentative sample ofthe entire result setAverage Sentiment score in top N results for 3 search engines
  • 49. so far … beyond pos/neg : the affectanalysis it involves several affects at the sametime. affect classes may be correlated oropposed. Abbasi, Chen Thoms and Fu[Abbasi08] proposed a support vectorregression correlation ensemble(SVRCE) method for text-based affectclassification. affect feature and techniquecomparison. apply to multi-domain.I cannot stand you!hate angryI cannot standyou!Affect class IntensityHappiness 0.01Sadness 0.03Anger 0.6Hate 0.5
  • 50.  Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining In & out zooming on time aware user/tag clusters Emotion-aware social data analysis Frameworks and applications Most popular applications A mining and analysis framework Social data analysis on the cloud Emotions capturing in microblogs
  • 51. clustering of usersexploiting the timedimensionApplications of Mining Evolving SocialDataThe results of community detection, or different miningtechniques, on evolving social data can be exploited inapplications:social network analysisim ag e from [Touchgraph]event detectiondiag ram fro m [Sun07]trend detectionim a g e from [Trendsmap][Touchgraph] http: //www. to uchg raph. co m /TG Face bo o kBro wse r. htm l[Trendsmap] http: //tre ndsm ap. co m /
  • 52. Event detectionGraph segmentationcommunity detectionmethods [Sun07, Duan09]identify events assignificant change-timepoints in thestream.Definition of eventinformation flow between a group of socialactors on a specific topic over a certain timeperiod [Zhao07]occasions which take place at a specifictime and location (concerts, festivals, etc.)[Quack08]Event & landmarkdetection from geo-taggeddata [Papadopoulos11]detection of image clusters using visual & tag-based similaritiesclassification of clusters as events or landmarks Landmarks differ from events on the number ofusers tagging photos over different timegranularities exploitation of similarities with disjoint sets oflandmark- and event- related tagsevent detection fromsocial datastreams [Zhao07]features exploration in 3dimensions: textual content, social,temporalgeneration of multiple intermediateclustering structures using content-based similarity & information flowpatternsevents as groups of nodes closelyrelated in time & topicevent detection from social datastreams [Zhao07]features exploration in 3dimensions: textual content, social,temporalgeneration of multiple intermediateclustering structures using content-based similarity & information flowpatternsevents as groups of nodes closelyrelated in time & topic
  • 53. Trend tracking Social data fluctuate in structure and frequency as they evolve and over time sometopics, images, tags, etc, become most popular amongst users.Trends can be identified by adata mining approach g lo bally or lo cally (within communities) and they usuallyindicate what interests users the most at a given time.trend detection in blogs [Chi06] focused on: keyword popularity in successivetimeframes detection of different topics relating to akeyword contribution of individual users to a trend uses the results of SingularValueDecomposition as trend indicatorscapturing both temporal datachanges & bloggers’ characteristics exploits textual content and citationsbetween blogstrend detection in Twitter[Java07],[Jansen09],[Sankaranarayanan09] several attempts using statisticalanalysis methods analysis of evolving Twitter data toidentify trending keywords fordifferent weekdays sentiment identification on tweetsto identify trending sentimentsabout brands online clustering on streamingtweets, combined withclassification, to identify breakingnewstrend detection in Twitter[Java07],[Jansen09],[Sankaranarayanan09] several attempts using statisticalanalysis methods analysis of evolving Twitter data toidentify trending keywords fordifferent weekdays sentiment identification on tweetsto identify trending sentimentsabout brands online clustering on streamingtweets, combined withclassification, to identify breakingnews
  • 54. A mining and analysis framework
  • 55. 2 indicative applicatio ns Cloud4TrendsLeveraging the cloud infrastructure forlocalized real-time trend detection in socialmedia CapturEmosCapturing emotional patterns in micro-blogging data streamsAristotle University, OSWINDS Group
  • 56. Cloud4Trends is a microblogging & blogging localized contentcollection and analysis frameworkfor detecting currently populartopics of users’ interestCloud4Trends is a microblogging & blogging localized contentcollection and analysis frameworkfor detecting currently populartopics of users’ interest Social media reflect societal concerns exhibiting‘bursts’ of content generation on the occurrence ofevents po pular to pics / inte re sts fluctuate with tim e Challenging for both computer scientists &application developers to reach unbiased,meaningful conclusions about trendingusers’opinion and interestsCloud4Trends - Motivation
  • 57. Challenges & Outcome Massive content sizes andunpredictable content generationrates : scalable analysis is needed Trending topics should bediscovered when they are “fresh” : an on-line analysis approach isdemanded Trends should be meaningful need for contextual trends Content is dispersed in multiplesources : trend detection needs a combinedapproachThe Cloud deployment ofthe Cloud4Trendsscenario with use of theVENUS-C servicesverified that Cloud-basedarchitectures are a viablesolution for online webdata mining applicationsthat are beneficial forboth researchers andentrepreneurs.
  • 58. http://clo ud4tre ndsde m o . clo udapp. ne t
  • 59. Emotional Aware Clustering on Micro-Blogging Sources (affect analysis)K. Tsagkalidou, V. Koutsonikola, A.Vakali and. K. Kafetsios: EmotionalAware Clustering on Micro-BloggingSources, accepted for publication,Affective Computing and IntelligentInteraction 2011
  • 60. our emotional dictionary create an extended emotional dictionary byenriching an opinion lexicon provided by theUMBC university with synonymous words fromWordNet
  • 61. The used affect space representing the extreme ends of four emotional pairs[Gill08] emotion exemplar words
  • 62. Relevance between tweet andemotion Semantic similarity the maximum semantic similarity between eachof the tweet’s wo rds and the emotion’srepresentatives, as defined in Wordnet Sentiment similarity expresses a word’s emotional intensity as definedin the extended dictionary Overall similarity between a tweet & anemotion Σi=1…|words|Sem(wi,emotion)*Sent(wi)|wi|: Sent(wi)≠0, for 1<=i<=|words|
  • 63. • capturing and understanding crowd’s emotionsfor a particular topic or product in an implicitmanner via computational methods.• se ntim e nt analysis & m icro blo g g ing (statistical)pro ce ssing : emphasis on affective and opinionmining, lexicon-based processing, knowledgeextraction techniques;• de ve lo pm e nt o f applicatio ns: web applicationenhanced with of crowds emotions visualizationcapabilities.CapturEmos- MotivationCapturEmos- Motivation
  • 64. The application/serviceuseful for capturing brandingsuccess & diffusion in themarket, as expressed by thecrowds emotionsOur innovation principle :focus on the “affect” which isdistinguished from discreteemotions.discrete emotions : concernaffective reactions in relationto one’s goalsaffect refers to : anoverarching positive ornegative valence of one’sfeelings.http://oswinds2.csd.auth.gr/~vkoutson/EmoGlobe/
  • 65. Niche targeted Marketspolitical stakeholders, public authorities (e.g. municipalities),consumer behavior policy makers, chambers of commerce,tourism and infotainment sectors…markets characteristics : emerging, unpredicted bursts, multi-profilesChallenges : multi-lingual support; privacy and anonymitypreservation;real-time emerging data flows; time and space complexities;Markets & challengesproposed framework and applications :• address wide stakeholders and markets audiences;• certain tasks can be realized (e.g. capturing branding success &diffusion in the market) expressed by the crowds emotions;• can support policy and decision making.
  • 66. Future work and horizon ... emerging, and unpredicted bursts detectionsin evolving social media; user multi-profiles patterns; support applications with multi-lingual support; privacy and anonymity preservation development of intelligent and collectiveinformation retrieval techniques are requiredand well expected.
  • 67. References[Horrigan09] J. A. Horrigan, “Online shopping,” Pew Internet & American Life Project Report, 2008.[comScore07] comScore/the Kelsey group, “Online consumer-generated reviews have significant impact on offlinepurchase behavior,” Press Release, http://www.comscore.com/press/release.asp?press=1928, November 2007.[Lioyd05] Lloyd, L., Kechagias, D., Skiena, S.: Lydia: A system for large-scale news analysis. In: String Processingand Information Retrieval (SPIRE 2005). Volume Lecture Notes in Computer Science, 3772. (2005) 161-166[Bautin08] M. Bautin, L. Vijayarenu, S. Skiena : International sentiment analysis for News and Blogs, In Proceedingsof ICWSM (2008)[Demartini10] G. Demartini and S. Siersdorfer: Dear Search Engine: What’s youropinion about...? Sentiment Analysisfor Semantic Enrichment of Web Search Results, In Proc. Of WWW201 0, April 2630, 2010[Abbasi08] A. Abbasi, H. Chen, S. Thoms, T. Fu: Affect Analysis of Web Forums and Blogs Using CorrelationEnsembles, IEEE Transactions on Knowledge and Data Engineering (2008) Vol. 20, Issue 9[OM] Opinion Mining and Sentiment Analysis: NLP Meets Social Sciences, Bing Liu Department of Computer ScienceUniversity Of Illinois at Chicago[Sun 08] Sun A, Zeng D, Li H, Zheng X (2008) Discovering trends in collaborative tagging systems. In: Proceedings ofthe IEEE ISI 2008 PAISI, PACCF, and SOCO international workshops on Intelligence and Security Informatics,Springer, pp 377-383[Wetzker08] Wetzker R, Plumbaum T, Korth A, Bauckhage C, Alpcan T, Metze F (2008a) Detecting trends in socialbookmarking systems using a probabilistic generative model and smoothing. In: Proceedings of 19thInternational Conference on Pattern Recognition (ICPR 2008), IEEE, pp 1-4[Hotho06] Hotho A, Jaschke R, Schmitz C, Stumme G (2006a) Information retrieval in folksonomies: Search andranking. In: Proceedings of the 3rd European Semantic Web Conference, Springer, Budva, Montenegro, LNCS,vol 4011, pp 411-426[Gill08] Gill, A.J., French R.M., Gergle, D., and Oberlander, J.: Identifying Emotional Characteristics from Short BlogTexts. Proc. of the 30th Annual Conf. of the Cognitive Science Society, Washington DC (2008) 2237–2242