Your SlideShare is downloading. ×
Evolving social data mining and affective analysis
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Evolving social data mining and affective analysis


Published on

Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data …

Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data
Social associations and all kinds of graphs
Evolving social data mining
Emotion-aware social data analysis
Frameworks and Applications

Published in: Technology

1 Like
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide
  • Starting from a k -clique, nodes are attached as long as they are reachable through clique adjacency (two k -cliques are adjacent if they share k -1nodes)
  • To sentiment analysis kai opinion mining ekfrazoun to idio. Me vasi to wikipedia Sentiment analysis or opinion mining refers to the application of natural language processing , computational linguistics , and text analytics to identify and extract subjective information in source materials. Apo dw kai katw krataw ton oro sentiment analysis
  • Ta relational networks anaparistoun to poia entities anaferontai mazi (nodes), to poso suxna anaferontai mazi (mikos akmwn), kai to poso prosfata xrhsimopoihthikan mazi (entasi tou edge)
  • Transcript

    • 2. Presentation Outline Web 2.0 facts and social data Social associations and all kinds of graphs Evolving social data mining Emotion-aware social data analysis Frameworks and Applications
    • 3.  Web 2.0 facts and social data evolution & characteristics is there hidden information there? motivation for social (evolving) data mining Social associations and all kinds of graphs Emotion-aware social data analysis Frameworks and Applications
    • 4.  Web 2.0 has become a source of vast amounts of social dataevolving at fast rates Users participate massively in Web 2.0 applications such as: social networking sites (e.g. ) blogs, microblogs (e.g. ) social bookmarking/tagging systems (e.g.) Social data refer to different types of entities users content handled separately or together metadataassociated via various types of interactions /relationshipsWeb 2.0 facts & social data
    • 5. SOCIAL DATA SOURCES: is there hiddeninformation here? Web 2.0 users act explicitly by declaring their associations Relationships may be multipartite (e.g. user u1 making acomment on the post p of user u2) but are usuallysimplified in bipartite associations between some involvedentities However, explicit associations generate implicit threads ofrelationships which: are triggered by users’ common activities may hold among the non-user entities (resources,user-user“is frie nd with”user-user“is frie nd with”user-resource“like s”user-resource“like s”user-metadata“cre ate s tag ”user-metadata“cre ate s tag ”user-group“be lo ng s to ”user-group“be lo ng s to ”
    • 6. Web 2.0 relations initiated by users’ onlinebehaviorIRinferencemechanism
    • 7. Implicit relations uncovered for various types ofentities Numerous relations might unfold … however only some of them areselected based on analysis’ focus or application’s context user-user IRs such as: “like sam e re so urce ”, “use sam e tag ” can beleveraged for studying behavioral patterns (e.g. in applications likeFlickr) tag-tag IRs such as: “are assig ne d o n sam e re so urce ” can beleveraged for tag clustering… g e ne ratio n m e chanism s
    • 8. Motivation forSocial DataMining The availability of massive sizes of data gave new impetus todata mining. e.g. more than active 500 million active Facebookusers,sharing on average more than 30 billion pieces of contenteach month [Facebook Statistics 2011] Mining social web data can act as a barometer of the users’opinion. Non-obvious results may emerge. Collaboration and contribution of many individualsformation of collectiveintelligence Wisdomof thecrowd: more accurate, unbiased source ofinformation. Social data mining results can be useful for applications suchas recommender systems, automatic event detectors, etc
    • 9. Static vs evolving data mining Social data interactions are constantly updating in fastrates however, a data mining approach could be static or evolvingstatic mining approaches: aggregateall social interactions over a specificperiod and deal with them as a uniquedatasetevolving mining approaches:track and exploit more fine-grained & “richer” informationdynamic data mining:emphasis placed on dataevolution and not onaggregation data modeling with a giventime granularity which affectsthe amount of detailscontained in the datasetstreaming data mining:new user activity data received in astreaming fashiontime-aware data approximatingmodel incrementally created andmaintained, subject to time andspace constraintsmodel readapted on arrival of eithersingle update or batch of updates
    • 10. Motivation forEvolving Social DataMining Identifying over time the events that affect socialinteractions tracking posts in a micro-blogging website to identifyfloods, fires, riots, or other events and inform the public Highlighting trends in users’ opinions, preferences, etc companies can track customers’ opinions and complaintsin a timely fashion to make strategic decisions Tracking the evolution of groups (co m m unitie s) ofusers or resources, finding changes in time andcorrelations develop better personalized recommender systems toimprove user experience
    • 11.  Social data in the Web 2.0 Social associations and all kinds of graphs structures for static social data evolving data representation structures Evolving social data mining Emotion-aware social data analysis Frameworks and Applications
    • 12. The ne two rk m o de las an o bvio us cho ice …Social data are interconnected through associationsforming a networkor graph G(V, E), where Vis the set ofnodes and E is the set of edges. nodes represent entities/objects and edgesrepresent relations different types of nodes and edges weighted/unweighted directed/undirectedSocial associations and all kinds ofgraphs
    • 13. … the multi-graph structuresA hypergraphexample
    • 14. Structures forstatic social data Hypergraph: generalization of a graph where anedge (hype re dg e ) connects more than two nodes[Brinkmeier07] Folksonomy: lightweight knowledge representationemerging from the use of a shared vocabulary tocharacterize resources – tripartite hype rg raph[Hotho06, Mika05] Projection on simple graphs to lower complexity [AuYeung09] further simplifications in bipartite & unipartite g raphs e.g. tag-tag network where two tags are connected ifassigned to the same resource Simple graphs’ structure can be encoded in antripartite graph
    • 15. Folksonomy projections on simplegraphs
    • 16. Evolving data representationstructures Need for modeling the different data states in successive time-steps, often determined by the data’s sampling ratet1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕGgraphsnapshotstimelinesegmentsgraph stream
    • 17. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe snapshot layerGadjacency/similaritymatricesfolksonomiesgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕ
    • 18. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe segment layerGgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕPre-processing techniqueIdentify graph segments consisting ofsimilar snapshots and compute asmooth graph approximation for eachsegment [Yang09]adjacency/similarity matricesmore coarse-grainedtime-aggregated matrices areused for emphasizing on mostrecent edges [Tong08]tensors: generalization of matrices (> 2dimensions) [Sun06]
    • 19. t1 t2 t3 t4 … tk-1 tkG1 G2 G3 G4 … Gk-1 Gkseg1 seg2 … segmThe streamlayerGgraphsnapshotstimelinesegmentsgraph streamthe graph stream asa sequence ofsnapshot graphs: G= {Gt }, t∈ℕ tensors“multi-graphs” with edgesencoding relations as well astemporal information [Zhao07]time aggregate adjacencymatrices [Tong08]
    • 20. References[Giatsoglou12] M. Giatsoglou, A. Vakali. Capturing Social Data Evolution via Graph Clustering. InIEEE Internet Computing, IEEE computer Society Digital Library. IEEE Computer Society. DOI:10.1109/MIC.2012.24, Feb. 2012.Au Yeung09] Au Yeung, C.M., Gibbins, N., and Shadbolt, N. 2009. Contextualising Tags inCollaborative Tagging Systems. In Proceedings of 20th ACM Conference on Hypertext andHypermedia, pp. 251-260.[Brinkmeier07] Brinkmeier, M., Werner, J., and Recknagel, S. 2007. Communities in graphs andhypergraphs. CIKM07. ACM, 869-872.[Hotho06] Hotho, A., Robert, J., Christoph, S., and Gerd, S. 2006. Emergent Semantics inBibSonomy. GI Jahrestagung Vol. P-94, 305–312. Gesellschaft fr Informatik.[Mika05] Mika, P. 2005. Ontologies Are Us: A Unified Model of Social Networks and Semantics. InProceedings of the 4th international SemanticWeb Conference. ISWC’05. Springer Berlin/Heidelberg, pp. 522-536.[Sun07] Sun, J., Faloutsos, C., Papadimitriou, S., and Yu, P. S. 2007. GraphScope: parameter-freemining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD internationalConference on Knowledge Discovery and Data Mining . KDD 07. ACM, 687-696.[Tong08] Tong, H., Papadimitriou, S., Yu, P. S., and Faloutsos, C. 2008. Proximity tracking on time-evolving bipartite graphs. In Proceedings of the 9th SIAM international Conference on Data Mining .
    • 21. Presentation Outline Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining the clustering approach Applications Emotion-aware social data analysis Frameworks and applications
    • 22. Web 2.0 social data mining : The genericworkflownext, focus on clustering and community detection
    • 23. Evolving social data clustering/community detectionapproachesPreliminary efforts followed the CommunityMapping -CM- approach:1.application of traditional community detection algorithmson individual static graph snapshots2.identification of correlations and mapping betweensuccessive snapshots’ communities using specialsimilarity measures & temporal smoothing techniquesLimitation: tend to find community structures withhigh temporal variation (due to real worldambiguous & noisy data)
    • 24. Evolving social data clustering/community detectionapproachesEvolutionary community identification approaches: utilizecommunity structure’s history to maximize temporalsmoothness and lead to smoother community evolutiontraditional clustering revisited foran evolutionary setting (TCRapproach): modification of existing static clustering algorithms to havememory of previous states of the data;spectral clustering (SC approach): uses the spectrum of the graph’ssimilarity matrix to perform dimensionality reduction for clustering in fewerdimensions;non-negative matrix/tensorfactorization (NFC approach): apply NFC fordiscovering communities jointly maximizing the fit to the observed data &the temporal evolution;graph streamsegment identification & community structure detection
    • 25. CM approachPalla2008undirected graphsnapshotsI. co-authorshipnetworkII. mobilephone callnetworkI. 142 monthsII. 52 weeks(26tim e slo ts)I. {30K authors}{“are co -autho rs ” }II. {4M users} {“calls ”}community detectionbased on cliq uepe rco latio n;successivecommunities mappingby relative nodes’overlap;Lin2007directedgraphsnapshotsdata crawledfrom 407 blogs63 weeks {275K bloggers}{149K “re plie s to po sto f”}hypothesis: use rs’mutualawareness (e . g .blo g g e rs co m m e ntingo n e ach o the r’s blo g )drive s co m m unityfo rm atio n;mutual awareness’expansion as arandom walk processleads to communitydetectionmethod structuredatasetoutcome/scopesource time-period entities / relations
    • 26. TCRapproachChakrabarti2006(time-aware)similaritymatrixphotosharingservice68 weeks {5K tags}{“are applie d o n sam ere so urce ”}jointoptimization of2 criteria:snapshotqualityhistory quality;genericframework with2 instantiationsproposed:agglomerativehierarchicalalgorithmK-meansmethod structuredatasetoutcome/scopesource time-period entities / relationsSimultaneous optimization of two potentially conflicting criteria:(i) snapshot quality sq, and (ii) history quality hqAt each time-step the framework finds a clustering based on the new similarity matrix Mtand the so far historyEvolutionary clustering in an online setting
    • 27. SC approachTang2008series ofnetworksnapshots;interactionmatrix for eachsnapshotI. mailexchangenetworkII. co-authorshipnetworkI. 12monthsII. 25 yearsI. {2.4K users}{emails}{36.7K words}{“se ndse m ail”}{“re ce ive se m ail”}{“co ntainste rm ”}II. {492K papers}{347K authors}{2.8K venues}{9.5K titleterms} {“write s”}{“participate sin”} {“ispublishe d in”}communityevolution indynamic networksof multiple socialentities;iterativeapproximation ofcommunityevolution usingeigenvectorcalculation andK-means clusteringmethodstructure ormodeldatasetoutcome/scopesource period entities / relationsSpectral clustering uses the spectrum of thegraph’s similarity matrix toperform dimensionality reduction for clusteringin fewer dimensionsEvolutionary spectral clustering appliedin multi-mode networks
    • 28. NFC approachLin2009metagraph:hypergraph withnodesrepresentingface ts and edgesmultipartiteinteractions;tensorssocialbookmarking servicewith votingcapabilities27 days(9tim e slo ts){users} {posts} {keywords}{topics}{152K “po st-ke ywo rd-to pic”}{56.4K “is frie nd with”}{44K“uplo ads ”}{1.2M “vo te s o n”}{242K “use r-po st-co m m e nt”}{94.6K “re plie s with”}communityextraction via time-stamped tensorfactorization;on-line methodhandling time-varying relationsthrough incrementalmetagraphfactorization;communities derivedby jointly leveragingall types ofmultipartite relationsmethod structure or modeldatasetoutcome/scopesource period entities / relations
    • 29. SGC approachSun2007unweightedundirectedbipartitegraphs;graphsegmentsI. mailexchangenetworkII. mobile phonecall networkIII. mobiledeviceproximityrecordsI. 165weeksII. 46 weeksIII. 46 weeksI. {34.3K senders}{34.3K recipients}{15K“se nd m ail” / week}II. {96 callers} {3.8Kcallees}{ 430”calls”/ week}III. {96 users } {96 users}{689 “is lo cate dne ar”/week}unparametricmethod basedon MinimumDescriptionLength;eachsegment’ssource anddestinationnodes arepartitionedseparately todecrease cost;compressedgraph in < 4%than originalspacemethod structuredatasetoutcome/scopesource period entities / relations
    • 30. References (1/2)[Chakrabarti06] Chakrabarti, D., Kumar, R., and Tomkins, A. 2006. Evolutionary clustering. In Proc. ofKDD 06. ACM, New York, NY, 554-560.[Tang08] Tang, L., Liu, H., Zhang, J., and Nazeri, Z. 2008. Community evolution in dynamic multi-modenetworks. In Proc.. KDD 08. ACM, New York, NY, 677-685.[Palla05] Palla, G., Derény, I., Farkas, I., and Vicsek, T. 2005. Uncovering the overlapping communitystructure of complex networks in nature and society. Nature 435, 814–818.[Palla07] Palla, G., Barabási, A.-L., and Vicsek, T. 2007. Quantifying social group evolution. Nature446, 664-667.[Lin07] Lin, Y., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. L. 2007. Blog Community Discoveryand Evolution Based on Mutual Awareness Expansion. In Proceedings of the IEEE/WIC/ACMinternational Conference on Web intelligence. Web Intelligence. IEEE Computer Society, Washington,DC, 48-56.[Lin09] Lin, Y., Sun, J., Castro, P., Konuru, R., Sundaram, H., and Kelliher, A. 2009. MetaFac:community discovery via relational hypergraph factorization. KDD09. ACM, 527-536.[Sun07] Sun, J., Faloutsos, C., Papadimitriou, S., and Yu, P. S. 2007. GraphScope: parameter-freemining of large time-evolving graphs. In Proc. of KDD 07. ACM,, 687-696.[Quack08] Quack, T., Leibe, B., and Van Gool, L. 2008. World-scale mining of objects and events fromcommunity photo collections. In Proc. of CIVR 08. ACM, 47-56.
    • 31. References (2/2)[Zhao07] Zhao, Q., Mitra, P., and Chen, B. 2007. Temporal and information flow based event detectionfrom social text streams. In Proc. of the 22nd National Conference on Artificial intelligence - Volume 2.Aaai Conference On Artificial Intelligence. AAAI Press, 1501-1506.[Duan09] Duan, D., Li, Y., Jin, Y., and Lu, Z. 2009. Community mining on weighted directed graphs.In Proc. of CNIKM 09. ACM, New York, NY, 11-18.[Chi06] Chi, Y., Tseng, B. L., and Tatemura, J. 2006. Eigen-trend: trend analysis in the blogospherebased on singular value decompositions. In Proc of CIKM 06. ACM, 68-77.[Papadopoulos11] Papadopoulos, S., Zigkolis, C., Kompatsiaris, Y., and Vakali A. 2011. Cluster-BasedLandmark and Event Detection for Tagged Photo Collections. IEEE Multimedia, pp. 52-63.[Java07] Java, A., Song, X., Finin, T., and Tseng, B. 2007. Why we twitter: understanding microbloggingusage and communities. WebKDD/SNA-KDD07. ACM, 56-65.[Jansen,09] Jansen, B. J., Zhang, M., Sobel, K., and Chowdury, A. 2009. Twitter power: Tweets aselectronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60, 11, 2169-2188.[Sankaranarayanan 09] Sankaranarayanan, J., Samet, H., Teitler, B.E., Lieberman, M.D., and Sperling,J. 2009. TwitterStand: news in tweets. GIS09. ACM, 42-51.
    • 32. Why focusing on time as acriterion ? typical analysis involves “static”views (users-tags) events, trends affect user interests users Tagging Behavior changesover time Time is a fundamental dimensionin analysis of users and tags in asocial tagging systeme.g. : prediction of first weekend box-office revenues using tweets
    • 33. Many times, a user’s targeted interest ishidden in the general tagging activity….Accessories,bags,fashion,Cars, football,holidays, horses,sea, turkey, fashionNew York, hat,trousers,fashion, Guccianimals, elephants,nature sea, turkey,bagshats,Guccifashion,jeans, NYUser 1 User 2 User 3fashionweek,fashion, silk,wool
    • 34. Time-aware user/tagclustering
    • 35. Related Approaches Sun and colleagues [Sun08] use the χ2statistical model, todetermine whether the appearance of tag t in a tim e fram e iis sig nificant and, thus, to disco ve r tag s that co nstitute“topics of interest" at particular time frames. Wetzker and colleagues [Wetzker08] claim that a tagsignifies a trend, if it attracts significantly more new users ina currently monitored time frame than in past time frames. A trend detection measure is introduced in [Hotho06], whichcaptures topic-specific trends at each time frame and isbased on the weight-spreading ranking of the PageRankalgorithm
    • 36. Use Cases Capturing trends, interests, periodic activitiesof users in specific time periods Community-based tag recommendation Personalization (time-aware user profiles) Fighting spam on social web sites (bydiscriminating regular and occasional users)
    • 37.  Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining In & out zooming on time aware user/tag clusters Emotion-aware social data analysis User generated data The idea for emotion aware clustering Application on micro-blogging sources Frameworks and applications
    • 38. The idea for emotion-awareanalysisClustering methods embed various criteria such as:semantics, tags, time, geo-information ... etcbutsince social sources are driven and managed byhumansemotion/sentimentMUST also be considered …Clustering methods embed various criteria such as:semantics, tags, time, geo-information ... etcbutsince social sources are driven and managed byhumansemotion/sentimentMUST also be considered …
    • 39. The nature of evolving socialdata Two main types of textual information. Facts and Opinions Note: factual statements can imply opinions too. Most current text information processingmethods (e.g., web search, text mining) workwith factual information. Sentiment analysis or opinion mining computational study of opinions, sentiments andemotions expressed in text. Why sentiment analysis now? Mainly becauseof the Web; huge volumes of opinionated text.
    • 40. User-generated data (I)Opinions are important because whenever we need to makea decision, we’re influenced by others’ opinions. According to [Horrigan09] of more than 2000 American adults: 81% of Internet users (or 60% of Americans) have done onlineresearch on a product at least once; among readers of online reviews of restaurants, hotels, andvarious services (e.g., travel agencies or doctors), between 73%and 87% report that reviews had a significant influence on theirpurchase; 32% have provided a rating on a product, service, or person viaan online ratings system, and 30% have posted an onlinecomment or review regarding a product or service
    • 41. user-generated data (II) Word-of-mouth on the Web User-generated media: One can express opinions on anything inreviews, forums, discussion groups, blogs ... Opinions of global scale: No longer limited to: Individuals: one’s circle of friends Businesses: Small scale surveys, tiny focus groups, etc. Why affect/sentiment analysis? Customers: need peer opinions to make purchase decisions Business providers: need customers’ opinions to improve product need to track opinions to make marketing decisions Social researchers: want to know people’s reactions aboutsocial events Government: wants to know people’s reactions to a new policy Psychology, education, etc.
    • 42. More Applications Product review mining: What features of the iPhonedo customers like and which do they dislike? Review classification: Is a review positive or negativetoward the iPhone? Tracking sentiments toward topics overtime: Is angerratcheting up or cooling down? Prediction (election outcomes, market trends): WillClinton or Obama win?
    • 43. Why Extracting sentiments from Web 2.0data sources? Web 2.0 data features Easy to collect: huge amount, clean format Broadly distributed: demographics Topic diversified: free discussion about any topic/product/event Opinion rich: highly personalized Distributed over time, user generated content Motivation Sentiment is a very natural expression of a human being. Sentiment Analysis aims at getting sentiment-related knowledgeespecially from the huge amount of information on the internet Can be generally used to understand opinion in a set ofdocuments or user generated content
    • 44. Challenges Contrasts with Standard Fact-Based Textual Analysis typically, text categorization seeks to classify documents bytopic BUT nature, strength of feelings, degree of positivity, etcimposes a tailored sentiment categorization Key Factors that Make Sentiment Analysis challenging choosing the right set of keywords might be less trivial thanone might initially think; Sentiment and subjectivity are quite context-sensitive, and, ata coarser granularity, quite domain dependent e.g., “go read the book” most likely indicates positive sentiment forbook reviews, but negative sentiment for movie reviews. Web users postings are of a challenging nature, since there isno code in expressions
    • 45. so far … Lexical Resources Se ntiWo rdne t Built on the top of WordNet synsets Attaches sentiment-related information with synsets SentiWordNet assigns to each synset of WordNet threesentiment scores: positivity, negativity, objectivity Ge ne ralInq uire r Included are manually-classified terms labeled with varioustypes of positive or negative semantic orientation, andwords having to do with agreement or disagreement. O pinio nFinde r’s Subje ctivity Le xico n OpinionFinder is a system that performs subje ctivityanalysis , automatically identifying when opinions,sentiments, speculations, and other private state s arepresent in text.
    • 46. so far … Lydia System Lydia [Lioyd05] news analysis system does adaily analysis of over 1000+ online Englishnewspapers, Blogs, RSS feeds, and other newssources. It identifies who is being talked about, by whom,when and where? Applications of Lydia heatmap generation (pos/negfor a topic); relational networks
    • 47. so far … integration of news &blogs Bautin, Vijayarenu and Skiena [Bautin08]presented an approach for the internationalanalysis for news and blogs … still on thepositive/negative side … Cross-language analysis across news streamsPolarity score of London in Arabic,German, Italian and Spanish over the May1-10, 2007 period.
    • 48. so far … sentiment-awaresearching Sentiment Analysis forSemantic Enrichment ofWeb Search Results[Demartini10] the first few results arepresentative sample ofthe entire result setAverage Sentiment score in top N results for 3 search engines
    • 49. so far … beyond pos/neg : the affectanalysis it involves several affects at the sametime. affect classes may be correlated oropposed. Abbasi, Chen Thoms and Fu[Abbasi08] proposed a support vectorregression correlation ensemble(SVRCE) method for text-based affectclassification. affect feature and techniquecomparison. apply to multi-domain.I cannot stand you!hate angryI cannot standyou!Affect class IntensityHappiness 0.01Sadness 0.03Anger 0.6Hate 0.5
    • 50.  Social data in the Web 2.0 Social associations and all kinds of graphs Evolving social data mining In & out zooming on time aware user/tag clusters Emotion-aware social data analysis Frameworks and applications Most popular applications A mining and analysis framework Social data analysis on the cloud Emotions capturing in microblogs
    • 51. clustering of usersexploiting the timedimensionApplications of Mining Evolving SocialDataThe results of community detection, or different miningtechniques, on evolving social data can be exploited inapplications:social network analysisim ag e from [Touchgraph]event detectiondiag ram fro m [Sun07]trend detectionim a g e from [Trendsmap][Touchgraph] http: //www. to uchg raph. co m /TG Face bo o kBro wse r. htm l[Trendsmap] http: //tre ndsm ap. co m /
    • 52. Event detectionGraph segmentationcommunity detectionmethods [Sun07, Duan09]identify events assignificant change-timepoints in thestream.Definition of eventinformation flow between a group of socialactors on a specific topic over a certain timeperiod [Zhao07]occasions which take place at a specifictime and location (concerts, festivals, etc.)[Quack08]Event & landmarkdetection from geo-taggeddata [Papadopoulos11]detection of image clusters using visual & tag-based similaritiesclassification of clusters as events or landmarks Landmarks differ from events on the number ofusers tagging photos over different timegranularities exploitation of similarities with disjoint sets oflandmark- and event- related tagsevent detection fromsocial datastreams [Zhao07]features exploration in 3dimensions: textual content, social,temporalgeneration of multiple intermediateclustering structures using content-based similarity & information flowpatternsevents as groups of nodes closelyrelated in time & topicevent detection from social datastreams [Zhao07]features exploration in 3dimensions: textual content, social,temporalgeneration of multiple intermediateclustering structures using content-based similarity & information flowpatternsevents as groups of nodes closelyrelated in time & topic
    • 53. Trend tracking Social data fluctuate in structure and frequency as they evolve and over time sometopics, images, tags, etc, become most popular amongst users.Trends can be identified by adata mining approach g lo bally or lo cally (within communities) and they usuallyindicate what interests users the most at a given time.trend detection in blogs [Chi06] focused on: keyword popularity in successivetimeframes detection of different topics relating to akeyword contribution of individual users to a trend uses the results of SingularValueDecomposition as trend indicatorscapturing both temporal datachanges & bloggers’ characteristics exploits textual content and citationsbetween blogstrend detection in Twitter[Java07],[Jansen09],[Sankaranarayanan09] several attempts using statisticalanalysis methods analysis of evolving Twitter data toidentify trending keywords fordifferent weekdays sentiment identification on tweetsto identify trending sentimentsabout brands online clustering on streamingtweets, combined withclassification, to identify breakingnewstrend detection in Twitter[Java07],[Jansen09],[Sankaranarayanan09] several attempts using statisticalanalysis methods analysis of evolving Twitter data toidentify trending keywords fordifferent weekdays sentiment identification on tweetsto identify trending sentimentsabout brands online clustering on streamingtweets, combined withclassification, to identify breakingnews
    • 54. A mining and analysis framework
    • 55. 2 indicative applicatio ns Cloud4TrendsLeveraging the cloud infrastructure forlocalized real-time trend detection in socialmedia CapturEmosCapturing emotional patterns in micro-blogging data streamsAristotle University, OSWINDS Group
    • 56. Cloud4Trends is a microblogging & blogging localized contentcollection and analysis frameworkfor detecting currently populartopics of users’ interestCloud4Trends is a microblogging & blogging localized contentcollection and analysis frameworkfor detecting currently populartopics of users’ interest Social media reflect societal concerns exhibiting‘bursts’ of content generation on the occurrence ofevents po pular to pics / inte re sts fluctuate with tim e Challenging for both computer scientists &application developers to reach unbiased,meaningful conclusions about trendingusers’opinion and interestsCloud4Trends - Motivation
    • 57. Challenges & Outcome Massive content sizes andunpredictable content generationrates : scalable analysis is needed Trending topics should bediscovered when they are “fresh” : an on-line analysis approach isdemanded Trends should be meaningful need for contextual trends Content is dispersed in multiplesources : trend detection needs a combinedapproachThe Cloud deployment ofthe Cloud4Trendsscenario with use of theVENUS-C servicesverified that Cloud-basedarchitectures are a viablesolution for online webdata mining applicationsthat are beneficial forboth researchers andentrepreneurs.
    • 58. http://clo ud4tre ndsde m o . clo udapp. ne t
    • 59. Emotional Aware Clustering on Micro-Blogging Sources (affect analysis)K. Tsagkalidou, V. Koutsonikola, A.Vakali and. K. Kafetsios: EmotionalAware Clustering on Micro-BloggingSources, accepted for publication,Affective Computing and IntelligentInteraction 2011
    • 60. our emotional dictionary create an extended emotional dictionary byenriching an opinion lexicon provided by theUMBC university with synonymous words fromWordNet
    • 61. The used affect space representing the extreme ends of four emotional pairs[Gill08] emotion exemplar words
    • 62. Relevance between tweet andemotion Semantic similarity the maximum semantic similarity between eachof the tweet’s wo rds and the emotion’srepresentatives, as defined in Wordnet Sentiment similarity expresses a word’s emotional intensity as definedin the extended dictionary Overall similarity between a tweet & anemotion Σi=1…|words|Sem(wi,emotion)*Sent(wi)|wi|: Sent(wi)≠0, for 1<=i<=|words|
    • 63. • capturing and understanding crowd’s emotionsfor a particular topic or product in an implicitmanner via computational methods.• se ntim e nt analysis & m icro blo g g ing (statistical)pro ce ssing : emphasis on affective and opinionmining, lexicon-based processing, knowledgeextraction techniques;• de ve lo pm e nt o f applicatio ns: web applicationenhanced with of crowds emotions visualizationcapabilities.CapturEmos- MotivationCapturEmos- Motivation
    • 64. The application/serviceuseful for capturing brandingsuccess & diffusion in themarket, as expressed by thecrowds emotionsOur innovation principle :focus on the “affect” which isdistinguished from discreteemotions.discrete emotions : concernaffective reactions in relationto one’s goalsaffect refers to : anoverarching positive ornegative valence of one’sfeelings.
    • 65. Niche targeted Marketspolitical stakeholders, public authorities (e.g. municipalities),consumer behavior policy makers, chambers of commerce,tourism and infotainment sectors…markets characteristics : emerging, unpredicted bursts, multi-profilesChallenges : multi-lingual support; privacy and anonymitypreservation;real-time emerging data flows; time and space complexities;Markets & challengesproposed framework and applications :• address wide stakeholders and markets audiences;• certain tasks can be realized (e.g. capturing branding success &diffusion in the market) expressed by the crowds emotions;• can support policy and decision making.
    • 66. Future work and horizon ... emerging, and unpredicted bursts detectionsin evolving social media; user multi-profiles patterns; support applications with multi-lingual support; privacy and anonymity preservation development of intelligent and collectiveinformation retrieval techniques are requiredand well expected.
    • 67. References[Horrigan09] J. A. Horrigan, “Online shopping,” Pew Internet & American Life Project Report, 2008.[comScore07] comScore/the Kelsey group, “Online consumer-generated reviews have significant impact on offlinepurchase behavior,” Press Release,, November 2007.[Lioyd05] Lloyd, L., Kechagias, D., Skiena, S.: Lydia: A system for large-scale news analysis. In: String Processingand Information Retrieval (SPIRE 2005). Volume Lecture Notes in Computer Science, 3772. (2005) 161-166[Bautin08] M. Bautin, L. Vijayarenu, S. Skiena : International sentiment analysis for News and Blogs, In Proceedingsof ICWSM (2008)[Demartini10] G. Demartini and S. Siersdorfer: Dear Search Engine: What’s youropinion about...? Sentiment Analysisfor Semantic Enrichment of Web Search Results, In Proc. Of WWW201 0, April 2630, 2010[Abbasi08] A. Abbasi, H. Chen, S. Thoms, T. Fu: Affect Analysis of Web Forums and Blogs Using CorrelationEnsembles, IEEE Transactions on Knowledge and Data Engineering (2008) Vol. 20, Issue 9[OM] Opinion Mining and Sentiment Analysis: NLP Meets Social Sciences, Bing Liu Department of Computer ScienceUniversity Of Illinois at Chicago[Sun 08] Sun A, Zeng D, Li H, Zheng X (2008) Discovering trends in collaborative tagging systems. In: Proceedings ofthe IEEE ISI 2008 PAISI, PACCF, and SOCO international workshops on Intelligence and Security Informatics,Springer, pp 377-383[Wetzker08] Wetzker R, Plumbaum T, Korth A, Bauckhage C, Alpcan T, Metze F (2008a) Detecting trends in socialbookmarking systems using a probabilistic generative model and smoothing. In: Proceedings of 19thInternational Conference on Pattern Recognition (ICPR 2008), IEEE, pp 1-4[Hotho06] Hotho A, Jaschke R, Schmitz C, Stumme G (2006a) Information retrieval in folksonomies: Search andranking. In: Proceedings of the 3rd European Semantic Web Conference, Springer, Budva, Montenegro, LNCS,vol 4011, pp 411-426[Gill08] Gill, A.J., French R.M., Gergle, D., and Oberlander, J.: Identifying Emotional Characteristics from Short BlogTexts. Proc. of the 30th Annual Conf. of the Cognitive Science Society, Washington DC (2008) 2237–2242