Introduction to Computational Social Science (Talha OZ)
This document provides an introduction to computational social science through summaries of key concepts and examples. It discusses three main challenges in computational social science: computational modeling of complex social phenomena; analysis of large social data sets from sources like cell phones and social media; and virtual lab-style social experiments. It also summarizes approaches like agent-based modeling, where autonomous agents interact and adapt according to rules, and contrasts this with traditional modeling approaches. Examples of computational social science topics and tools are given, such as social network analysis, geospatial analysis, and machine learning. The document advocates for agent-based modeling to flexibly capture social dynamics in a way that mathematical models cannot.
The emerging field of computational social science (CSS) is devoted to the pursuit of interdisciplinary social science research from an information processing perspective, through the medium of advanced computing and information technologies.
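The agent-based modeling approach the summary above contrasts with mathematical models can be sketched in a few lines. The following toy threshold-adoption model is illustrative only (the decision rule, parameters, and ring-lattice topology are my assumptions, not taken from the slides): each autonomous agent adopts a behavior once enough of its neighbors have, and macro-level diffusion emerges from that local rule.

```python
import random

def run_threshold_model(n=100, k=4, threshold=0.5, seeds=5, steps=20, rng_seed=42):
    """Simulate threshold-based adoption on a ring lattice.

    Each agent observes its k nearest neighbours and adopts once the
    fraction of adopting neighbours reaches `threshold`.
    """
    rng = random.Random(rng_seed)
    adopted = [False] * n
    for i in rng.sample(range(n), seeds):          # seed a few early adopters
        adopted[i] = True

    for _ in range(steps):
        nxt = adopted[:]
        for i in range(n):
            if adopted[i]:
                continue
            neigh = [(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0]
            frac = sum(adopted[j] for j in neigh) / len(neigh)
            if frac >= threshold:                  # simple local decision rule
                nxt[i] = True
        if nxt == adopted:                         # steady state reached
            break
        adopted = nxt
    return sum(adopted)                            # final number of adopters
```

Running the model with different thresholds or seed counts shows how aggregate outcomes depend nonlinearly on the local rule, which is the flexibility the summary attributes to agent-based modeling.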
A Summary of Computational Social Science - Lecture 8 in Introduction to Comp... (Lauri Eloranta)
Final lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro... (Lauri Eloranta)
Seventh lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Simulation in Social Sciences - Lecture 6 in Introduction to Computational S... (Lauri Eloranta)
Sixth lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Complex Social Systems - Lecture 5 in Introduction to Computational Social Sc... (Lauri Eloranta)
The document outlines the schedule and topics for a series of lectures on computational social science, including introductions to the topics of complex social systems, complexity theory, and complex adaptive systems. It provides background definitions and concepts regarding these topics, discussing ideas like social complexity, emergence in complex systems, and key properties of complex adaptive systems.
Introduction to Computational Social Science - Lecture 1 (Lauri Eloranta)
The document provides an overview of the first lecture in an Introduction to Computational Social Science course. It defines computational social science and discusses its main areas, which include big data, social networks, social complexity, and simulation. The lecture also explores examples of computational social science research, such as modeling the spread of disease, tracking news and meme propagation online, and simulating water demand.
Social Network Analysis - Lecture 4 in Introduction to Computational Social S... (Lauri Eloranta)
Fourth lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
10 More than a Pretty Picture: Visual Thinking in Network Studies (dnac)
Visualization has been important in network science since its beginnings to make invisible structures visible. While metrics can describe networks, visualizations allow researchers to see relationships and patterns across multiple dimensions that numbers alone cannot reveal. Effective network visualizations communicate insights that would be difficult to understand otherwise, by depicting global patterns and local details simultaneously in a way that builds intuition about the network's structure and generating processes. However, challenges include lack of consistent display frameworks, integrating too much multidimensional information, and issues of scale for large and dynamic networks.
This is a colloquium that I presented on 4/22/21 at Stockholm University's Nordic Institute for Theoretical Physics (NORDITA): the WINQ–AlbaNova Colloquium.
Here is a video of my talk: http://video.albanova.se/ALBANOVA20210422/video.mp4
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski... (Micah Altman)
Julia Flanders, Director of the Digital Scholarship Group in the Northeastern University Library and a Professor of Practice in Northeastern's English Department, gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.
For more see: http://informatics.mit.edu/event/brown-bag-jobs-roles-skills-tools-working-digital-academy-julia-flanders
13 An Introduction to Stochastic Actor-Oriented Models (aka SIENA) (dnac)
This document provides an introduction to Stochastic Actor-Oriented Models (SAOMs), also known as SIENA models. It discusses when SAOMs are appropriate to use, provides an overview of the general SAOM form, and covers key components like the network and behavior objective functions and rate functions. The presentation also outlines how SAOMs are estimated and fitted to data, provides an empirical example, and discusses extensions. SAOMs model how networks and behaviors change over time as actors make micro-level decisions to maximize their objective functions.
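In the standard SAOM formulation (the notation below is a conventional sketch of the model family, not copied from these slides), each actor evaluates candidate network states with a linear combination of effect statistics, and tie changes follow a multinomial-logit choice whenever a rate function grants the actor an opportunity to act:

```latex
% Evaluation (objective) function for actor i: a weighted sum of
% effect statistics s_{ik} (e.g., outdegree, reciprocity, transitivity)
f_i(\beta, x) = \sum_k \beta_k \, s_{ik}(x)

% When actor i receives a change opportunity (arriving at rate \lambda_i),
% the probability of moving to the state with the tie to j toggled is
P(i \leadsto j) =
  \frac{\exp\{ f_i(\beta, x(i \leadsto j)) \}}
       {\sum_h \exp\{ f_i(\beta, x(i \leadsto h)) \}}
```

The "micro-level decisions to maximize their objective functions" in the summary correspond to actors myopically choosing the change with the highest evaluation, up to random noise.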
Social Network Analysis: applications for education research (Christian Bokhove)
What is your talk about?
This seminar will illustrate various social network analysis (SNA) techniques and measures and their applications to research problems in education. These applications will be illustrated from our own research utilising a range of SNA techniques.
What are the key messages of your talk?
We will cover some of the ways in which network data can be collected and utilised with other research data to examine the relationships between network measures and other attributes of individuals and organisations, and how it can be linked to other approaches in multiple methods studies.
What are the implications for practice or research from your talk?
SNA is an approach that draws from theories of social capital to study the relational ties that exist between actors or institutions in a specific context. Such ties might include learning exchanges or advice-seeking interactions. SNA techniques allow researchers to incorporate the interdependence of participants within their research questions, whereas many traditional techniques assume our participants, and their responses to our questions, are independent of one another.
Scientific Reproducibility from an Informatics Perspective (Micah Altman)
This talk, prepared for the MIT Program on Information Science and updating an earlier talk at the National Academies workshop on reproducibility, frames reproducibility from an informatics perspective.
01 Introduction to Networks Methods and Measures (dnac)
This document provides an introduction to social network analysis. It discusses how networks matter through two fundamental mechanisms: connections and positions. Connections refer to the flow of things through networks, viewing networks as pipes. Positions refer to relational patterns and networks capturing role behavior, viewing networks as roles. The document also covers basic network data structures including nodes, edges, directed/undirected ties, binary/valued ties, and different levels of analysis such as ego networks and complete networks. It provides examples of one-mode and two-mode network data.
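The basic data structures the lecture covers can be represented directly with plain dictionaries and sets. The example below is an illustrative sketch (node names, tie values, and events are invented): a directed, valued one-mode network, plus a two-mode person-by-event network and its one-mode projection.

```python
# Directed, valued ties as an adjacency dict: an advice-seeking network.
advice = {
    "ann":  {"bob": 3, "carl": 1},   # ann -> bob (weight 3), ann -> carl (1)
    "bob":  {"carl": 2},
    "carl": {},
}

def out_degree(g, node):
    """Number of ties the node sends."""
    return len(g[node])

def in_degree(g, node):
    """Number of ties the node receives."""
    return sum(1 for nbrs in g.values() if node in nbrs)

# Two-mode (person x event) data as an incidence mapping, and its
# one-mode projection: people are tied when they attend a common event.
attendance = {"ann": {"e1", "e2"}, "bob": {"e2"}, "carl": {"e1"}}

def project(two_mode):
    """Project a two-mode network onto an undirected one-mode tie set."""
    people = list(two_mode)
    ties = set()
    for i, p in enumerate(people):
        for q in people[i + 1:]:
            if two_mode[p] & two_mode[q]:      # shared event => tie
                ties.add(frozenset((p, q)))
    return ties
```

Dropping the weights gives binary ties, and symmetrizing the adjacency dict gives the undirected case, which covers the variants listed in the summary.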
Introduction to Topological Data Analysis (Mason Porter)
Here are slides for my 3/14/21 talk on an introduction to topological data analysis.
This is the first talk in our Short Course on topological data analysis at the 2021 American Physical Society (APS) March Meeting: https://march.aps.org/program/dsoft/gsnp-short-course-introduction-to-topological-data-analysis/
This document provides an overview of community detection in networks. It begins with an introduction to the concept of communities and their usefulness in network analysis. It then discusses two main approaches to calculating communities - descriptive methods like modularity, and generative methods like stochastic block models. The document notes that community detection is an active area of research, with opportunities to extend current methods. It provides several examples of community detection applications and acknowledges contributions from other researchers in the field.
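The modularity score that the descriptive methods mentioned above optimize can be computed directly. The sketch below is a toy example (the graph and partition are invented for illustration); it evaluates the Newman-Girvan modularity Q = sum over communities c of [e_c/m - (d_c/2m)^2], where e_c is the number of within-community edges and d_c the total degree in c.

```python
def modularity(edges, partition):
    """Newman-Girvan modularity Q of a node -> community partition,
    for an undirected, unweighted edge list."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # fraction of edges falling within communities ...
    q = sum(1.0 for u, v in edges if partition[u] == partition[v]) / m
    # ... minus the fraction expected under the configuration model
    for c in set(partition.values()):
        d_c = sum(d for node, d in deg.items() if partition[node] == c)
        q -= (d_c / (2.0 * m)) ** 2
    return q

# Two triangles joined by a single bridge: the natural split scores well.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
```

Generative approaches such as stochastic block models instead fit edge probabilities per community pair rather than scoring a partition this way.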
Presentation given at the HEA Social Sciences learning and teaching summit 'Exploring the implications of "the era of big data" for learning and teaching'.
A blog post outlining the issues discussed at the summit is available via: http://bit.ly/1lCBUIB
Using Data Integration Models for Understanding Complex Social Systems (Bruce Edmonds)
Describes the use of complex, descriptive simulations to integrate the maximum amount of evidence in a staged manner, with an example from the SCID project (http://www.scid-project.org).
Paper Writing in Applied Mathematics (slightly updated slides) (Mason Porter)
Here are my slides (which I have updated very slightly) on writing papers in applied mathematics.
There will be an accompanying oral presentation and discussion on Friday 20 April. I am recording the video for that and plan to post it along with these (or a further updated version of these) slides.
This document discusses network data collection. It begins by providing examples of how social structure matters and influences outcomes. It then discusses different ways to detect social structure through network data collection, including small group questionnaires, large surveys, observations, and digital data scraping. The document outlines key network questions that can shape data collection, such as how networks form and their consequences. It also discusses sampling and defining network boundaries. Overall, the document provides an overview of network data collection methods and considerations.
How to conduct a social network analysis: A tool for empowering teams and wor... (Jeromy Anglim)
Slides and details available at: http://jeromyanglim.blogspot.com/2009/10/how-to-conduct-social-network-analysis.html
A talk on using social network analysis as a team development tool.
Who creates trends in online social media (Amir Razmjou)
The document discusses and compares two paradigms for how trends spread in online social media: the influential hypothesis and the crowd hypothesis. The influential hypothesis suggests that a small number of "opinion leaders" drive the adoption of new trends, acting as intermediaries between the media and the public. However, the document argues this view is outdated and does not apply well to today's internet era. Instead, evidence shows the crowd's early participation in spreading ideas can lead to widespread diffusion, rather than domination by influential elites. Analysis of slang word adoption over time supports the crowd hypothesis, finding that ordinary users with around 150-300 followers play a key role in trends going viral.
Everything is connected: people, information, events and places. A practical way of making sense of the tangle of connections is to analyze them as networks. The objective of this workshop is to introduce the essential concepts of Social Network Analysis (SNA). It also seeks to show how SNA may help organizations unlock and mobilize these informal networks in order to achieve sustainable strategic goals. After discussing the essential concepts in theory of SNA, the computational tools for modeling and analysis of social networks will also be introduced in this presentation.
This is my attempt at an introduction to data ethics for mathematicians. Mathematicians increasingly need to deal with these kinds of issues, but we don't have the tradition of ethics training from other disciplines.
I welcome comments on how to improve these slides. Did I miss any salient points? Do you want to offer a different perspective on any of these? Do you want to offer any counterpoints? (Please e-mail me directly with comments and suggestions.)
Eventually, I hope to develop these slides further into an article for a venue aimed at mathematical scientists, and of course I would love to have knowledgeable coauthors who can offer a different perspective from mine.
The document discusses network diffusion and peer influence. It covers compartmental models of diffusion, how network structure affects diffusion through factors like distance, clustering, and highly connected nodes. Simulation studies show networks with shorter path distances, more independent paths between nodes, and higher clustering coefficients diffuse ideas and behaviors more quickly. The regression analysis finds these network structural characteristics strongly predict a network's relative diffusion ratio compared to random networks.
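The compartmental-style diffusion the summary describes can be simulated on an explicit network. This is a minimal susceptible-infected (SI) sketch under assumed parameters (the adjacency list, infection probability, and node names are invented for illustration, not taken from the document):

```python
import random

# Illustrative undirected network, stored as an adjacency list (both directions).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def si_spread(adj, seed_node, beta=0.5, steps=30, rng_seed=1):
    """Discrete-time susceptible-infected (SI) diffusion: each step,
    every infected node infects each susceptible neighbour with
    probability beta. Returns the adoption count over time."""
    rng = random.Random(rng_seed)
    infected = {seed_node}
    history = [len(infected)]
    for _ in range(steps):
        new = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and rng.random() < beta:
                    new.add(v)
        infected |= new
        history.append(len(infected))
        if len(infected) == len(adj):      # everyone reached
            break
    return history
```

Comparing such histories across networks with different path lengths or clustering is the kind of simulation study the summary refers to: structure changes how fast the adoption curve rises.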
These slides are for my talk for the Somerville College Mathematics Reunion ("Somerville Maths Reunion", 6/24/17): http://www.some.ox.ac.uk/event/somerville-maths-reunion/
An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google ... (Academia Sinica)
VoIP playout buffer dimensioning has long been a challenging optimization problem, as the buffer size must maintain a balance between conversational interactivity and speech quality. The conversational quality may be affected by a number of factors, some of which may change over time. Although a great deal of research effort has been expended in trying to solve the problem, how the research results are applied in practice is unclear.
In this paper, we investigate the playout buffer dimensioning algorithms applied in three popular VoIP applications, namely Skype, Google Talk, and MSN Messenger. We conduct experiments to assess how the applications adjust their playout buffer sizes. Using an objective QoE (Quality of Experience) metric, we show that Google Talk and MSN Messenger do not adjust their respective buffer sizes appropriately, while Skype does not adjust its buffer at all. In other words, they could provide better QoE to users by improving their buffer dimensioning algorithms. Moreover, none of the applications adapts its buffer size to the network loss rate, which should also be considered to ensure optimal QoE provisioning.
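For context, adaptive playout algorithms of the kind the paper evaluates typically track a smoothed delay estimate and its variation, then buffer a few deviations' worth of slack. The sketch below follows the classic Ramjee et al. style of estimator; the smoothing constants and the input delay series are illustrative assumptions, not values from the paper:

```python
def playout_delay(delays, alpha=0.998, k=4.0):
    """Adaptive playout delay estimate: exponentially weighted mean of
    network delay plus k times the smoothed delay variation, so the
    buffer absorbs most jitter without adding excessive latency."""
    d_hat = delays[0]     # smoothed delay estimate
    v_hat = 0.0           # smoothed delay variation
    for d in delays:
        d_hat = alpha * d_hat + (1 - alpha) * d
        v_hat = alpha * v_hat + (1 - alpha) * abs(d_hat - d)
    return d_hat + k * v_hat
```

With a steady delay the estimate converges to that delay (minimal buffering); jittery delays inflate the variation term, trading interactivity for speech quality exactly as the paper's framing describes.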
Quantifying User Satisfaction in Mobile Cloud Games (Academia Sinica)
We conduct real experiments to quantify user satisfaction in mobile cloud games using a real cloud gaming system built on the open-sourced GamingAnywhere. We share our experiences in porting GamingAnywhere client to Android OS and perform extensive experiments on both the mobile and desktop clients. The experiment results reveal several new insights: (1) gamers are more satisfied with the graphics quality on mobile devices, while they are more satisfied with the control quality on desktops, (2) the bitrate, frame rate, and network delay significantly affect the graphics and smoothness quality, and (3) the control quality only depends on the client type (mobile versus desktop). To the best of our knowledge, such user studies have never been done in the literature.
Games on demand, a.k.a., cloud gaming, refers to a new way to deliver computer games to users, where computationally complex games are executed and rendered on powerful cloud servers rather than local computing devices. In this talk, I will give an overview of the challenges in developing cloud gaming systems, what we have done, and what remains to do. I will start from GamingAnywhere, an open-source cloud gaming system, followed by a number of studies based on the system. Finally I will conclude the talk with open issues in providing highly real-time and high-definition audio/visual quality multimedia experience (e.g., in the form of gaming and virtual reality).
Cloud Gaming Onward: Research Opportunities and Outlook (Academia Sinica)
Cloud gaming has become increasingly popular in academia and industry, as evidenced by the large number of related research papers and startup companies. Some public cloud gaming services have attracted hundreds of thousands of subscribers, demonstrating the initial success of cloud gaming services. Pushing cloud gaming services forward, however, faces various challenges, which open up many research opportunities. In this paper, we share our views on future cloud gaming research and point out several research problems spanning a wide spectrum of directions, including distributed systems, video codecs, virtualization, human-computer interaction, quality of experience, resource allocation, and dynamic adaptation. Solving these research problems will allow service providers to offer high-quality cloud gaming services while remaining profitable, which in turn will result in an even more successful cloud gaming ecosystem. In addition, we believe there will be many more novel ideas for capitalizing on the abundant and elastic cloud resources for a better gaming experience, and we will see these ideas and the associated challenges in the years to come.
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ... (Lauri Eloranta)
Third lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015.(http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your ... (Sheng-Wei (Kuan-Ta) Chen)
This document summarizes a presentation about building crowdmining services for data mining research. It discusses using crowdsourcing to generate annotations, evaluate data, and assist with retrieval tasks. Common ways crowdsourcing has been used in data mining include image tagging, data entry, answering questions, and assessing image semantics or photo orientation. However, the presenter argues that crowdsourcing is not just conducting user studies, analyzing user generated content, or using Mechanical Turk. The presentation also describes an implicit crowdmining platform called Pomics that turns photos into comic strips to share experiences.
The document discusses computational social science and three common approaches: macroscopes, virtual labs, and empirical modeling. It provides examples of each approach, including using large-scale Facebook and Twitter data to study language patterns and personality (macroscope), manipulating Facebook data to study emotional contagion and voter turnout (virtual lab), and using Twitter data to predict county-level health outcomes (empirical modeling). Overall, the document outlines how new sources of big data allow computational social science to address challenges of studying social phenomena at large scales.
Social media and public health misinformation
The document discusses how social media acts as a platform for spreading information, beliefs, and behaviors. It summarizes research showing:
1) Anti-vaccine videos are more prevalent and easier to access than pro-vaccine videos on YouTube. Videos with more dislikes are more likely to be pro-vaccine.
2) The YouTube recommendation network makes anti-vaccine views more accessible over time.
3) Hostility online may reinforce distinct "in-groups" and "out-groups" rather than change views, highlighting the need for respectful discussion.
The document advocates using social influence through consensus building and anonymous discussion to counter health misinformation online.
Data Science is an interdisciplinary approach that combines computational science, statistics, and domain knowledge to extract meaningful insights from large and complex data. It aims to address the challenges posed by the data revolution, which is characterized by big data from diverse sources. There is no single agreed-upon definition, but most definitions emphasize applying techniques from computer science, statistics, and the relevant domain area to discover patterns, make predictions, and support decision making from data. Key aspects include developing appropriate methodologies for knowledge discovery, forecasting, and decision making using large and diverse data from sources such as surveys, social media, and sensors. The integration of domain knowledge representation with computational and statistical tools is seen as an important novelty that can enhance data analysis and interpretation.
OII Summer Doctoral Programme 2010: Global brain by Meyer & Schroeder (Eric Meyer)
The document discusses how technology is driving research to become more collaborative globally through distributed and networked tools. It examines several case studies where technologies enabled large-scale collaborative research projects that addressed questions too big for individual labs. These include distributed computing for particle physics, genomic studies, and proteomics. Challenges discussed include interoperability, data sharing policies, and sustaining momentum in infrastructure.
National Academies Workshop on Big Data and Analysis for Infectious Disease Research, Operations and Policy.. Pan American Health Organization, Washington, DC (May 11 2016)
This document discusses the responsible use of data science techniques and technologies. It describes data science as answering questions using large, noisy, and heterogeneous datasets that were collected for unrelated purposes. It raises concerns about the irresponsible use of data science, such as algorithms amplifying biases in data. The work of the DataLab group at the University of Washington is presented, which aims to address these issues by developing techniques to balance predictive accuracy with fairness, increase data sharing while protecting privacy, and ensure transparency in datasets and methods.
Ohio Center of Excellence in Knowledge-Enabled Computing at Wright State (Kno.e.sis)
Center overview: http://bit.ly/coe-k
Invitation: http://bit.ly/COE-invite
Presentation at "Strategies for managing social media research data", Feb 12, 2016. Cambridge. http://www.data.cam.ac.uk/events/strategies-managing-social-media-research-data
This paper looks at the problem of privacy in the context of Online Social Networks (OSNs). In particular, it examines the predictability of different types of personal information based on OSN data and compares it to users' perceptions about the disclosure of their information. To this end, a real-life dataset is composed, consisting of the Facebook data (images, posts, and likes) of 170 people along with their replies to a survey that addresses both their personal information and their perceptions about the sensitivity and predictability of different types of information. Importantly, we evaluate several learning techniques for the prediction of user attributes based on their OSN data. Our analysis shows that users' perceptions with respect to the disclosure of specific types of information are often incorrect. For instance, it appears that the predictability of their political beliefs and employment status is higher than they tend to believe. Interestingly, it also appears that information characterized by users as more sensitive is actually more easily predictable than users think, and vice versa (i.e., information characterized as relatively less sensitive is less easily predictable than users might have thought).
Depression Detection in Tweets using Logistic Regression Model (ijtsrd)
In the growing world of modernization, mental health issues like depression, anxiety, and stress are very common among people, and social media platforms like Facebook, Instagram, and Twitter have accelerated the growth of such issues. Everything has its merits and drawbacks. During this pandemic, people are more likely to suffer from mental health issues, as they are online 24/7 and are cut off from the real world. Past studies have shown that individuals who spend more time on social media are more likely to be depressed. In this project, we identify people who are depressed based on their tweets, followers, following, and many other factors. For this, we have trained and tested a text classifier that distinguishes between users who are depressed and those who are not. Rahul Kumar Sharma | Vijayakumar A, "Depression Detection in Tweets using Logistic Regression Model", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-4, June 2021. URL: https://www.ijtsrd.com/papers/ijtsrd41284.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/41284/depression-detection-in-tweets-using-logistic-regression-model/rahul-kumar-sharma
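The classifier described above can be sketched in miniature. The toy tweets, six-word vocabulary, bag-of-words features, and plain gradient-descent training below are illustrative assumptions, not the authors' dataset or feature set; they only show how a logistic regression text classifier separates the two classes.

```python
# Minimal logistic regression text classifier (illustrative sketch).
import math

def featurize(text, vocab):
    # Bag-of-words: one count per vocabulary word.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def train(X, y, lr=0.5, epochs=200):
    # Plain stochastic gradient descent on the logistic loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log-loss w.r.t. z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(text, vocab, w, b):
    x = featurize(text, vocab)
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0  # 1 = depressed, 0 = not depressed

# Hypothetical training data, far smaller than any real study would use.
vocab = ["sad", "hopeless", "tired", "happy", "great", "fun"]
tweets = ["so sad and hopeless today", "feeling tired and sad",
          "what a great day", "had fun so happy"]
labels = [1, 1, 0, 0]
X = [featurize(t, vocab) for t in tweets]
w, b = train(X, labels)
print(predict("sad and tired again", vocab, w, b))
```

A real system would of course use a much larger labeled corpus and richer features (follower counts, posting frequency, and so on), as the abstract indicates.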
This is a citizen science overview particularly aimed at graduate students enrolled in a new course at Arizona State University, aptly titled "Citizen Science." The author of this presentation, and course instructor, Darlene Cavalier, will talk students through its nuances and intersections with science, technology, and society.
Data Science Innovations : Democratisation of Data and Data Science (suresh sood)
Data Science Innovations : Democratisation of Data and Data Science covers the opportunity of citizen data science lying at the convergence of natural language generation and discoveries in data made by the professions, not data scientists.
NYC Data Science Meetup: Computational Social Science (jakehofman)
An emerging field called computational social science leverages the ability to collect and analyze large datasets to study questions in the social sciences. This work is happening in technology companies and government agencies, but it could also be established in open academic environments. Computational social science sits at the intersection of the social sciences, statistics, and computer science, and addresses long-standing questions through large-scale data analysis and modeling.
An introduction to machine learning in biomedical research: Key concepts, pr... (FranciscoJAzuajeG)
This presentation provides an introduction to machine learning in biomedical research, covering key concepts, applications, and challenges. It discusses how machine learning aims to develop systems with advanced analytical or predictive capabilities using artificial intelligence and machine learning techniques. Supervised, unsupervised, and reinforcement learning paradigms are described. Examples of machine learning applications in biomedical research include phenotypic image analysis, medical diagnosis from images, and predicting cardiovascular outcomes. Challenges include data heterogeneity, lack of large labeled datasets, multi-layered data, and ensuring interpretability and reproducibility. Popular machine learning software frameworks like Scikit-learn, Keras, and TensorFlow are also mentioned.
This document discusses dynamic user profiling in open science infrastructures. It begins by providing context on open science and digital scientific infrastructures. The main challenges are increasing user engagement to promote open science practices. The document then discusses measuring user engagement through metrics like activity ratio and lurking ratio. The methodology section outlines using density clustering to identify user profiles from Nanohub data and examine life journeys where engagement increased. Preliminary results show the Nanohub data has strong clustering tendency and engagement behavior changes over a user's lifespan, with older populations exhibiting similar behaviors. Future work involves pattern recognition over time and identifying critical behaviors for long-term engagement.
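The engagement metrics named above could be computed along the following lines. The exact definitions used in the study may differ, so treat `activity_ratio` (days with any event over the user's lifespan) and `lurking_ratio` (view-only events over all events), as well as the toy event log, as illustrative assumptions.

```python
# Illustrative engagement metrics for one user's event log.
from datetime import date

def activity_ratio(events):
    # events: list of (date, kind) tuples for a single user.
    days = {d for d, _ in events}
    first, last = min(days), max(days)
    lifespan = (last - first).days + 1
    return len(days) / lifespan  # fraction of lifespan with any activity

def lurking_ratio(events):
    # Fraction of events that are passive views rather than contributions.
    views = sum(1 for _, kind in events if kind == "view")
    return views / len(events)

# Hypothetical event log: three active days over a five-day lifespan.
events = [(date(2021, 1, 1), "view"), (date(2021, 1, 1), "post"),
          (date(2021, 1, 3), "view"), (date(2021, 1, 5), "view")]
print(activity_ratio(events))  # 3 active days / 5-day lifespan = 0.6
print(lurking_ratio(events))   # 3 of 4 events are views = 0.75
```

Profiles could then be found by clustering users on such per-user metrics, as the density-clustering step in the document suggests.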
Smart Data - How you and I will exploit Big Data for personalized digital hea... (Amit Sheth)
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around humans, on humans, and inside/within humans), public health signals (e.g., information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, and Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in the cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Slides from #SMWCPH event Social Media Analytics: Concepts, Models, Methods, and Tools. For more information on the slides, please contact Professor Ravi Vatrapu at Copenhagen Business School. #smwcbsdata
TRANSFORMING BIG DATA INTO SMART DATA: Deriving Value via Harnessing Volume, ... (Amit Sheth)
Transforming big data into smart data involves deriving value from harnessing the volume, variety, and velocity of big data using semantics and the semantic web. This allows making sense of big data by providing actionable information that improves decision making. Examples discussed include a healthcare application called kHealth that uses personal sensor data along with population level data to provide personalized and timely health recommendations and interventions for conditions like asthma.
Similar to Computational Social Science: The Collaborative Futures of Big Data, Computer Science, and Social Sciences
Detecting In-Situ Identity Fraud on Social Network Services: A Case Study on ... (Academia Sinica)
In this paper, we propose to use a continuous authentication approach to detect the in-situ identity fraud incidents, which occur when the attackers use the same devices and IP addresses as the victims. Using Facebook as a case study, we show that it is possible to detect such incidents by analyzing SNS users’ browsing behavior. Our experiment results demonstrate that the approach can achieve reasonable accuracy given a few minutes of observation time.
Although online games are an important Internet activity today, players inevitably suffer from lag from time to time due to the Internet’s non-QoS-guaranteed architecture. Here, by lag we refer to the phenomenon in which a game fails to respond to user commands or update the screen in a timely fashion due to long system processing or network delays. Currently, little is known about how game players feel about lag and how they react when encountering lag during game play.
In this paper, we present an Internet survey that is designed to understand the following questions: 1) How do players perceive lag, 2) what do players think of the causes of lag, and 3) how do players react to lag. Our results show that game players often struggle with lag, because they are unable to identify the root cause. Therefore, they have to try any combination of possible solutions found on the Internet, blame game companies, or learn to cope. These findings manifest a strong demand for an automatic diagnostic tool that can identify the root cause of lag for gamers.
Understanding The Performance of Thin-Client Gaming (Academia Sinica)
The thin-client model is considered a perfect fit for online gaming. As modern games normally require tremendous computing and rendering power at the game client, deploying games with such models can transfer the burden of hardware upgrades from players to game operators. As a result, there are a variety of solutions proposed for thin-client gaming today. However, little is known about the performance of such thin-client systems in different scenarios, and there is no systematic means yet to conduct such analysis.
In this paper, we propose a methodology for quantifying the performance of thin-clients on gaming, even for thin-clients which are close-sourced. Taking a classic game, Ms. Pac-Man, and three popular thin-clients, LogMeIn, TeamViewer, and UltraVNC, as examples, we perform a demonstration study and determine that 1) display frame rate and frame distortion are both critical to gaming; and 2) different thin-client implementations may have very different levels of robustness against network impairments. Generally, LogMeIn performs best when network conditions are reasonably good, while TeamViewer and UltraVNC are the better choices under certain network conditions.
Quantifying QoS Requirements of Network Services: A Cheat-Proof Framework (Academia Sinica)
Despite all the efforts devoted to improving the QoS of networked multimedia services, the baseline for such improvements has yet to be defined. In other words, although it is well recognized that better network conditions generally yield better service quality, the exact minimum level of network QoS required to ensure satisfactory user experience remains an open question.
In this paper, we propose a general, cheat-proof framework that enables researchers to systematically quantify the minimum QoS needs for real-time networked multimedia services. Our framework has two major features: 1) it measures the quality of a service that users find intolerable via intuitive responses and therefore reduces the burden on experiment participants; and 2) it is cheat-proof because it supports systematic verification of the participants' inputs. Via a pilot study involving 38 participants, we verify the efficacy of our framework by showing that even inexperienced participants can easily produce consistent judgments. In addition, by cross-application and cross-service comparative analysis, we demonstrate the usefulness of the derived QoS thresholds. Such knowledge will serve as an important reference in the evaluation of competing applications, application recommendation, network planning, and resource arbitration.
Online Game QoE Evaluation using Paired Comparisons (Academia Sinica)
To satisfy players' gaming experience, there is a strong need for a technique that can measure a game's quality systematically, efficiently, and reliably. In this paper, we propose to use paired comparisons and probabilistic choice models to quantify online games' QoE under various network situations. The advantages of our methodology over traditional MOS ratings are: 1) the rating procedure is simpler, thus placing less burden on experiment participants, 2) it derives ratio-scale scores, and 3) it enables systematic verification of participants' inputs.
As a demonstration, we apply our methodology to evaluate three popular FPS (first-person shooter) games, namely, Alien Arena (Alien), Halo, and Unreal Tournament (UT), and investigate their network robustness. The results indicate that Halo performs the best in terms of network robustness against packet delay and loss. However, if we take the degree of the games' sophistication into account, we consider that the robustness of UT against downlink delays should also be improved. We also show that our methodology can be a helpful tool for making decisions about design alternatives, such as how dead reckoning algorithms and time synchronization mechanisms should be implemented.
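One widely used probabilistic choice model for paired comparisons is the Bradley-Terry model, which produces exactly the ratio-scale scores the abstract mentions. The sketch below fits it with the standard iterative maximum-likelihood update; the win counts for three hypothetical game configurations are made up for illustration, and the paper's exact model may differ.

```python
# Bradley-Terry model fitted by the classic iterative MLE update.
def bradley_terry(wins, iters=100):
    # wins[i][j] = number of times item i was preferred over item j.
    n = len(wins)
    p = [1.0] * n  # ratio-scale "merit" scores
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins of item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [x * n / s for x in new_p]  # normalize for identifiability
    return p

# Hypothetical pairwise preference counts for three game configurations.
wins = [[0, 8, 9],   # config 0 is usually preferred
        [2, 0, 6],
        [1, 4, 0]]
scores = bradley_terry(wins)
print(scores)
```

Because the scores are ratio-scale, one can say not just that configuration 0 is preferred, but by what multiplicative factor, which is what makes paired comparisons attractive compared with MOS ratings.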
Cloud gaming is a promising application of the rapidly expanding cloud computing infrastructure. Existing cloud gaming systems, however, are closed-source with proprietary protocols, which raises the bar for setting up testbeds to experiment with cloud games. In this paper, we present a complete cloud gaming system, called GamingAnywhere, which is, to the best of our knowledge, the first open cloud gaming system. In addition to its openness, we design GamingAnywhere for high extensibility, portability, and reconfigurability. We implement GamingAnywhere on Windows, Linux, and OS X, while its client can be readily ported to other OS's, including iOS and Android. We conduct extensive experiments to evaluate the performance of GamingAnywhere, and compare it against two well-known cloud gaming systems: OnLive and StreamMyGame. Our experimental results indicate that GamingAnywhere is efficient and provides high responsiveness and video quality. For example, GamingAnywhere yields a per-frame processing delay of 34 ms, which is 3+ and 10+ times shorter than OnLive and StreamMyGame, respectively. Our experiments also reveal that all these performance gains are achieved without incurring higher network loads; in fact, GamingAnywhere incurs less network traffic. The proposed GamingAnywhere can be employed by researchers, game developers, service providers, and end users for setting up cloud gaming testbeds, which, we believe, will stimulate more research innovations on cloud gaming systems.
GamingAnywhere is now publicly available at http://gaminganywhere.org.
Are All Games Equally Cloud-Gaming-Friendly? An Electromyographic Approach (Academia Sinica)
Cloud gaming makes any computer game playable on a thin client without worrying about hardware requirements as before. It frees players from the need to constantly upgrade their computers, as they can now play games hosted on remote servers using a broadband Internet connection and a thin client. However, cloud games are intrinsically more susceptible to latency than online games because game graphics are rendered on server nodes and thin clients do not possess the game state information required by delay compensation techniques.
In this paper, we investigate how the response latency would affect users' experience when playing games on clouds and how the impact of latency on players' experience varies across different games. We show that not all games are equally friendly to cloud gaming. The same degree of latency may have very different impact on a game's quality of experience depending on the game's real-time strictness. We thus develop a model that can predict a game's real-time strictness based on the rate of players' inputs and game screen dynamics. The model can be used to enhance players' experience in cloud gaming and optimize data center operation cost simultaneously.
1) The document discusses predicting the addictiveness and lifetime of online games using player behavior data and facial electromyography measurements of emotions.
2) An addictiveness index is defined based on the decline rate of the ratio of player presence over time, and facial muscle activity is measured to quantify emotional responses to different games.
3) A model is created relating emotional strength measurements to addictiveness indexes with 94% accuracy, and is validated with an 11% average error rate in predicting the addictiveness of unseen games. This could help optimize game development and investment decisions.
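One plausible reading of the addictiveness index described above, the decline rate of the player-presence ratio over time, is the negated slope of a least-squares line fitted to the presence series. Both the presence series and this exact formulation are assumptions for illustration; the paper's definition may differ.

```python
# Illustrative decline-rate computation via ordinary least squares.
def decline_rate(presence):
    # presence[t] = fraction of the original players still present on day t.
    n = len(presence)
    xs = list(range(n))
    mx = sum(xs) / n
    my = sum(presence) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, presence))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # steeper decline -> larger rate -> less "addictive"

sticky = [1.0, 0.95, 0.92, 0.90, 0.88]   # players mostly stay around
churny = [1.0, 0.60, 0.35, 0.20, 0.10]   # players leave quickly
print(decline_rate(sticky) < decline_rate(churny))
```

A game whose presence ratio falls slowly thus scores as more addictive than one whose players churn out within days.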
Identifying MMORPG Bots: A Traffic Analysis Approach (Academia Sinica)
The document proposes four strategies to detect bots in MMORPG games by analyzing network traffic patterns without intruding on players' experiences. It describes collecting network traces from both human players and bots in Ragnarok Online. The strategies analyze timing of commands, burstiness of traffic over time, and compare client and server traffic burstiness to identify patterns characteristic of bot behavior, such as highly regular response times and smoother traffic patterns than humans. A decision framework determines if a traffic stream indicates bot control based on these strategies.
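The command-timing strategy can be illustrated with a simple statistic: bots tend to issue commands at highly regular intervals, so a low coefficient of variation (CV) of inter-command times is suspicious. The threshold and the two traces below are illustrative inventions, not the paper's calibrated values or real Ragnarok Online data.

```python
# Illustrative timing-regularity check for bot detection.
import statistics

def is_bot_like(timestamps, cv_threshold=0.1):
    # Coefficient of variation of inter-command gaps; near-zero means
    # machine-like regularity.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    cv = statistics.pstdev(gaps) / statistics.mean(gaps)
    return cv < cv_threshold

bot_trace = [0.0, 0.50, 1.00, 1.51, 2.00, 2.50]    # near-constant gaps
human_trace = [0.0, 0.31, 1.20, 1.35, 2.90, 3.05]  # bursty, irregular gaps
print(is_bot_like(bot_trace), is_bot_like(human_trace))
```

A full detector, as the abstract notes, would combine several such signals (traffic burstiness, client/server burstiness comparison) in a decision framework rather than rely on one threshold.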
Toward an Understanding of the Processing Delay of Peer-to-Peer Relay Nodes (Academia Sinica)
Peer-to-peer relaying is commonly used in real-time applications to cope with NAT and firewall restrictions and provide better-quality network paths. As relaying is not natively supported by the Internet, it is usually implemented at the application layer. Also, in a modern operating system, the processor is shared, so the receive-process-forward procedure for each relay packet may take a considerable amount of time if the host is busy handling other tasks. Thus, if we happen to select a loaded relay node, the relaying may introduce significant delays to the packet transmission time and even degrade application performance.
In this work, based on an extensive set of Internet traces, we pursue an understanding of the processing delays incurred at relay nodes and their impact on the application performance. Our contribution is three-fold: 1) we propose a methodology for measuring the processing delays at any relay node on the Internet; 2) we characterize the workload patterns of a variety of Internet relay nodes; and 3) we show that, serious VoIP quality degradation may occur due to relay processing, thus we have to monitor the processing delays of a relay node continuously to prevent the application performance from being degraded.
Inferring Speech Activity from Encrypted Skype Traffic (Academia Sinica)
Normally, voice activity detection (VAD) refers to speech processing algorithms for detecting the presence or absence of human speech in segments of audio signals. In this paper, however, we focus on speech detection algorithms that take VoIP traffic instead of audio signals as input. We call this category of algorithms network-level VAD.
Traditional VAD usually plays a fundamental role in speech processing systems because of its ability to delimit speech segments. Network-level VAD, on the other hand, can be quite helpful in network management, which is the motivation for our study. We propose the first real-time network-level VAD algorithm that can extract voice activity from encrypted and non-silence-suppressed Skype traffic. We evaluate the speech detection accuracy of the proposed algorithm with extensive real-life traces. The results show that our scheme achieves reasonably good performance even when a high degree of randomness has been injected into the network traffic.
Game Bot Detection Based on Avatar Trajectory (Academia Sinica)
In recent years, online gaming has become one of the most popular Internet activities, but cheating activity, such as the use of game bots, has increased as a consequence. Generally, the gaming community disagrees with the use of game bots, as bot users obtain unreasonable rewards without corresponding efforts. However, bots are hard to detect because they are designed to simulate human game-playing behavior and they follow game rules exactly. Existing detection approaches either interrupt the players’ gaming experience, or they assume game bots are run as standalone clients or assigned a specific goal, such as aim bots in FPS games.
In this paper, we propose a trajectory-based approach to detect game bots. It is a general technique that can be applied to any game in which the avatar’s movement is controlled directly by the players. Through real-life data traces, we show that the trajectories of human players and those of game bots are very different. In addition, although game bots may endeavor to simulate players’ decisions, certain human behavior patterns are difficult to mimic because they are AI-hard. Taking Quake 2 as a case study, we evaluate our scheme’s performance based on real-life traces. The results show that the scheme can achieve a detection accuracy of 95% or higher given a trace of 200 seconds or longer.
A Collusion-Resistant Automation Scheme for Social Moderation Systems (Academia Sinica)
This document summarizes a research paper on developing a collusion-resistant automation scheme for social moderation systems. The researchers propose a community-based scheme that analyzes accusation relations between users to distinguish genuine accusations from malicious ones made by colluders. It clusters users based on incoming and outgoing accusation features to label misbehaving users. Simulation results show the scheme achieves over 90% accuracy in identifying misbehaving users and prevents 90% of victims from false accusations due to collusion. The community-based approach improves over simple count-based schemes that are vulnerable to collusion attacks.
Tuning Skype’s Redundancy Control Algorithm for User Satisfaction (Academia Sinica)
Determining how to transport delay-sensitive voice data has long been a problem in multimedia networking. The difficulty arises because voice and best-effort data are different by nature. It would not be fair to give priority to voice traffic and starve its best-effort counterpart; however, the voice data delivered might not be perceptible if each voice call is limited to the rate of an average TCP flow. To address the problem, we approach it from a user-centric perspective by tuning the voice data rate based on user satisfaction.
Our contribution in this work is threefold. First, we investigate how Skype, the largest and fastest growing VoIP service on the Internet, adapts its voice data rate (i.e., the redundancy ratio) to network conditions. Second, by exploiting implementations of public domain codecs, we discover that Skype’s mechanism is not really geared to user satisfaction. Third, based on a set of systematic experiments that quantify user satisfaction under different levels of packet loss and burstiness, we derive a concise model that allows user-centric redundancy control. The model can be easily incorporated into general VoIP services (not only Skype) to ensure consistent user satisfaction.
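A minimal sketch of user-centric redundancy control in this spirit: choose the smallest number of redundant copies per packet such that the residual loss rate stays below a level users still find satisfactory. The independent-loss assumption and the 1% satisfaction threshold are illustrative inventions, not the paper's derived model; notably, this toy version ignores loss burstiness, which the experiments above found matters for user satisfaction.

```python
# Illustrative redundancy-ratio selection under independent packet loss.
def pick_redundancy(loss_rate, target_residual_loss=0.01, max_copies=5):
    # With k extra copies, a packet is lost only if all k+1 copies are lost:
    # residual loss = loss_rate ** (k + 1) under independence.
    for k in range(max_copies + 1):
        if loss_rate ** (k + 1) <= target_residual_loss:
            return k
    return max_copies

print(pick_redundancy(0.005))  # loss already tolerable: no extra copies
print(pick_redundancy(0.20))   # heavy loss: two extra copies per packet
```

A model calibrated on measured user satisfaction, as the paper derives, would replace the fixed residual-loss target with a satisfaction function of both loss rate and burstiness.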
Network Game Design: Hints and Implications of Player Interaction (Academia Sinica)
While psychologists analyze network game-playing behavior in terms of players’ social interaction and experience, understanding user behavior is equally important to network researchers, because how users act determines how well network systems, such as online games, perform. To gain a better understanding of patterns of player interaction and their implications for game design, we analyze a 1,356-million-packet trace of ShenZhou Online, a mid-sized commercial MMORPG. This work is dedicated to drawing out the hints and implications of player interaction patterns, inferred from network-level traces, for online games.
We find that the dispersion of players in a virtual world is heavy-tailed, which implies that static and fixed-size partitioning of game worlds is inadequate. Neighbors and teammates tend to be closer to each other in network topology. This property is an advantage, because message delivery between the hosts of interacting players can be faster than between those of unrelated players. In addition, the property can make game playing fairer, since interacting players tend to have similar latencies to their servers. We also find that participants who have a higher degree of social interaction tend to play much longer, and players who are closer in network topology tend to team up for longer periods. This suggests that game designers could increase the “stickiness” of games by encouraging, or even forcing, team playing.
Mitigating Active Attacks Towards Client Networks Using the Bitmap FilterAcademia Sinica
With the emergence of active worms, the targets of attacks have been moved from well-known Internet servers to generic Internet hosts, and since the rate at which patches can be applied is always much slower than the spread of a worm, an Internet worm can usually attack or infect millions of hosts in a short time. It is difficult to eliminate Internet attacks globally; thus, protecting client networks from being attacked or infected is a relatively critical issue.
In this paper, we propose a method that protects client networks from being attacked by people who try to scan, attack, or infect hosts in local networks via unpatched vulnerabilities. Based on the symmetry of network traffic in both temporal and spatial domains, a bitmap filter is installed at the entry point of a client network to filter out possible attack traffic. Our evaluation shows that with a small amount of memory (less than 1 megabyte), more than 95% of attack traffic can be filtered out in a small- or medium-scale client network.
Online gaming has become increasingly popular in recent years. Currently, the most common business model of online gaming is based on monthly subscription fees that game players pay to obtain credits, which allow them to start or continue a journey in the game’s virtual world. Therefore, from the perspective of game operators, predicting how many players will join a game and how long they will stay in the game is important since these two factors dominate their revenue.
This paper represents a pilot study of the predictability of online gamers’ subscription time. Specifically, we study the gameplay hours of online gamers and investigate whether strong patterns are embedded in their game hours. Our ultimate goal is to provide a prediction model of online gamers, which takes a player’s game hours as the input and predicts whether the player will leave in the near future. Our study is based on real-life traces collected from World of Warcraft, a famous MMORPG (Massively Multiplayer Online Role- Playing Game). The traces contain the gameplay histories of 34, 524 players during a two-year period. We believe that our study would be useful for building a prediction model of players’ future game hours and unsubscription decisions; i.e., decisions not to renew subscriptions.
.
Online gaming is one of the most profitable businesses on the Internet. Of all the genres of online games, MMORPGs (Massive Multiplayer Online Role Playing Games) have become the most popular among network gamers, and now attract millions of users who play in an evolving virtual world simultaneously over the Internet. To gain a better understanding of game traffic and contribute to the economic well-being of the Internet, we analyze a 1, 356-million-packet trace from a sizeable MMORPG called ShenZhou Online. This work is, as far as we know, the first formal analysis of MMORPG server traces.
We find that MMORPG and FPS (First-Person Shooting) games are similar in that they both generate small packets and require low bandwidths. In practice, the bandwidth requirement of MMORPGs is the lower of the two due to less real-time game playing. More distinctive features are the strong periodicity, temporal locality, irregularity, and self-similarity observed in MMORPG traffic. The periodicity is due to a common practice in game implementation, where game state updates are accumulated within a fixed time window before transmission. The temporal locality in game traffic is largely due to the game’s nature, whereby one action leads to another. The irregularity, which is unique to MMORPG traffic, is due to the diversity of the game’s design so that the behavior of users can vary drastically, depending on the quest at hand. The self-similarity of the aggregate traffic is due to the heavy-tailed active/idle activities of individual players. Moreover, we show that the arrival of game sessions within one hour can be modelled by a Poisson model, while the duration of game sessions is heavy-tailed.
Computational Social Science: The Collaborative Futures of Big Data, Computer Science, and Social Sciences
Sheng-Wei (Kuan-Ta) Chen
Institute of Information Science
Academia Sinica
Social Life is Hard to See
We can interview friends, but we cannot interview a friendship
Fleeting interaction
In private
Tedious to record over time, especially in large groups
Bigger Problems
Social phenomena involve many individuals interacting to produce collective entities:
firms, markets, cultures, political parties, social movements, audiences
The “micro-macro” problem (aka “emergence”)
Micro-macro problems are hard to study empirically:
It is difficult to collect observational data about individuals, networks, and populations at the same time
It is even more difficult to run “macro”-scale experiments
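The micro-macro problem is exactly what agent-based models are built to probe: simple individual rules producing collective outcomes no one intended. A minimal sketch of Schelling's segregation model in Python (the grid size, empty fraction, 30% "like-neighbor" threshold, and step count are illustrative choices, not taken from the slides):

```python
import random

# Minimal Schelling segregation model: agents of two types move to a
# random empty cell whenever fewer than THRESHOLD of their neighbors
# share their type. Parameter values are illustrative assumptions.
SIZE, EMPTY_FRAC, THRESHOLD, STEPS = 20, 0.2, 0.3, 50

def neighbors(grid, r, c):
    """Types of the (up to 8) occupied neighbors, on a wrapping grid."""
    out = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                rr, cc = (r + dr) % SIZE, (c + dc) % SIZE
                if grid[rr][cc] is not None:
                    out.append(grid[rr][cc])
    return out

def unhappy(grid, r, c):
    ns = neighbors(grid, r, c)
    if not ns:
        return False
    same = sum(1 for n in ns if n == grid[r][c])
    return same / len(ns) < THRESHOLD

def step(grid):
    """Move every agent that is unhappy in the current snapshot."""
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE)
               if grid[r][c] is None]
    movers = [(r, c) for r in range(SIZE) for c in range(SIZE)
              if grid[r][c] is not None and unhappy(grid, r, c)]
    random.shuffle(movers)
    for r, c in movers:
        if not empties:
            break
        dr, dc = empties.pop(random.randrange(len(empties)))
        grid[dr][dc], grid[r][c] = grid[r][c], None
        empties.append((r, c))

def segregation(grid):
    """Average share of like-typed neighbors: a macro-level outcome."""
    scores = []
    for r in range(SIZE):
        for c in range(SIZE):
            if grid[r][c] is not None:
                ns = neighbors(grid, r, c)
                if ns:
                    scores.append(
                        sum(1 for n in ns if n == grid[r][c]) / len(ns))
    return sum(scores) / len(scores)

random.seed(0)
grid = [[None if random.random() < EMPTY_FRAC else random.choice("AB")
         for _ in range(SIZE)] for _ in range(SIZE)]
before = segregation(grid)
for _ in range(STEPS):
    step(grid)
after = segregation(grid)
# Even a mild individual preference (30% like neighbors) produces a
# macro-level rise in segregation that no single agent intended.
```

The point of the sketch is the gap between the micro rule (a 30% tolerance) and the macro outcome (much higher clustering), which is hard to see in observational data alone.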
1890 US Census
The first time Hollerith machines were used to tabulate US Census data (population: 62,947,714)
The Era of Big Data
Past: government data, national survey data
Today: a variety of new data sources
Economic data: trade, finance, e-cash / e-wallet, ...
GIS data: satellite, GPS loggers, laser-scanning cars, ...
Sensor data: video surveillance, smartphones, wearable devices, mobile apps, beacons, ...
Web as a Record of Social Interaction
Public web pages / discussions: Twitter, Facebook, blogs, newsgroups, wikis, MMOGs, Instagram, LastFM, Flickr, Spotify
Private: email, WhatsApp, LINE, Slack
Text, images, sounds: speeches, commercials
Computational Social Science
An instrument-enabled scientific discipline, like:
microbiology (microscope)
radio astronomy (radar)
nanoscience (electron microscope)
Technical Challenges
Computational infrastructures for dealing with:
More data: analyzing large amounts of data
Fuzzy data: cleaning up imprecise and noisy data
New kinds of data: processing real-time sensor streams and web data
Need for new substantive ideas
Need for new statistical methods (WHY, in addition to WHAT and HOW)
WE ARE WHAT WE SAY
Linguistics
Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8.9 (2013): e73791.
Macroscope
Dataset
700 million words, phrases, and topic instances collected from 75,000 volunteers’ Facebook posts
Records of users’ personality (five-factor model), gender, and age
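The open-vocabulary approach behind this study correlates each word's relative frequency with a trait across users, rather than starting from a fixed dictionary. A toy sketch of that core idea (the users, words, counts, and ages below are all invented for illustration):

```python
from math import sqrt

# Toy users: (age, word-count dict). All data invented for illustration.
users = [
    (19, {"homework": 5, "party": 3, "mortgage": 0}),
    (22, {"homework": 3, "party": 4, "mortgage": 0}),
    (35, {"homework": 0, "party": 1, "mortgage": 2}),
    (48, {"homework": 0, "party": 0, "mortgage": 4}),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ages = [age for age, _ in users]
vocab = {"homework", "party", "mortgage"}
correlations = {}
for word in vocab:
    # Relative frequency of the word in each user's posts
    freqs = [counts[word] / sum(counts.values()) for _, counts in users]
    correlations[word] = pearson(ages, freqs)
# In this toy data, "mortgage" use correlates positively with age,
# while "party" and "homework" correlate negatively.
```

The real study runs this over the full open vocabulary with controls for gender and age, then visualizes the strongest correlates as word clouds; the sketch keeps only the correlation step.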
Scaling up the Lab
Social science experiments are heavily constrained by scale and speed:
The unit of analysis was individuals or small groups
Experiments took months to design and run
“Virtual labs” can potentially lift both constraints:
The state of the art is ~5,000 workers, but in principle one could construct a subject panel of ~100K–1M
The hypothesis-testing cycle could shrink to days or hours
MOOD CONTAGION (& MANIPULATION) ON FACEBOOK
Social Psychology
Kramer, Adam D. I., Jamie E. Guillory, and Jeffrey T. Hancock. "Experimental evidence of massive-scale emotional contagion through social networks." Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790.
Virtual Lab
Facebook Mood Contagion
0.7 million (~0.04% of) users on Facebook
3 million posts manipulated in one week
Some “positive” or “negative” emotional posts were hidden from users in the experimental group
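In a design like this, the contagion effect can be estimated by comparing the share of positive words written by the experimental and control groups. A toy sketch with invented per-user numbers, adding a simple permutation test to check that the difference is unlikely under chance alone:

```python
import random

# Share of positive words per user (all numbers invented): one group
# had negative posts hidden from their feed, the other is a control.
treated = [0.062, 0.058, 0.065, 0.060, 0.063]
control = [0.055, 0.057, 0.054, 0.056, 0.053]

def mean(xs):
    return sum(xs) / len(xs)

# Observed treatment effect: difference in mean positive-word share
observed = mean(treated) - mean(control)

# Permutation test: shuffle group labels and count how often chance
# alone produces a difference at least as large as the observed one.
random.seed(1)
pooled = treated + control
TRIALS = 10_000
count = 0
for _ in range(TRIALS):
    random.shuffle(pooled)
    diff = mean(pooled[:len(treated)]) - mean(pooled[len(treated):])
    if diff >= observed:
        count += 1
p_value = count / TRIALS
# A small p_value indicates the treated group's more-positive writing
# is unlikely to be a fluke of group assignment.
```

The actual study used linguistic word counts over millions of posts; the permutation test here stands in for its significance testing and is not the paper's exact procedure.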
Observations
[Figure panels: “Negative posts hidden” and “Positive posts hidden”]
People who see more positive posts tend to post more positively, and vice versa.
Facebook users’ emotions can be easily manipulated by changing ALGORITHMS
Ethical Issues (!)
An unethical experiment, because it was conducted without users’ consent
A serious invasion of users’ perceptions of their friend circles (and of society)
Well, Facebook’s data use policy states that users’ information will be used “for internal operations, including troubleshooting, data analysis, testing, research and service improvement,” meaning that any user can become a lab rat.
FACEBOOK “I VOTED” BUTTON
Social Psychology & Politics
Bond, Robert M., et al. "A 61-million-person experiment in social influence and political mobilization." Nature 489.7415 (2012): 295-298.
Virtual Lab
“I Voted” Button
Direct messages delivered to 61 million users on Facebook:
Informational message: received by 1% of users
Social message: received by 98% of users
Control group: 1% of users (no message received)
[Figure: screenshots of the informational and social message variants]
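A 98/1/1 randomized assignment like the one above can be implemented in a few lines; randomization is what later licenses the causal claims. A sketch (the group names follow the slide; the user count and seed are arbitrary):

```python
import random

# Sketch of a 98/1/1 random assignment of users to experimental
# conditions, as in the "I Voted" experiment. User IDs are invented.
random.seed(42)

def assign():
    """Draw one user's condition with probabilities 0.98/0.01/0.01."""
    r = random.random()
    if r < 0.98:
        return "social"          # message received by 98% of users
    if r < 0.99:
        return "informational"   # message received by 1% of users
    return "control"             # no message

groups = {"social": 0, "informational": 0, "control": 0}
for _ in range(100_000):
    groups[assign()] += 1
# Counts come out roughly 98,000 / 1,000 / 1,000.
```

Because assignment ignores everything about the user, any later difference in turnout between groups can be attributed to the message itself.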
Recipients of the social message were 2% more likely to click the “I Voted” button, 0.3% more likely to seek information about a polling place, and 0.4% more likely to head to the polls.
Real-world Consequence (!)
In total, the messages generated about 60,000 direct votes and an estimated 280,000 indirect votes (out of 61 million users)
What if Facebook had not randomized the control/experimental groups?
Empirical Modeling
Traditional mathematical or computational modeling:
Tends to rely on many, often unrealistic, assumptions
Is not generally tested in detail against data
The result is a proliferation of models that exist in parallel and are often incompatible with each other
New sources and scales of data allow us both to learn/test models and to calibrate them
[Diagram: observations → models → lab and field observations]
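Calibrating a model against data can be as simple as fitting its free parameters by least squares and then inspecting the residuals, rather than assuming the parameters. A minimal sketch with invented observations, fitting y = a·x + b in closed form:

```python
# Calibrate a two-parameter linear model y = a*x + b against observed
# data by ordinary least squares (closed form). Data points invented.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Slope = covariance(x, y) / variance(x); intercept from the means
a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - a * mx

# A calibrated model should then be tested against the data: small
# residuals (and ideally out-of-sample checks) support the fit.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
rmse = (sum(r * r for r in residuals) / n) ** 0.5
```

The same loop — propose a model, fit it to observations, check residuals, revise — is the observations-to-models cycle the slide describes, just at toy scale.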
PREDICTION OF COUNTY-LEVEL HEART DISEASE MORTALITY
Medicine and Linguistics
Empirical Modeling
Eichstaedt, Johannes C., et al. "Psychological language on Twitter predicts county-level heart disease mortality." Psychological Science 26.2 (2015): 159-169.
Datasets
Outcome: atherosclerotic heart disease mortality rates during 2009–2010
Predictors:
826 million tweets collected between June 2009 and March 2010
Socioeconomic variables (income and education)
Demographic variables (percentages of Black, Hispanic, married, and female residents)
Health status (diabetes, obesity, smoking, and hypertension)
YOU ARE WHAT YOU LIKE
Social Psychology
Empirical Modeling
Kosinski, Michal, David Stillwell, and Thore Graepel. "Private traits and attributes are predictable from digital records of human behavior." Proceedings of the National Academy of Sciences 110.15 (2013): 5802-5805.
Personality Prediction
Predicted traits:
Gender, age, relationship status, number of friends
Sexual orientation, ethnicity, religion, political inclination
Use of addictive substances (alcohol, drugs, cigarettes), parental separation
IQ, five-factor model, satisfaction with life
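Kosinski et al. reduce the sparse user × Like matrix with singular value decomposition and then regress each trait on the resulting components. A small synthetic sketch of that two-step pipeline (all data and sizes are invented, and the logistic fit uses plain gradient descent rather than any particular library):

```python
import numpy as np

# Sketch of the Likes-to-traits pipeline: SVD of a user x Like matrix,
# then logistic regression on the components for a binary trait.
rng = np.random.default_rng(0)
n_users, n_likes, n_components = 300, 40, 5

# Synthetic ground truth: trait-positive users like the first 10 pages
# more often (an invented, trait-linked structure for illustration).
trait = (rng.random(n_users) < 0.5).astype(float)
p_like = np.full((n_users, n_likes), 0.15)
p_like[:, :10] += 0.35 * trait[:, None]
likes = (rng.random((n_users, n_likes)) < p_like).astype(float)

# Step 1: dimensionality reduction via truncated SVD
U, s, Vt = np.linalg.svd(likes, full_matrices=False)
X = U[:, :n_components] * s[:n_components]
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize components

# Step 2: logistic regression fit by gradient descent
w, b = np.zeros(n_components), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - trait) / n_users
    b -= 0.5 * (p - trait).mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = (pred == trait).mean()   # in-sample accuracy on toy data
```

The real study evaluated with cross-validation on tens of thousands of volunteers; the sketch keeps only the reduce-then-regress structure.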
Ground truth (coded 1 / 0)
Political inclination: Democrat vs. Republican (labels such as “Democratic”, “Democratic Party” vs. “GOP (Grand Old Party)”, “Republican Party”)
Sexual orientation: homosexual vs. heterosexual
Prediction Results
Solid: Pearson correlation coefficient between predicted and actual values
Transparent: baseline accuracy of the questionnaire, in terms of test-retest reliability
LOTS of Big Questions
The polarization of global economic inequality
What explains the success of social movements?
The emergence of prosocial behavior
Does video gaming cause a propensity for violence?
The politics of censorship
How can social selection be distinguished from social influence?
...
The Data Divide
Social scientists have good questions, but...
IT tools are not part of their toolkits
It is not clear that they will (or should) make the investment
Computer scientists have powerful methods, but...
They are trained to solve technical problems
Purely “methodological” contributions seem to count for less
The Challenges
The education and habits of social and computer scientists:
Different ways of thinking
Different methodologies
Differences in framing questions and defining contributions
Data access and fragmentation issues
Data privacy issues
Ethics issues
Organizational issues
Institutional Innovations
New platforms and protocols for data management:
Better coordination of data collection, storage, and sharing
Recruitment and management of subject pools and field panels
Integrated research designs:
Coordination across theoretical, experimental, and observational studies
Collaborative interdisciplinary teams:
For a given data set, it is often unclear what the most interesting question is
For a given question, it is often unclear how to collect the right data
Computational Social Science combines techniques to collect, manage, and process large datasets with knowledge about social theories, methods, and issues.