1. The document provides an overview of the first lecture in an Introduction to Computational Social Science course.
2. It defines computational social science and discusses its main areas: big data, social networks, social complexity, and simulation.
3. The lecture also explores examples of computational social science research, such as modeling the spread of disease, tracking news and meme propagation online, and simulating water demand.
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...Lauri Eloranta
Third lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
The emerging field of computational social science (CSS) is devoted to the pursuit of interdisciplinary social science research from an information processing perspective, through the medium of advanced computing and information technologies.
Social Network Analysis - Lecture 4 in Introduction to Computational Social S...Lauri Eloranta
Fourth lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Simulation in Social Sciences - Lecture 6 in Introduction to Computational S...Lauri Eloranta
Sixth lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
As the world moves to the cloud, the need for cloud testing has grown with it. This presentation covers the key factors in cloud computing and the types of testing performed for it: Cloud Service Models, Key Characteristics, Cloud Testing, Functional testing, Performance and Benchmark testing, Network testing, Interoperability and Compatibility testing, cloud testing tools, and cloud testing methodology. Stay tuned for the upcoming presentations.
DSpace 7 - The Power of Configurable EntitiesAtmire
Presented at the Open Repositories 2019 conference in Hamburg.
"DSpace 7 has been extended with “Configurable Entities” in response to a growing need for describing more types of objects and relations between objects as well as compound objects; examples include: authors, projects, datasets, grants, lecture series, ... .
This talk will do a deeper dive into the new Configurable Entities feature, including how to configure your DSpace to support different object models and how users can create the relations between items. New concepts in DSpace 7 such as relations between items, virtual metadata, display options per object type, … will be introduced.
Defining an object model through configuration in DSpace 7 is made possible without using specific hardcoded Java classes for the specific objects. To achieve this the concept starts from the current DSpace Item object and extends it, also allowing institutions to keep using DSpace out-of-the-box with its familiar object model. The entities in a custom object model are items that can be typed, and relations between items of different types can be created. Several different object models can be defined and can exist alongside one another in the same repository."
DSpace-CRIS: new features and contribution to the DSpace mainstream4Science
The presentation focuses on the latest releases of DSpace-CRIS, compatible with DSpace 5 and 6, and their new features. Particularly interesting is the recent integration between DSpace-CRIS and CKAN, released as an independent module. The DSpace-CKAN Integration Module has already been released as open source (under the same license as DSpace) and can easily be adopted by standard DSpace installations as well, with either JSPUI or XMLUI.
Starting with DSpace-CRIS 5.6.1, along with the security fixes of DSpace JSPUI 5.6, the following features have been introduced: an extendible UI to deliver the bitstreams with dedicated viewers, a simple metadata editing of any DSpace object; the editing of archived items using the submission UI; a deduplication and duplicate-alert tool; improved ORCiD synchronization; improved submission form; improved security model for CRIS entities; creation of CRIS object as part of the submission process, automatic calculation of metrics; advanced import framework; on-demand DOI registration; template services.
The DSpace-CKAN Integration Module allows users to preview dataset content deposited in a CKAN instance directly from DSpace via a “curation task”. 4Science will continue to support DSpace-CRIS and DSpace-CKAN in future major versions of the platform, and the roadmap to DSpace 7 compatibility will also be presented.
Intro to Data Science for Non-Data ScientistsSri Ambati
Erin LeDell and Chen Huang's presentations from the Intro to Data Science for Non-Data Scientists Meetup at H2O HQ on 08.20.15
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
An introduction to the world of Social Network Analysis and a view on how it may help learning networks. History, data collection and several analysis techniques are shown.
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowPyData
By Sudheesh Katkam
PyData New York City 2017
Dremio is a new open source project for self-service data fabric. Dremio simplifies and accelerates access to data from any source and any size, including relational databases, NoSQL, Hadoop, Parquet, and text files. We'll show you how you can use Dremio to visually curate data from any source, then access via Pandas or Jupyter notebook for rapid access.
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
The machine learning libraries in Apache Spark are an impressive piece of software engineering, and are maturing rapidly. What advantages does Spark.ml offer over scikit-learn? At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model -- exploration, data cleaning, feature engineering, and model fitting; which would you use in production?
The machine learning libraries in Apache Spark are an impressive piece of software engineering, and are maturing rapidly. What advantages does Spark.ml offer over scikit-learn?
At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model -- exploration, data cleaning, feature engineering, and model fitting -- in several different frameworks. We'll show what it's like to work with native Spark.ml, and compare it to scikit-learn along several dimensions: ease of use, productivity, feature set, and performance.
In some ways Spark.ml is still rather immature, but it also conveys new superpowers to those who know how to use it.
Basics of Computation and Modeling - Lecture 2 in Introduction to Computation...Lauri Eloranta
Second lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but even these platforms are limited to a few supported algorithms and to each company’s internal infrastructure. In this talk, I present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
This workshop will introduce some of the main principles and techniques of Social Network Analysis (SNA). We will use examples from organizational and social media-based networks to understand concepts such as network density, diameter, centrality measures, community detection algorithms, etc. The session will also introduce Gephi, a popular program for SNA. Gephi is a free and open-source tool that is available for both Mac and PC computers.
By the end of the session, you will develop a general understanding of what SNA is, what research questions it can help you answer, and how it can be applied to your own research. You will also learn how to use Gephi to visualize and examine networks using various layout and community detection algorithms.
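As a rough illustration of the concepts this workshop names (network density, diameter, degree centrality), here is a minimal pure-Python sketch. The five-person network and all function names are hypothetical, chosen only for illustration. They are not taken from the workshop, and Gephi itself is a graphical tool that computes these measures without any code.

```python
from collections import deque

# Hypothetical undirected friendship network (adjacency sets), for illustration only.
network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}

def density(adj):
    """Share of possible undirected edges that are actually present."""
    n = len(adj)
    edges = sum(len(neighbours) for neighbours in adj.values()) // 2
    return 2 * edges / (n * (n - 1))

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}

def diameter(adj):
    """Longest shortest path, found by BFS from every node (assumes a connected graph)."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest

print(density(network))
print(degree_centrality(network))
print(diameter(network))
```

In this toy network, "A" has the highest degree centrality because it is connected to three of the four other nodes, while the diameter is set by the path between the peripheral nodes "B" (or "C") and "E".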
Instructor’s Bio: Dr. Anatoliy Gruzd is a Canada Research Chair in Social Media Data Stewardship, Associate Professor at the Ted Rogers School of Management at Ryerson University, and Director of Research at the Social Media Lab. Anatoliy is also a Member of the Royal Society of Canada’s College of New Scholars, Artists and Scientists; a co-editor of a multidisciplinary journal on Big Data and Society; and a founding co-chair of the International Conference on Social Media and Society. His research initiatives explore how social media platforms are changing the ways in which people and organizations communicate, collaborate and disseminate information and how these changes impact the norms and structures of modern society.
How to apply machine learning into your CI/CD pipelineAlon Weiss
A quick introduction to AIOps, the business reasons why the CI/CD pipeline needs to constantly improve, and how this can be accomplished with data that's already available with existing Machine Learning and other algorithms.
Learn how you can drive your business forward with confidence by making decisions based on actionable insights gained from organizational data in real-time.
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
Adoption of Apache Spark in the enterprise is increasing rapidly - it's become one of the fastest growing and most popular technologies in the Big Data ecosystem.
However, implementing an enterprise-ready, on-premises Spark deployment can be very complex and it requires expertise that is generally not available to all.
BlueData makes it easier to deploy Apache Spark on-premises. With BlueData, you can spin up virtual Spark clusters within minutes – providing secure, self-service, on-demand access to Big Data analytics and infrastructure. You can deploy Spark in standalone mode or with Hadoop / YARN. You can also build analytical pipelines and create Spark clusters using our RESTful APIs, and use web-based Zeppelin notebooks for interactive data analytics.
BlueData’s software platform leverages virtualization and Docker containers – combined with our own patent-pending innovations – to make it faster, and more cost-effective for enterprises to get up and running with a multi-tenant Spark deployment on-premises.
Learn more at www.bluedata.com
A Summary of Computational Social Science - Lecture 8 in Introduction to Comp...Lauri Eloranta
Final lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Complex Social Systems - Lecture 5 in Introduction to Computational Social Sc...Lauri Eloranta
Fifth lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Lauri Eloranta
Seventh lecture of the course CSS01: Introduction to Computational Social Science at the University of Helsinki, Spring 2015 (http://blogs.helsinki.fi/computationalsocialscience/).
Lecturer: Lauri Eloranta
Questions & Comments: https://twitter.com/laurieloranta
Digital Transformation in Social Sciences - What is Computational Social Science?
A keynote presentation given at the Digital Humanities Morning held at the University of Helsinki, 12.05.2015
By: Lauri Eloranta
twitter: @laurieloranta
Social media data for Social science researchDavide Bennato
This is the talk I gave at the Lipari Summer School on Computational Social Science 2013. What is the relationship between social science and big data? The talk focuses on Twitter and its social media mining tools.
http://www.tecnoetica.it/2013/08/07/lipari-summer-school-computational-social-science-big-data-e-twitter/
My Tribute to a great man, Political Hero and my great grandfather Gov. Demetrio Larena y Sande
I hope relatives and cousins would give a tribute and do some historical research about Demetrio Larena better than this research
Computational Social Science – what is it and what can(‘t) it do?Christian Bokhove
Title: Computational Social Science – what is it and what can(‘t) it do?
What is your talk about?
In Computational Social Science (CSS) we use computer science algorithms to analyse qualitative data at scale. In this talk I define CSS, describe what the opportunities and barriers are in using such methods, and give examples from published research, for example on analysing thousands of Ofsted documents.
What are the key messages of your talk?
The use of CSS methods makes it possible to analyse, at scale, some data sources that would previously have been unrealistic to analyse ‘by hand’.
What are the implications for practice or research from your talk?
CSS allows both more qualitative and more quantitative researchers to analyse unstructured data sources at scale.
Short Biography
Dr Christian Bokhove is an Associate Professor in Mathematics. In his research, he combines conventional qualitative and quantitative methods with novel computational methods.
Digital Humanities in Practice, DHC 2012Monica Bulger
This paper presents findings of a fieldwork study that explored research practices, challenges, and directions in contemporary digital humanities scholarship. The study was conducted in the period April-October, 2010, as part of two research projects of the Royal Netherlands Academy of Arts and Sciences and the Oxford Internet Institute. The studies included observations, focus groups, and in-depth interviews with digital humanities scholars, policymakers, and funders, with a focus on developers and users of digital resources for humanities research. The study involved 92 participants from over 25 institutions in 5 countries.
Presented by: Monica Bulger, Eric T. Meyer, and Sally Wyatt, with Smiljana Antonijevic
Presentation given at the HEA Social Sciences learning and teaching summit 'Exploring the implications of ‘the era of big data’ for learning and teaching'.
A blog post outlining the issues discussed at the summit is available via: http://bit.ly/1lCBUIB
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Jisc
The analysis of government data, data held by business, the web, and social science survey data will support new research directions and findings. Big Data is one of David Willetts’ 8 great technologies, and in order to secure the UK’s competitive advantage, new investments in Big Data have been made by the Economic and Social Research Council (ESRC), for example the Business Datasafe and Understanding Populations investments. In this session the benefits of using Big Data in social science, and the ESRC’s Big Data strategy, will be explained by Professor David De Roure of the Oxford e-Research Centre, an advisor to the ESRC.
Keynote Address, International Conference of the Learning Sciences, London Festival of Learning
Transitioning Education’s Knowledge Infrastructure:
Shaping Design or Shouting from the Touchline?
Abstract: Bit by bit, a data-intensive substrate for education is being designed, plumbed in and switched on, powered by digital data from an expanding sensor array, data science and artificial intelligence. The configurations of educational institutions, technologies, scientific practices, ethics policies and companies can be usefully framed as the emergence of a new “knowledge infrastructure” (Paul Edwards).
The idea that we may be transitioning into significantly new ways of knowing – about learning and learners – is both exciting and daunting, because new knowledge infrastructures redefine roles and redistribute power, raising many important questions. For instance, assuming that we want to shape this infrastructure, how do we engage with the teams designing the platforms our schools and universities may be using next year? Who owns the data and algorithms, and in what senses can an analytics/AI-powered learning system be ‘accountable’? How do we empower all stakeholders to engage in the design process? Since digital infrastructure fades quickly into the background, how can researchers, educators and learners engage with it mindfully? If we want to work in “Pasteur’s Quadrant” (Donald Stokes), we must go beyond learning analytics that answer research questions, to deliver valued services to frontline educational users: but how are universities accelerating the analytics innovation to infrastructure transition?
Wrestling with these questions, the learning analytics community has evolved since its first international conference in 2011, at the intersection of learning and data science, and an explicit concern with those human factors, at many scales, that make or break the design and adoption of new educational tools. We are forging open source platforms, links with commercial providers, and collaborations with the diverse disciplines that feed into educational data science. In the context of ICLS, our dialogue with the learning sciences must continue to deepen to ensure that together we influence this knowledge infrastructure to advance the interests of all stakeholders, including learners, educators, researchers and leaders.
Speaking from the perspective of leading an institutional analytics innovation centre, I hope that our experiences designing code, competencies and culture for learning analytics shed helpful light on these questions.
3. DATA MINING
DATA AND SOCIETY
BIG DATA
PREDICTIVE ANALYSIS
DIGITAL METHODS
DIGITAL HUMANITIES
SOCIAL NETWORK ANALYSIS
PROGRAMMING IN SOCIAL SCIENCE
IT IS A JUNGLE OUT THERE
COMPLEX SYSTEMS
DATA SCIENCE
HADOOP/MAP REDUCE
REACTIVE PROGRAMMING
PERSONAL DATA
MY DATA
OPEN DATA
IOT / WEARABLES
BUZZ
HYPE
BUZZ
HYPE
BUZZ
HYPE
THE BACKGROUND IMAGE “JUNGLE” BY LUKE JONES
IS UNDER CREATIVE COMMONS LICENSE.
SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.
4. NOT THAT MUCH TALKING AND EVEN LESS DOING
ONLY A FEW PIONEERS IN THE DESERTED CSS SCENE IN FINLAND
THE BACKGROUND IMAGE “DESERT” BY MOYAN BRENN
IS UNDER CREATIVE COMMONS LICENSE.
SEE ORIGINAL IMAGE HERE. SEE LICENSE TERMS HERE.
5.
6. • Practicalities
• What is computational social science?
• Areas of Computational Social Science
• (Big) Data & automated information extraction
• Social Networks
• Social Complexity
• Simulation
• Research examples
• Lecture 1 Reading
LECTURE 1 OVERVIEW
8. • The slides and all materials will be online at
http://blogs.helsinki.fi/computationalsocialscience/
• Course consists of
• 8 Lectures
• A Research Plan Assignment (required, if you want study credits, 5op)
• Any questions?
• Contact lecturer Lauri Eloranta at firstname dot lastname @helsinki.fi
PRACTICALITIES GENERAL
10. • Course Book
• Cioffi-Revilla, Claudio (2014). Introduction to
Computational Social Science. Springer-Verlag, London.
• Further reading:
LITERATURE COURSE BOOK
11. • The full eBook is available via Helsinki
University Library:
https://helka.linneanet.fi/cgi-bin/Pwebrecon.cgi?BBID=2753081
LITERATURE COURSE BOOK
12. LITERATURE ADDITIONAL READING
• There will be additional reading given for each lecture
• Research articles on the topic at hand, some will be given for “homework
reading”
• The full list of articles can be found at:
http://blogs.helsinki.fi/computationalsocialscience
13. • Write a short research plan where you apply a computational social
science method to a research problem
• Length: 8 pages for Master’s students, 10 pages for PhD students
• Focus on research method <-> research data <-> research problem
• How to write a research plan, general instructions:
• http://www.uta.fi/cmt/en/doctoralstudies/apply/Tutkimussuunnitelmaohjeet_EN%5B1%5D.pdf
• https://into.aalto.fi/display/endoctoraltaik/Research+Plan
ASSIGNMENT GENERAL
14. • The assignment deadline is Friday 2.10.2015 at midnight (end of day).
• All assignments are returned in PDF format
• How do I save my work in PDF format? You can “Save as PDF” or “Print to PDF” in MS
Word
• Include your name, student ID and contact details
• Assignments are returned to the lecturer Lauri Eloranta via email:
firstname dot lastname @ helsinki.fi
• Grading is done in one month’s time, and you will receive the study
credits on or before 30.10.2015.
ASSIGNMENT HOW TO RETURN THE ASSIGNMENT
15. • Contains six courses, covering different aspects of computational social
science
• Full study block 25-30 op.
• Basic courses (mandatory)
• Introduction to Computational Social Science (5 op) (I period)
• Introduction to Programming in Social Science (5 op) (II period)
• Special courses
• Data extraction (5 op) (IV period)
• Network Analysis (5 op) (in 2016 – 2017)
• Complex Systems (5 op) (III period)
• Simulation (5 op) (in 2016 – 2017)
COMPUTATIONAL SOCIAL
SCIENCE STUDY BLOCK
17. “In short, a computational social science is
emerging [field] that leverages the capacity
to collect and analyze data with an
unprecedented breadth and depth and
scale.” (Lazer et al. 2009.)
Lazer, D. et al. 2009. Computational Social Science. Science. 6 February 2009: Vol. 323, no. 5915, pp. 721-723.
18. • “In short, a computational social science is emerging [field] that
leverages the capacity to collect and analyze data with an
unprecedented breadth and depth and scale.”
• Lazer, D. et al. 2009. Computational Social Science. Science. 6 February
2009: Vol. 323, no. 5915, pp. 721-723.
LAZER ET AL. 2009
19. • “The increasing integration of technology into our lives has created
unprecedented volumes of data on society’s everyday behaviour. Such
data opens up exciting new opportunities to work towards a quantitative
understanding of our complex social systems, within the realms of a
new discipline known as Computational Social Science. Against a
background of financial crises, riots and international epidemics, the
urgent need for a greater comprehension of the complexity of our
interconnected global society and an ability to apply such insights in
policy decisions is clear.” (Conte et al. 2012)
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The
European Physical Journal Special Topics. November 2012: Vol. 214,
Issue 1, pp. 325-346.
CSS MANIFESTO (CONTE ET AL. 2012)
20. • “Computational social science refers to the academic sub-disciplines
concerned with computational approaches to the social sciences. Fields
include computational economics and computational sociology.
It is a multi-disciplinary and integrated approach to social survey
focusing on information processing by means of advanced information
technology. The computational tasks include the analysis of social
networks and social geographic systems.”
• (Wikipedia 2015, http://en.wikipedia.org/wiki/Computational_social_science)
WIKIPEDIA
21. • “The new field of Computational Social Science can be
defined as the interdisciplinary investigation of the social
universe of many scales, ranging from individual actors to
the largest groupings, through the medium of computation.”
(Cioffi-Revilla, 2014.)
CIOFFI-REVILLA, 2014
Cioffi-Revilla, Claudio (2014). Introduction to Computational Social Science.
Springer-Verlag, London.
29. Computational Social Science
offers revolutionary opportunities
for the social sciences, but it still
faces challenges in relation to
methods, interdisciplinary
cooperation and research ethics.
30. 1. Solving increasingly complex problems: The problems of a global
world are complex, and computational methods might be able to solve
them
2. The rise of data: The amount of data has exploded during the 21st
century
3. IT and Instrumental revolution: all the new tools and possibilities
4. Complex systems: modeling our dynamic organisations and societies
5. Social networks: modeling human behavior as networks
6. Making predictions and simulations: predicting future from the past
7. Interdisciplinary field: (social sciences, math, computer science…)
8. Many problems and challenges, especially regarding research
ethics
CSS COMPONENTS
31. • Information processing paradigm has two aspects in relation
to CSS:
1. Information processing is substantive to the complex
systems of society that CSS researches: This means that
information processing takes part in the formation and
evolution of complex systems.
2. Information processing is methodological in the sense
that it serves as the core instrument of CSS
COMPUTATIONAL
PARADIGM OF SOCIETY
(Cioffi-Revilla, 2014.)
33. • Areas of Computational Social Science
1. (Big) Data & automated data extraction
• Generate, retrieve, sort, modify, transform, … data
2. Social Networks
• Network analysis and social networks
3. Social Complexity
• Social complexity, complex adaptive systems, complex
systems modeling
4. Simulation
FOUR MAIN AREAS OF CSS
(Cioffi-Revilla, 2014.)
34. • Data and automated information extraction can be seen as the foundation
for the other areas of CSS
• Raw data can be used as:
1. Data for its own sake: as research data -> data is the subject of
research
2. Data for modeling or validating other phenomena via, e.g., network
analysis, complex systems analysis or simulation
• Data is generated, retrieved, modified, transformed,… for research
purposes via computational automation
BIG DATA & AUTOMATED
INFORMATION EXTRACTION
(Cioffi-Revilla, 2014.)
35. • A long tradition in network analysis (a much older field than CSS)
• Social networking services (Facebook, Twitter, etc.) are just one part of network
analysis
• Many other social interactions can be modeled as networks -> thus
social networks are not technology dependent as such
• -> e.g. modeling a family as a network
• -> e.g. modeling a project as a network
SOCIAL NETWORKS
(Cioffi-Revilla, 2014.)
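The "family as a network" example above can be sketched in a few lines of Python. This is only an illustrative sketch: the names, the ties and the choice of degree centrality as the measure are assumptions made here, not material from the lecture or the course book.

```python
from collections import defaultdict

# Hypothetical family ties, invented purely for illustration.
ties = [
    ("Anna", "Ben"),    # spouses
    ("Anna", "Clara"),  # parent and child
    ("Ben", "Clara"),   # parent and child
    ("Anna", "Dan"),    # parent and child
    ("Ben", "Dan"),     # parent and child
    ("Clara", "Dan"),   # siblings
    ("Clara", "Eva"),   # partners
]


def build_network(edges):
    """Return an undirected adjacency map: person -> set of neighbours."""
    network = defaultdict(set)
    for a, b in edges:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degree_centrality(network):
    """Share of the other members each person is directly tied to."""
    n = len(network)
    return {person: len(nbrs) / (n - 1) for person, nbrs in network.items()}


family = build_network(ties)
centrality = degree_centrality(family)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Nothing in the sketch depends on a social media platform: the same adjacency map could hold the members of a project team and their reporting or communication ties.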
36. • Society seen as a complex adaptive system:
• Phase transitions
• Adaptation (multi stage process)
• Need -> intent -> capacity -> implementation
• Goal
• Information processing in many parts of Complex adaptive systems
• To help adaptation, allocating resources, coordination, …
• Family as a complex adaptive system:
• Development, hardships, births, deaths, successes, failures
• Adaptation over decades
SOCIAL COMPLEXITY
(Cioffi-Revilla, 2014.)
37. • Three types of systems
1. Natural systems
2. Human systems
3. Artificial systems
• Artificial systems (or artifacts) exist because they have a function: they
serve as adaptive buffers between humans and nature
• Humans pursue the strategy of building artifacts to achieve goals
• Two kinds of artificial systems working in synergy
• Tangible (e.g. roads, buildings)
• Intangible (e.g. organisations, social structures)
SIMON’S THEORY OF ARTIFACTS
AND SOCIAL COMPLEXITY
(Cioffi-Revilla, 2014.)
38. • Large (and old) research field
• Two main areas of simulation
1. Variable-Oriented Models
• System Dynamics Models (e.g. modeling a nuclear plant)
• Queuing Models (e.g. modeling how a box office line behaves)
2. Object-Oriented Models
• Cellular automata (e.g. Game of Life: http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life,
http://pmav.eu/stuff/javascript-game-of-life-v3.1.1/)
• Agent-based models (e.g. modeling the communication of a project
organisation of many individuals)
• Also, Evolutionary Models
SIMULATION
(Cioffi-Revilla, 2014.)
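To make the cellular automaton example concrete, here is a minimal Python sketch of one generation of Conway's Game of Life, using the standard rules: a dead cell with exactly three live neighbours is born, and a live cell with two or three live neighbours survives. Representing the grid as a set of live coordinates, and the function name `life_step`, are choices made for this sketch only.

```python
from collections import Counter


def life_step(live_cells):
    """Advance Conway's Game of Life by one generation.

    live_cells is a set of (x, y) tuples on an unbounded grid.
    """
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(0, 1), (1, 1), (2, 1)}
after_one = life_step(blinker)
after_two = life_step(after_one)
print(sorted(after_one))
```

Each cell follows the same local rule and the global pattern emerges from their interaction, which is what separates this family of models from the variable-oriented ones listed first.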
39. • 4 main areas of Computational Social Science
1. Big data and automatic information extraction
2. Social networks
3. Social complexity
4. Simulation
• Typically all of these working together
• CSS has a lot of problems, especially concerning privacy and ethics
• CSS is not a silver bullet and it does not replace other social science
fields or methods: Instead, CSS complements other research fields and
methods
SUMMARY
41. • Tracking and predicting how flu or other contagious diseases spread
• Based on network and social media analysis and modeling
• Many different variations, one of the first: Google Flu Trends, based on
flu related search queries
• For example:
• Achrekar, H.; Gandhe, A.; Lazarus, R.; Ssu-Hsin Yu; Benyuan Liu. 2011. Predicting Flu
Trends using Twitter data. 2011 IEEE Conference on Computer Communications Workshops
(INFOCOM WKSHPS), pp. 702-707, 10-15 April 2011.
MODELING THE SPREAD OF DISEASES
ALREADY AN EPIDEMIOLOGY CLASSIC
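As a rough illustration of network-based disease modeling, the sketch below runs a discrete-time SIR (susceptible, infected, recovered) process on a small contact network. It is not the method of Google Flu Trends or of Achrekar et al., which work from search queries and tweets; SIR is a standard textbook model used here as a stand-in. The contact network, the parameter names `p_transmit` and `p_recover`, and the function `simulate_sir` are all assumptions made for this example.

```python
import random


def simulate_sir(contacts, seed_case, p_transmit, p_recover, steps, rng):
    """Discrete-time SIR on a contact network.

    contacts maps each person to the set of people they meet.
    Returns a list of (susceptible, infected, recovered) counts, one per step.
    """
    state = {person: "S" for person in contacts}
    state[seed_case] = "I"
    history = []
    for _ in range(steps):
        infected = [p for p, s in state.items() if s == "I"]
        newly_infected = set()
        # Each infected person may pass the disease to susceptible contacts.
        for person in infected:
            for other in contacts[person]:
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Each infected person may recover at the end of the step.
        for person in infected:
            if rng.random() < p_recover:
                state[person] = "R"
        for person in newly_infected:
            state[person] = "I"
        counts = tuple(
            sum(1 for s in state.values() if s == label) for label in "SIR"
        )
        history.append(counts)
    return history


# Hypothetical chain of contacts: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

# With certain transmission and certain recovery the run is deterministic.
history = simulate_sir(chain, "a", p_transmit=1.0, p_recover=1.0,
                       steps=4, rng=random.Random(0))
print(history)
```

Lowering `p_transmit` below 1 makes the outcome depend on the random draws, so in practice such a model would be run many times and the results averaged.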
43. • Leskovec, J.; Backstrom, L.; Kleinberg, J. 2009. Meme-tracking and the dynamics of
the news cycle. Proceedings of the 15th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 497-506.
• Tracking new topics, ideas, and "memes" across the Web has been an issue of considerable interest.
Recent work has developed methods for tracking topic shifts over long time scales, as well as abrupt
spikes in the appearance of particular named entities. However, these approaches are less well suited to
the identification of content that spreads widely and then fades over time scales on the order of days -
the time scale at which we perceive news and events.
• We develop a framework for tracking short, distinctive phrases that travel relatively intact through on-line
text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad
class of memes that exhibit wide spread and rich variation on a daily basis.
MODELING NEWS CYCLE
DYNAMICS
44.
45. • Athanasiadis, I. N.; Mentes, A. K.; Mitkas, P. A.; Mylopoulos, Y. A. 2005. A Hybrid Agent-Based
Model for Estimating Residential Water Demand. SIMULATION, March 2005, vol. 81,
pp. 175-187, doi:10.1177/0037549705053172
• Picardi, C. and Saeed, K. 1979. The dynamics of water policy in southwestern Saudi
Arabia. SIMULATION, October 1979, vol. 33, no. 4, pp. 109-118.
SUSTAINABLE WATER
DEMAND MANAGEMENT
MODELING
46. • Venturini, T.; Laffite, N. B.; Cointet, J-P.; Gray, I.; Zabban, V.; De Pryck, K. 2014. Three
maps and three misunderstandings: A digital mapping of climate diplomacy. Big Data
& Society July-December 2014 1: 2053951714543804, first published on August 5, 2014
doi:10.1177/2053951714543804
CLIMATE DIPLOMACY
MAPPING
47. • Can electoral popularity be predicted using socially generated big
data? Information Technology. Volume 56, Issue 5, Pages 246–253,
ISSN (Online) 2196-7032, ISSN (Print) 1611-2776, DOI: 10.1515/itit-2014-1046,
September 2014
• Today, our more-than-ever digital lives leave significant footprints in cyberspace. Large scale collections
of these socially generated footprints, often known as big data, could help us to re-investigate different
aspects of our social collective behaviour in a quantitative framework. In this contribution we discuss one
such possibility: the monitoring and predicting of popularity dynamics of candidates and parties through
the analysis of socially generated data on the web during electoral campaigns. Such data offer
considerable possibility for improving our awareness of popularity dynamics. However they also suffer
from significant drawbacks in terms of representativeness and generalisability. In this paper we discuss
potential ways around such problems, suggesting the nature of different political systems and contexts
might lend differing levels of predictive power to certain types of data source. We offer an initial
exploratory test of these ideas, focussing on two data streams, Wikipedia page views and Google
search queries. On the basis of this data, we present popularity dynamics from real case examples of
recent elections in three different countries.
PREDICTING ELECTIONS?
48. • DIGIVAALIT 2015
• http://www.hiit.fi/digivaalit-2015
• Researching the parliamentary elections 2015 in Finland, focusing on
digital media data (Twitter, Facebook)
• Trying to understand how media is used and how public agenda is set
• CITIZEN MINDSCAPES
• http://challenge.helsinki.fi/blog/citizen-mindscapes-kansakunnan-mielentila
• Diving deep into the unscoped virtual territories of a nation’s collective consciousness may reveal something remarkable. The
hugely popular Finnish Suomi24 discussion forum has 1.9 million monthly visitors, who use the online town square to talk about
anything and everything close to their hearts. If this data could be harnessed into research use, what amazing things could we learn
about Finnish society? A team of media professionals at the forum’s owner company Aller and researchers at the National Consumer
Research Center plan to make use of this immense database.
DIGIVAALIT 2015 & CITIZEN
MINDSCAPES
49. • Listen to “The Trust Engineers” podcast by Radiolab
• http://www.radiolab.org/story/trust-engineers/
• Think about and discuss different ethical research issues in relation to
what you heard
ETHICS
50. • Lazer, D. et al. 2009. Computational Social Science. Science. 6 February 2009: Vol. 323, no. 5915, pp.
721-723.
• Conte, R. et al. 2012. Manifesto of Computational Social Science. The European Physical Journal Special
Topics. November 2012: Vol. 214, Issue 1, pp. 325-346.
• Anderson, C. 2008. The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired.
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
• Einav, L. and Levin, J. 2014. The Data Revolution and Economic Analysis. In Innovation Policy and the
Economy edited by Josh Lerner and Scott Stern. http://web.stanford.edu/~leinav/pubs/IPE2014.pdf
• King, G. 2011. Ensuring the Data-Rich Future of the Social Sciences. Science. 11 February 2011: Vol.
331 no. 6018 pp. 719-721.
• Wallach, H. 2014. Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and
Transparency. Medium.com.
https://medium.com/@hannawallach/big-data-machine-learning-and-thesocial-sciences-927a8e20460d
LECTURE 1 READING