The document argues that Web Science is needed to study the Social Web. It makes three key points:
1) The Social Web has transformed global communication but remains surprisingly unstudied as an entity. Understanding its complex emergent phenomena requires new analytical methods.
2) Simple micro-level rules of the web infrastructure give rise to complex macro-level behaviors that must be analyzed differently from traditional computer science. Properties like trust and reliability need new models.
3) Understanding socio-cultural factors is important for engineering future social machines and for resolving conflicts between openness and the security/privacy requirements of different parts of society. New theoretical and computational methods are needed to study the web effectively at large scales.
1. Social Web, Lecture VI
How can we STUDY the Social Web? Web Science
Lora Aroyo, The Network Institute, VU University Amsterdam
(based on slides from Les Carr and Nigel Shadbolt)
2. The Web
the most used and one of the most transformative applications in the history of computing, e.g. how the Social Web has transformed the world's communication
approximately 10^10 people
more than 10^11 web documents
3. The Web is NOT a Thing
• it's neither a verb nor a noun
• it's a performance, not an object
• co-constructed with society
• the activity of individuals who create interlinked content that both reflects and reinforces the interlinkedness of society and social interaction ... and a record of that performance
4. The Web
A great success as a technology, built on significant computing infrastructure, but, as an entity, surprisingly unstudied.
5. Science & Engineering
• physical science: an analytic discipline that seeks laws which generate or explain observed phenomena
• CS is mainly synthetic: formalisms & algorithms are created to support specific desired behaviors
• Web Science: the Web needs to be studied & understood as a phenomenon, but also engineered for future growth and capabilities
6. Simple micro rules give rise to complex macro phenomena
• at the microscale, an infrastructure of artificial languages and protocols: a piece of engineering
• however, the interaction of people creating, linking and consuming information generates the Web's behavior as emergent properties at the macroscale
• these properties require new analytic methods to be understood
• some properties are desirable and are to be engineered in; others are undesirable and, if possible, engineered out (see the sketch below)
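To make the micro/macro point concrete, here is a minimal sketch (not from the slides) of one simple micro rule, preferential attachment, where each new node links to already well-linked nodes, producing a heavy-tailed macro-level degree distribution. All names and parameters are illustrative.

```python
import random
from collections import Counter

def grow_graph(n_nodes: int, links_per_new_node: int = 2) -> Counter:
    """Grow a graph one node at a time. Micro rule: each new node links to
    existing nodes with probability proportional to their current degree."""
    endpoints = [0, 1, 1, 0]          # each node listed once per incident link
    degree = Counter({0: 2, 1: 2})    # seed: two mutually linked nodes
    for new in range(2, n_nodes):
        # Pick targets before registering the new node, avoiding self-loops.
        chosen = [random.choice(endpoints) for _ in range(links_per_new_node)]
        for old in chosen:
            degree[old] += 1
            degree[new] += 1
            endpoints += [old, new]
    return degree

# Macro outcome: a heavy-tailed degree distribution (a few hubs, many leaves).
degree = grow_graph(10_000)
histogram = Counter(degree.values())
for k in sorted(histogram)[:8]:
    print(f"degree {k:>3}: {histogram[k]:>5} nodes")
```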
7. A new way of software development
• software applications are designed based on appropriate technology (algorithms, design) and with an envisioned 'social' construct
• usually tested in the small, testing microscale properties
• the macrosystem that evolves from people using the microsystem and interacting in often unpredicted ways is far more interesting, and must be analyzed in different ways
• macrosystems also exhibit challenges that do not exist at the microscale
8. Evolution of Search Engines
1: techniques designed to rank documents
2: people gamed the system to influence the algorithms & improve their search rank
3: search technologies adapted to defeat this influence
9. The Web Graph
• to understand the Web, in good CS tradition, we look at the graph
• nodes are web pages (HTML)
• edges are hypertext links between nodes
• early analysis showed that in-degree and out-degree follow a power-law distribution => shown to hold for large samples (see the sketch below)
• this gave insight into the growth of the Web
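A minimal sketch of the analysis the slide describes: pages as nodes, hyperlinks as directed edges, and a tabulation of the in- and out-degree distributions whose heavy, power-law-like tails the early studies reported. The edge list is an invented stand-in for crawled data.

```python
from collections import Counter

# Hypothetical crawl output: (source_page, target_page) hyperlink pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("e", "c"), ("c", "a")]

out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)

# Degree distribution: how many pages have in-degree k?
in_dist = Counter(in_degree.values())
print("in-degree distribution:", dict(in_dist))
# On real crawls the early studies found P(k) proportional to k^-gamma,
# i.e. a straight line on a log-log plot.
```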
10. Search Algorithms
• the Web graph also serves as the basis of algorithms for search engines:
• HITS and PageRank assume that inserting a hyperlink symbolizes an endorsement of the authority of the page linked to (illustrated below)
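The slide names HITS and PageRank; as an illustration, here is a minimal power-iteration sketch of PageRank's core idea (a link is a vote of authority). This is a textbook-style sketch, not Google's production algorithm; the link structure is invented.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """links maps each page to the pages it links to."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Every page keeps a baseline share; the rest flows along links.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: spread its rank uniformly
                for p in pages:
                    new[p] += damping * rank[src] / len(pages)
        rank = new
    return rank

# Page "c" collects the most endorsements, so it ranks highest.
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```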
11. User State is Important
• the original Web-graph model is too simple; it starts from quasi-static HTML
• for personalization or customization, different representations (of sources) may be served to different requesters, e.g. via cookies
• graph-based models often do not account for this sort of user-dependent state, and do not fit all the information behind the servers, in the Deep Web
• it is no longer a simple HTTP GET (but an HTTP POST, or an HTTP GET with a complex URI) that defines the nodes in the graph
• URIs that carry user state are heavily used in Web applications, but are not in the model and remain largely unanalyzed (see the sketch below)
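To illustrate the modeling problem the slide raises: once requests carry user state, "one page" fans out into many graph nodes. A small sketch using Python's standard urllib.parse; the URLs and query parameters are invented.

```python
from urllib.parse import urlparse, parse_qs

# The same underlying resource, served differently per user/session state.
uris = [
    "http://shop.example/item?id=42",
    "http://shop.example/item?id=42&session=u1&sort=price",
    "http://shop.example/item?id=42&session=u2",
]

for uri in uris:
    parts = urlparse(uri)
    state = parse_qs(parts.query)
    # A naive graph model treats each full URI as a distinct node;
    # stripping the session state would collapse all three into one.
    node_id = (parts.netloc, parts.path, state.get("id", [""])[0])
    print(uri, "->", node_id)
```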
12. According to Google
each day, 20-25% of searches have not been seen before, i.e. generate a new identifier, thus a new node in the graph
more than 20 million new links per day, ~200 per second
do they follow the same power laws & growth models?
13. Validating such models is hard
exponential growth of content
changes in the number & power of servers
increasing diversity in users
(overlaid on the previous slide's statistics: 20-25% of daily searches are new identifiers, i.e. new nodes in the graph; more than 20 million new links per day, ~200 per second; do they follow the same power laws & growth models? A quick check of the rate arithmetic follows.)
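As a small aside (not on the slides), the quoted rate is easy to verify: 20 million new links per day works out to roughly 230 per second, consistent with the slide's "200 per second".

```python
links_per_day = 20_000_000
seconds_per_day = 24 * 60 * 60           # 86,400
print(links_per_day / seconds_per_day)   # ~231 new links per second
```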
14. Social Web Sites
• modern websites (on the Social Web)
• run large script systems in the browser
• store personal information
many Social Web sites are not part of the (open) graph model:
do these systems show similar (macro) behavior?
are they stable? are they fair?
do they need to be regulated?
are the access restrictions for personal information assured?
there is a need for understanding and for intervening/engineering
15. Wikipedia
• purely mathematical (technology-based) models do not capture the whole story
• the Wikipedia structure (link labels) shows a Zipf-like distribution, just like other tag-based systems (see the sketch below)
• Wikipedia is built on the MediaWiki software
• but other MediaWiki-based applications did not generate such significant use
• a purely 'technological' explanation cannot account for this
• it must be related to the 'social model' of how Wikipedia is organized
this is referred to as the dynamics of a 'social machine' (already in TBL's original vision of the WWW)
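A minimal sketch of the Zipf check the slide alludes to: sort label frequencies by rank; under a Zipf-like law, frequency is roughly C / rank, so the product of count and rank stays roughly constant. The label counts here are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical link-label usage counts harvested from a wiki dump.
labels = ["person"] * 500 + ["city"] * 240 + ["film"] * 160 + \
         ["album"] * 120 + ["species"] * 95 + ["river"] * 80

freq = Counter(labels)
for rank, (label, count) in enumerate(freq.most_common(), start=1):
    # Under Zipf, log(count) + log(rank) is roughly constant across ranks.
    zipf_score = math.log(count) + math.log(rank)
    print(f"{rank:>2} {label:<8} {count:>4} log-product={zipf_score:.2f}")
```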
16. Collective Intelligence
• why do people contribute?
• how to maintain the connected content?
• how are trust & provenance represented, maintained and repaired on the Web?
17. Collective Intelligence

Motivation      Example                                                               Mean
Fun             "Writing in Wikipedia is fun"                                         6.10
Ideology        "I think information should be free"                                  5.59
Values          "I feel it's important to help others"                                3.96
Understanding   "Writing in Wikipedia allows me to gain a new perspective on things"  3.92
Enhancement     "Writing in Wikipedia makes me feel needed"                           2.97
Protective      "By writing in Wikipedia I feel less lonely"                          1.97
Career          "I can make new contacts that might help my career"                   1.67
Social          "People I am close to want me to write in Wikipedia"                  1.51
18. Social Machines
• today's interactive applications are very early social machines, limited by being largely isolated from one another
• more effective social machines can be expected
• social processes in society interlink, so they should also interlink on the Web
• technology is needed to allow user communities to construct, share & adapt social machines, achieving success through trial, use & refinement
19. Next Generation Social Machines
• what are the fundamental theoretical properties of social machines, and what algorithms are needed to create them?
• what underlying architectural principles are needed to effectively engineer new web components for this social software?
• how can we extend the current web infrastructure with mechanisms that make the social properties of information sharing explicit and conform to relevant social-policy expectations?
• how do cultural differences affect the development and use of social mechanisms?
20. Modeling the Social Machines
• trustworthiness, reliability, and silent expectations about the use of information
• privacy, copyright, legal rules
• we lack structures for formally representing & reasoning over such properties
• thus, without scalable models for these issues, it is hard to help the Web go in the best possible direction
23. Web Science is about additionality
not the union of disciplines, but their intersection
24. Society is Diverse
different parts of society have different objectives and hence incompatible Web requirements, e.g. openness, security, transparency, privacy
25. Understanding the Socio-Cultural
• POWER DISTANCE: the extent to which power is distributed unequally within a society, and the degree to which society accepts this distribution
• UNCERTAINTY AVOIDANCE: the degree to which individuals require set boundaries and clear structures
• INDIVIDUALISM vs COLLECTIVISM: the degree to which individuals base their actions on self-interest versus the interests of the group
• MASCULINITY vs FEMININITY: a measure of a society's goal orientation
• TIME ORIENTATION: the degree to which a society does or does not value long-term commitments and respect for tradition
26. Understanding the variation
• Ecology of the Web: structure of the environment, producers and consumers
• Populations (individuals and species), traits/characteristics, heredity, genotypes and phenotypes
• Mechanisms: variation (mutation, migration, HGT, genetic drift), selection
• Outcomes: adaptation, co-evolution, competition, co-operation, speciation, extinction
30. but... How to do the Science?
31. it's relationships, stupid! not attributes
[image captions: "All the world's a net" by David Cohen, April 2002; May 2007]
32. Leveraging recent advances in:
• Theories: about the social motivations for creating, maintaining, dissolving and re-creating links in multidimensional networks, and about the emergence of macro-structures
• Data: the Semantic Web/Web 2.0 provide the technological capability to capture, store, merge, and query the relational metadata needed to more effectively understand and enable communities
• Methods: qualitative and quantitative methods to enable theoretically grounded network predictions
• Computational infrastructure: cloud computing and petascale applications are critical to face the computational challenges in analyzing the data
33. Network Analysis
• is about linking social actors, i.e. systematically understanding and identifying connections (see the sketch below)
• uses empirical data
• draws on graphic imagery
• relies on mathematical/computational models
• Jacob Moreno, one of the founders of social network analysis, produced some of the earliest graphical depictions of social networks (1933)
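As an illustrative sketch of what "systematically identifying connections" looks like in code, here is a toy sociogram in the spirit of Moreno, analyzed with the networkx library. The people and ties are invented.

```python
import networkx as nx

# Nodes are people, edges are social ties.
G = nx.Graph()
G.add_edges_from([
    ("ana", "bob"), ("ana", "carl"), ("ana", "dina"),
    ("bob", "carl"), ("dina", "eli"),
])

# Degree centrality: the fraction of others each actor is directly tied to.
for person, score in sorted(nx.degree_centrality(G).items(),
                            key=lambda kv: -kv[1]):
    print(f"{person:<5} {score:.2f}")
```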
34. Think Networks!
Albert-László Barabási: Linked: The New Science of Networks (April 2002)
• everything is connected to everything else
• networks are pervasive: from the human brain to the Internet to the economy to our group of friends
• they follow an underlying order and simple laws
• "new cartographers" are mapping networks in a wide range of scientific disciplines
• social networks, corporations, and cells are more similar than they are different
• new insights into the interconnected world
• new insights on the robustness of the Internet, the spread of fads and viruses, even the future of democracy
36. Networks: another perspective :-)
• Social Networks: it's not what you know, it's who you know
• Cognitive Social Networks: it's not who you know, it's who they think you know
• Knowledge Networks: it's not what you know, it's what they think you know
37. Big Data Owners
Who can do macro analysis?
• Google, Bing, Yahoo!, Baidu
• large-scale, comprehensive data
• new forms of research alliance
How billions of trivial data points can lead to understanding
39. Open Data
• common standards for the release of public data
• common terms for data where necessary
• licenses: CC variants
• exploitation & publication of distributed and decentralized information assets
43. Web Science Reflections
Is the Web changing faster than our ability to observe it?
How to measure or instrument the Web?
How to identify behaviors and patterns?
How to analyze the changing structure of the Web?
44. Big Bang: Web Information
• the assumption of the open exchange of information is being imposed on society
• do the Web, open access, open data, and the scientific and creative commons offer a beneficial opportunity or a dangerous cul-de-sac?
45. Open Questions
• How is the world changing as other parts of society impose their requirements on the Web? e.g. current examples with SOPA/PIPA and ACTA, where requirements for security and policing take over the free exchange of information and the unrestricted transfer of knowledge
• Are the public and open aspects of the Web a fundamental change in society's information processes, or just a temporary glitch? e.g. are open source, open access, open science & creative commons efficient alternatives to fee-based knowledge transfer?
46. Open Questions
• do we take the Web for granted as a provider of free and unrestricted information exchange?
• is Web Science the response to the pressure for the Web to change, to respond to the issues of security, commerce, criminality and privacy?
• what are the challenges for Web Science?
  • to explain how the Web impacts society?
  • to predict the outcomes of proposed changes to Web infrastructure on business & society?
47. What can you do as a Computer Scientist?
specifically for the Social Web
48. Hands-on Teaser
• Q&A on Assignments
• Pitch of the Social Web Apps
image source: http://www.flickr.com/photos/bionicteaching/1375254387/