10 Years of Web Science
A Look at the Next 10 Years
http://www.webscience.org/webscience10/tv-channel-webscience10/
Steffen Staab
Chair of WSTNet
WAIS, University of Southampton
WeST, University of Koblenz-Landau
Wolfgang Nejdl
L3S, Leibniz Universität Hannover
Nikolaus Forgó
L3S, Leibniz Universität Hannover
World Wide Web
• Work
• Dating: 17% of marriages in the US are due to online dating
• Traveling
• Learning
• Leisure
• Science: open access papers are cited more often (11:7)
https://flic.kr/p/F37KoU
Web Science Network of Laboratories
Wendy Hall - CeBIT 2013
10 Years of the Web Science Research Initiative
Keynotes
• Ricardo Baeza-Yates, Yahoo!
• Andrew Tomkins, Google
• Daniel Olmedilla, Facebook
• Jure Leskovec, Stanford & Pinterest
• Daniel Miller, UCL, ERC Grant "Social Network Sites and Social Science"
• Helen Margetts, Oxford Internet Institute
Panels
• 10 Years of Web Science
• Computational Social Science
• Privacy and Internet Governance
8th ACM Web Science Conference 2016
22–25 May 2016 in Hannover
http://websci17.org/
Troy, NY, USA, 26–28 June 2017
WWSSS – WSTNet Web Science Summer School
Koblenz 2016 · St. Petersburg 2017
30/11/16 · Thomas Risse
Next one: St. Petersburg, July 2017
Web Science – the grand challenge: the Web Science Observatory
• Researchers around the world gathering and sharing data and evidence
• Sharing tools, methods and techniques
• Web Science Collaboratories
• Longitudinal studies
Wendy Hall - CeBIT 2013
Spam
Attack on Copts
Gun running from Sudan
Are we losing the Web's past?
ALEXANDRIA (ERC Advanced Grant, 2.5 million euros)
World Wide Web – society's digital heritage
• What remains of the Web in 100 or 1,000 years if nobody preserves it?
• Data collection by the Deutsche Nationalbibliothek, the British Library, the Internet Archive, and others
• Search and analysis by ALEXANDRIA
• Development of new models and algorithms that make it possible to access not only the present of the Web but also its past
Semantic and temporal search for Rudolph Giuliani
[Figure: number of documents about Giuliani over time (1997, 2000, 2006, 2014), with peaks for his mayoral campaigns, 9/11, post-politics endeavours, and Senate run, cancer, and allegations]
SoBigData – Social Mining & Big Data Ecosystem
• Big Data analytics & social mining as a tool to measure, understand and possibly predict human behavior
• A research infrastructure (RI) for ethics-sensitive scientific discoveries and advanced applications of social data mining to the various dimensions of social life, as recorded by "big data"
Integrating key national infrastructures and centers of excellence:
• CNR & Uni Pisa (SoBigData.it): social data; Big Data analytics and social mining services
• Uni Hannover/L3S (Alexandria): German Web Archive (80 TB); services and expertise on web archives
• Uni Sheffield (GATE Cloud): natural language processing and text mining
• FhG IGD & FhG IAIS: information visualization and visual analytics
• Aalto University: data, services and competences on social network analysis
• Uni Tartu (E-Gov.data): Estonian e-government and e-health data
• ETH Zürich: search engine for open data
1st Call: SoBigData-funded Transnational Access
Research stays (up to 2 months) at SoBigData partners on the topics:
• City of Citizens • Well-being and Economy
• Societal Debates • Migration Studies
Tracking User Behavior
About 75% of websites track user behavior across sites.
[Zhonghao Yu et al. WWW16]
Bias in the Data
Bias in the Algorithm
Bias in the Social Machine
Web Observatory
Observing Bias in Social Networks (Lerman et al. 2015)
Part of the US election/Brexit misprediction?
Check out: http://www.kdnuggets.com/2016/07/big-data-bible-codes-bonferroni.html
"Torture the data, and it will confess to anything."
— Ronald Coase, economist, Nobel Prize laureate
Bonferroni Effect
DemocraZy
Washington Post: http://wpo.st/5WdH2
Reality Sensing, Mining and Augmentation for Mobile Citizen–eGovernment Dialogue
Web for Everyone
Uber, the world's largest taxi company, owns no vehicles.
Facebook ... most popular media owner, creates no content.
Alibaba, the most valuable retailer, has no inventory.
Airbnb ... largest accommodation provider, owns no real estate.
Data Oligopolists
• Uber: whom do you take a ride with? – the right picture, also for online dating ...
• Facebook: which source do you trust? – rumor checks change the trust ...
• Alibaba: whom do you trust to buy from? – others' ratings
• Airbnb: whom do you want for a sleepover?
Trust
The Law
A long time ago …
"1984 won't be like 1984"
Source: http://oldcomputers.net/macintosh.html
1981 (1987) – census (Volkszählung)
Since then …
Computers everywhere
Trends
• Cloud • Mobile • Social • Big Data
Trends
• Free-of-charge mentality • Loss of control
If the product is free, you are the product.
Two Grand Narratives
(1) Digital Agenda (2010 ff.)
Diagnosis:
• 30% of Europeans have still never used the internet
• Europe has only 1% penetration of fibre-based high-speed networks (Japan: 12%, South Korea: 15%)
• EU spending on ICT research and development stands at only 40% of US levels
• Four times as many legal music downloads in the US as in the EU
(2)
Europeans have a long tradition of declaring abstract privacy rights in theory that they fail to enforce in practice.
Reordering of European (data protection) law
01/2012
Central promises:
• One continent, one law
• Fit for the internet
But:
Debate in Parliament
• Albrecht: 350 proposed changes
• 3,133 amendment motions
But:
Topics:
• Boundless Informant
• Genie
• XKeyScore
• …
• FAIRVIEW
• Tempora
• BULLRUN
• Mail Isolation Control and Tracking
• PRISM
And today …
https://isc.sans.edu/diary/Port+7547+SOAP+Remote+Code+Execution+Attack+Against+DSL+Modems/21759
"In particular, Austria is experiencing a strong increase in TR-069 traffic within the last 24 hours."
Result (04/2016):
• Broken law
• Delay
• Unclarity / complexity
• Speed
• Irrelevance
• Fragmentation
Web Science – The Next 10 Years
Social challenges:
• Discrimination
• Trust
• Moral AI
Legal challenges:
• Regulation of infrastructure for economic competition
• Tracking everywhere
Political challenges:
• Misinformation
• Participation
• Internet governance
Technical challenges:
• Artificial intelligence
• Security
• ...

10 Years of Web Science

Editor's Notes

  • #4 Yesterday hundreds of thousands of internet connections went down – today there were several reports of people who no longer knew what to do. For example, in the US it is now reported that between 15-20% of newly married couples met their spouses online (cf. http://www.statisticbrain.com/online-dating-statistics/). https://www.timeshighereducation.com/home/open-access-papers-gain-more-traffic-and-citations/2014850.article
  • #16 Out of 350K different sites visited by 200K users over a 7-day period, 273K sites contained trackers that were sending information we deemed unsafe. Data elements that are only and always sent by a single user, or a reduced set of users, are considered unsafe with regard to privacy. 50% of news sites carry at least 11 different trackers.
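The "unsafe data element" criterion in this note (values sent by only one user, or very few users) can be sketched as a simple uniqueness check. The toy data, field names, and the threshold `k` below are illustrative assumptions, not the actual pipeline of Yu et al.:

```python
from collections import defaultdict

# Toy observations of (data_element_value, user_id) pairs seen in tracker
# requests. Values and user ids are made up for illustration.
observations = [
    ("uid=8f3a9c", "alice"),   # unique to alice -> can re-identify her
    ("lang=en-US", "alice"),
    ("lang=en-US", "bob"),
    ("lang=en-US", "carol"),
    ("session=77e1", "bob"),
]

def unsafe_elements(obs, k=2):
    """Flag data elements sent by fewer than k distinct users: such values
    can single out individuals ('unsafe' in the sense of this note)."""
    users_per_element = defaultdict(set)
    for element, user in obs:
        users_per_element[element].add(user)
    return {e for e, users in users_per_element.items() if len(users) < k}

print(unsafe_elements(observations))
# flags 'uid=8f3a9c' and 'session=77e1'; 'lang=en-US' is shared and safe
```

Raising `k` trades off sensitivity: the larger the set of users that must share a value before it counts as safe, the more conservative the privacy criterion.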
  • #19 The majority of your friends on Facebook have more friends than you do (the friendship paradox). Here: (a) the majority of your friends are colored; (b) the majority of your friends are non-colored (same network). A practical example might be media biases.
  • #20 Big Data presents opportunities for data mining and machine learning previously unimaginable, given the vast size of the datasets from which we are able to learn, cluster, find associations, and generally search for insights not before attainable. Mining Big Data is not a plug-and-play, one-size-fits-all, (insert another cliche here) process, however; though there seems to be alarmingly little discussion anymore of their importance in relation to Big Data, statistical thinking, methods, and processes matter. It is possible that the lack of discussion is because most people understand this fundamental truth already, which I find doubtful. Perhaps I simply have not come across relevant such topics of late, and they do, in fact, exist. I also find this doubtful. I fear that oversight or an essential lack of understanding are more likely to blame. This article is not a blanket criticism of learning from Big Data; instead, it is much more accurately a reminder that time-tested statistical methods are more valid now than ever, in this era of Big Data. In that regard, this discussion will focus on two particular statistical issues to be on the lookout for in your own work and in the work of others mining and learning from Big Data. And for the practitioners out there, this is not about abstract statistical theory. This is about practicality, and the highly improbable probabilities that can be improperly gleaned from Big Data. The Bonferroni Principle: there is a concept in statistics that goes like this: even in completely random datasets, you can expect particular events of interest to occur, and to occur in increasing numbers as the amount of data grows. These occurrences are nothing more than collections of random features that appear to be instances of interest, but are not.
This bears repeating: even purely random data leads to what seem to be events of interest, and the number of these seemingly interesting events grows as does the size of the dataset. The Bonferroni Principle is a statistical method for accounting for these random events. To employ it, determine the number of events of interest expected under pure randomness; if this expected number is significantly greater than the number of real instances you hope to find, the chances of any observation providing useful insight are almost nonexistent. The Bonferroni Correction is a technique for helping to avoid such observations. "Torture the data, and it will confess to anything." — Ronald Coase, economist, Nobel Prize laureate. One of the most prominent and easy-to-understand examples of the Bonferroni Principle is that of the George W. Bush administration's Total Information Awareness data-collection and data-mining plan of 2002. The criticism of the plan's effectiveness, and its relationship to the Bonferroni Principle, is as follows. Suppose we are looking for terrorists from a potential pool made up of a very large number of individuals, of whom in actuality only an incredibly small number are terrorists. Now suppose these potential terrorists are thought to deliberately visit particular locations in pairs for meetings, but that the observed individuals are actually non-terrorists moving about randomly. By using hard numbers for such a scenario and working out the probabilities, Rajaraman & Ullman give the example of one billion potential "evil-doers"; though the actual number of terrorist pairs may be very small (they give the example of 10 pairs), statistical probabilities could put the number of suspected pairs meeting at given locations due to pure randomness at 250,000 (again, in this particular example). Now, this is clearly a problem.
In purely practical terms, imagine having to recruit, train, and pay enough police personnel to investigate each of these flagged individuals! Had a Big Data mining practitioner first computed the number of events expected from randomness alone (the Bonferroni Principle in action), the entire investigation would have been immediately recognized as flawed, given the near-absolute certainty that the number of genuine events is vastly smaller than the quarter of a million pairs flagged by chance. Knowing when our out-of-the-gate quantitative assumptions are off base is critically useful in the era of Big Data. The Bonferroni Principle is one example of how Big Data can result in highly unlikely outcomes masquerading as statistically sound results.
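The "evil-doers in hotels" numbers quoted above can be reproduced directly. All parameters below are the assumptions of the Rajaraman & Ullman textbook example (one billion people, 1,000 days, a 1% chance of a hotel visit per person per day, 100,000 hotels), not measured data:

```python
from math import comb

people = 10**9      # potential "evil-doers"
days = 1000         # observation window
p_visit = 0.01      # chance a given person is in *some* hotel on a given day
hotels = 10**5      # number of hotels

# Two specific people end up in the same hotel on one specific day:
p_same_hotel_same_day = p_visit * p_visit / hotels   # 1e-9
# ...and happen to do so on two specific days:
p_twice = p_same_hotel_same_day ** 2                 # 1e-18

# Expected number of purely random "suspicious" pairs:
expected = comb(people, 2) * comb(days, 2) * p_twice
print(f"{expected:,.0f}")  # ≈ 250,000 pairs flagged by chance alone
```

Since the number of real terrorist pairs assumed in the example is 10, essentially every one of the roughly quarter-million flagged pairs is noise, which is exactly the Bonferroni Principle's warning.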