1. SoBigData is a proposed research infrastructure integrating existing national infrastructures for big data analytics and social mining across Europe.
2. It involves 12 participating organizations from 7 countries and aims to provide researchers access to skills, data, tools, and services for cutting-edge social mining experiments through 2019.
3. The proposal seeks funding through Horizon 2020 to establish a networked, virtual ecosystem for big data analysis and social data mining across Europe.
1. SoBigData
A Multidisciplinary Research Infrastructure for
Data Scientists on Big Social Data
Mark Coté – Department of Digital Humanities,
King’s College London
November 2015
SoBigData
Social Mining & Big Data Ecosystem
Proposal of a Research Infrastructure within the Horizon 2020 Call INFRAIA-1-2014-2015
Integrating and opening research infrastructures of European interest
Mathematics and ICT - Starting Communities
Distributed, multidisciplinary European infrastructure on Big Data and social data mining
Coordinator: Fosca GIANNOTTI, ISTI-CNR, Pisa, Italy fosca.giannotti@isti.cnr.it
Participant No Participant organisation name Country
1 - CNR Consiglio Nazionale delle Ricerche (PI: Fosca Giannotti) Italy
2 - USFD The University of Sheffield (PI: Hamish Cunningham) UK
3 - UNIPI Università di Pisa (PI: Dino Pedreschi) Italy
4 - FRH Fraunhofer IAIS and IGD (PI: Gennady Andrienko) Germany
5 - UT Tartu Ulikool (PI: Marlon Dumas) Estonia
6 - IMT Scuola IMT (Istituzioni, Mercati, Tecnologie) Lucca (PI: Guido Caldarelli) Italy
7 - LUH Gottfried Wilhelm Leibniz Universitaet Hannover (PI: Wolfgang Nejdl) Germany
8 - KCL King’s College London (PI: Tobias Blanke) UK
9 - SNS Scuola Normale Superiore di Pisa (PI: Fabrizio Lillo) Italy
10 - AALTO Aalto University (PI: Aristides Gionis) Finland
11 - ETHZ ETH Zurich (PI: Dirk Helbing) Switzerland
12 - TUDelft Technische Universiteit Delft (PI: Jeroen Van Den Hoven) Netherlands
2. SoBigData - A Research
Infrastructure for Data Scientists
• September 2015 – August 2019
• H2020 - Integrating Activity
• A research infrastructure for ethically sensitive scientific discoveries and
advanced applications of social data mining to the various dimensions of
social life, as recorded by ‘big data’.
• Multi-disciplinary including digital humanities and social sciences.
Primary stakeholders:
Big data analysts and social informatics researchers, who want to enhance their
algorithms to deal with social data, gain multi-disciplinary research skills, harmonise
existing data and analytics infrastructures, and engage other research communities in
the development of these key enabling technologies for the future digital economy and
society;
Economists, social science and humanities researchers, journalists, policy and
law makers, who have to analyse the avalanche of (big) social data, in order to gain
insight and actionable knowledge.
3. 1 – CNR: Consiglio Nazionale delle Ricerche (PI: Fosca Giannotti) Italy
2 – USFD: Sheffield Uni (PIs: Hamish Cunningham & Kalina Bontcheva) UK
3 – UNIPI: Università di Pisa (PI: Dino Pedreschi) Italy
4 – FRH: Fraunhofer IAIS and IGD (PI: Gennady Andrienko) Germany
5 – UT: Tartu Ulikool (PI: Marlon Dumas) Estonia
6 – IMT: Scuola Istituzioni Mercati Tecnologie Lucca (PI: Guido Caldarelli) Italy
7 – LUH: Leibniz Universitaet Hannover (PI: Wolfgang Nejdl) Germany
8 – KCL: King’s College London (PI: Tobias Blanke) UK
9 – SNS: Scuola Normale Superiore di Pisa (PI: Fabrizio Lillo) Italy
10 – AALTO: Aalto University (PI: Aristides Gionis) Finland
11 – ETHZ: ETH Zurich (PI: Dirk Helbing) Switzerland
12 – TUDelft: Technische Universiteit Delft (PI: Jeroen Van Den Hoven) Netherlands
http://sobigdata.eu/
Sept 2015 – Aug 2019
9. General goals
www.sobigdata.eu
• Create a European SoBigData Research Infrastructure
• Integrating key national infrastructures and centres of
excellence at European level in big data analytics and social
mining to create a networked, virtual ecosystem, the
SoBigData RI
• SoBigData will leverage these rich scientific assets (big data,
analytical tools and services, and skills) to enable
cutting-edge, multi-disciplinary social mining experiments;
• Granting access (both virtual and trans-national on-site) to
the SoBigData RI to multidisciplinary scientists, innovators,
public bodies, citizen organizations, SMEs, as well as data
science students at any level of education.
10. Key developers of the SoBigData
network: From CNR to Sheffield
• Fosca Giannotti
• Dino Pedreschi
• Kalina Bontcheva
12. Horizon 2020 then …
http://ec.europa.eu/research/infrastructures/pdf/final-report-CEI-2013.pdf
13. Horizon 2020 now …
Integrating Activity Call 2016-2017
• Open to researchers involved in RIs across all
disciplines
• Predefined topics for advanced communities
• No predefined topics for starting communities
• Two-stage application process for starting communities
• Likely to be very competitive; only high-quality
proposals will be funded
20. Existing national RIs to be integrated
• SoBigData.it – CNR & University of Pisa & SNS & IMT – www.sobigdata.it – SNA, HMA, WA
• GATE – USFD, Sheffield, UK – http://gate.ac.uk – TSMM
• IVAS – Fraunhofer IGD, Darmstadt, DE – https://www.igd.fraunhofer.
• Alexandria – LUH, Hannover, DE – http://www.L3S.de – WA
• Aalto – Helsinki, Finland – SNA
• E-GovData – Tartu, Estonia – http://www.cs.ut.ee/ – SD
• Living Archive – Zurich, Switzerland – SD
24. Big Social Data research at
King’s College London I:
Our Data Ourselves
Mining Youth Cultures in Mobile
Phone Environments
Exploring new ways of co-research
with youth communities and focus
groups
25. Big Social Data research at
King’s College London II:
Empowering Data Citizens
• Exploring new avenues
of research with open
cultural data
26. Trans-National and Virtual Access
• Transnational Access: provide on-site access to the infrastructures
and the accompanying world-leading research expertise. Calls for
on-site access will be launched for two types of projects: call-specific
and open call ones
• Virtual Access: this WP will offer virtual access to the SoBigData RI
for the reuse of existing and newly created social data mining resources.
Short name of participant CNR USFD UNIPI FRH UT IMT
Person-months per participant: 18 12 5 4
Participant number 7 8 9 10 11 12
Short name of participant LUH KCL SNS AALTO ETHZ TUDelft
Person-months per participant: 5 4
Objectives
On-site access is granted to six thematic clusters of facilities provided by seven national
infrastructures, offering world-leading research expertise from multiple disciplines, as well as big data
computing platforms, big social data resources, and cutting-edge computational methods.
Description of the infrastructure
Name of the thematic clusters:
1. Text and Social Media Mining (TSMM)
2. Social Network Analysis (SNA)
3. Human Mobility Analytics (HMA)
4. Web Analytics (WA)
5. Visual Analytics (VA)
6. Social Data (SD)
In the following table we summarize the national infrastructures with the thematic cluster covered:
Name of infrastructure | Location | Web site | Annual operating cost | Thematic clusters
SoBigData.it | CNR, Pisa, IT; University of Pisa, IT | www.sobigdata.it | 100,000 € | TSMM, SNA, HMA, WA
GATE | USFD, Sheffield, UK | http://gate.ac.uk | 280,000 € | TSMM
IVAS | Fraunhofer IGD, Darmstadt, DE | https://www.igd.fraunhofer.de/en/Institut/Abteilungen/IVA | 50,000 € | VA
Alexandria | LUH, Hannover, DE | http://www.L3S.de | 50,000 € | WA
Aalto | Helsinki, Finland | | | SNA
E-GovData | Tartu, Estonia | http://www.cs.ut.ee/ | | SD
Living Archive | Zurich, Switzerland | | | SD
28. WP2 Legal and Ethical Framework
WP1 Project Management
WP3 Disseminate and Sustain
WP4 Train
WP5 Innovate
WP6 On-site Access
WP7 Virtual Access
WP8 Data Ecosystem
WP9 Services Ecosystem
WP10 RI Creation
SoBigData RI
SoBigData Starting Community
29. WP Leaders
• WP1: Management. Fosca Giannotti, CNR
• WP2: Ethics. Jeroen Van Den Hoven, TUDelft
• WP3: Dissemination. Hamish Cunningham, USFD
• WP4: Training. Tobias Blanke, KCL
• WP5: Innovation. Dirk Helbing, ETHZ
• WP6: Transnational Access. Kalina Bontcheva, USFD
• WP7: Virtual Access. Roberto Trasarti, CNR
• WP8: Big Data EcoSystem. Wolfgang Nejdl, LUH
• WP9: Big Data Analytics methods and techniques. Dino Pedreschi, UNIPI
• WP10: SoBigData e-Infrastructure. Paolo Manghi, CNR
• WP11: Evaluation. Natalia Andrienko, FRH
30. WP by WP
WP 1: Management
WP 2: Legal and Ethical Framework
Ethics board and privacy-by-design framework
WP 3: Dissemination, Impact, and Sustainability
WP 4: Training
Summer schools, training modules, datathons
WP 5: Innovation activities aimed at industrial and
other non-academic stakeholders
Policy development, knowledge transfer
WP 6: Transnational Access
WP 7: Virtual Access
31. WP by WP
WP 8: Big Data Ecosystem
Data Management (including publicly available and
restricted data sets) and access mechanisms
WP 9: Big Data Analytics Methods and
Techniques
SNA, Mobility Analytics, Text Mining
Focus on integration and continuation
WP 10: SoBigData e-Infrastructure
VREs for data scientists
WP 11: Evaluation and Cross-disciplinary social
mining exploratories
32. Summary
• SoBigData is a Research Infrastructure for
Data Scientists
• September 2015 – August 2019
• Starting Community
• Integrating Activity to bring together
activities and services on big social data (BSD)
• Strong Transnational Access Component
Editor's Notes
data analytics
FutureICT (€1B project suggestion)
- Out of the FP7
- Integrating ICT and society (techno-socio-economic systems)
5 or 10 million
An ever-growing, distributed data ecosystem for procurement, access and curation of big social data
An ever-growing, distributed platform of interoperable, social data mining methods and associated skills
GATE: Text Mining (
Thematic Clusters
[TSMM] Text and Social Media Mining
[SNA] Social Network Analysis
[HMA] Human Mobility Analytics
[WA] Web Analytics
[VA] Visual Analytics
[SD] Social Data