Social Mining & Big Data Ecosystem
Educating Data Scientists: the SoBigData master experience
www.sobigdata.eu
Fosca Giannotti, Valerio Grossi
ISTI-CNR Pisa
H2020-INFRAIA-2014-2015
Grant Agreement N. 654024
Modern science is data-intensive,
multidisciplinary, collaborative and global
– Efficient data management (NoSQL paradigms and cloud computing
play an important role here), as well as curation, search, sharing
and transfer.
– Managing the complexity of the analytical process is a key issue
(scalable distributed analytical methods and Visual Analytics are
crucial here).
Firenze, 14 Nov 2016
Big Data Analytics process
[Figure: Data (demographic, geographic, movement and transport data); Models (T-Clustering, T-Patterns); Forecasts; Validation]
Interdisciplinary and collaborative
• for sharing data, models, processes and results of
experiments (at different levels of interoperability and semantic
enrichment)
• to realize experiments by combining resources (data, methods
and results) belonging to different communities.
– This calls for tools that facilitate the governance of complex
analytical processes in a workflow style, or mega-modeling.
– It also calls for sophisticated search that supports resource
discovery.
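The "workflow style" of governing an analytical process can be illustrated with a minimal sketch: the process is a dependency graph of steps, and each step runs only after the steps it depends on. The step names are hypothetical, chosen to mirror the Data, Models, Forecasts and Validation stages of the Big Data Analytics process; the ordering is done by Python's standard `graphlib` module (Python 3.9+), not by any SoBigData tool.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical workflow: each step maps to the set of steps it depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "load_movement_data": set(),
    "load_demographic_data": set(),
    "build_model": {"load_movement_data", "load_demographic_data"},
    "forecast": {"build_model"},
    "validate": {"forecast"},
}


def run_workflow(deps: dict[str, set[str]],
                 steps: dict[str, Callable[[], None]]) -> list[str]:
    """Run every step once, after all of its dependencies, and return the order.

    TopologicalSorter raises graphlib.CycleError if the workflow has a cycle.
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        steps[name]()
    return order


if __name__ == "__main__":
    log: list[str] = []
    # Each placeholder step just records that it ran.
    steps = {name: (lambda n=name: log.append(n)) for name in DEPENDENCIES}
    print(run_workflow(DEPENDENCIES, steps))
```

Declaring dependencies rather than a fixed sequence is what lets resources from different communities be recombined: a new data source or model is one more entry in the graph.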
Data scientist
A new kind of professional has emerged, the data scientist, who
combines the skills of software programmer, statistician and
storyteller/artist to extract the nuggets of gold hidden under
mountains of data.
Four core points of a data scientist
• Data Procurement and Curation
• Making sense of Data
• Story-telling
• Being accountable, step by step, for technical correctness and
for legal and ethical issues
SoBigData is…
A Multidisciplinary European Infrastructure for Big Data and Social
Data Mining providing an integrated ecosystem for ethically
sensitive scientific discoveries and advanced applications of social
data mining on the various dimensions of social life, as recorded by
“big data”.
Social Mining answers questions such as:
• Who will win the US elections? What is the electorate's current
voting intention? How reliable is it?
• What are the indicators of social well-being (beyond GDP),
and how can they be computed and monitored?
• How does social participation in digital community services
effectively help the aging population?
• What is the link between media ownership and media
content? Is there bias in news reporting? And in content
reviews?
• Is an infectious disease emerging? What is its diffusion model?
Estimating traffic fluxes on the road network with mobile phone
data
[Figure: map with locations labeled A, B, C, H, W]
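One simplified way to picture how phone data become fluxes is to count, for each pair of cells, how often a user is observed in one cell and then in another within a short interval, i.e. an origin-destination count. This sketch assumes call-detail-style records carrying a user id, a timestamp and a cell id, plus an arbitrary one-hour gap threshold; it is not the method behind the slide, and a real road-network estimate would also need to map cells to road segments and rescale for the operator's market share.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user: str       # pseudonymised subscriber id (assumed field)
    timestamp: int  # seconds since epoch (assumed field)
    cell: str       # antenna/cell id, assumed to identify a zone


def od_flows(records: list[Record], max_gap_s: int = 3600) -> Counter:
    """Count (origin_cell, destination_cell) moves: two consecutive records of
    the same user in different cells, at most max_gap_s seconds apart."""
    by_user: dict[str, list[Record]] = {}
    for rec in records:
        by_user.setdefault(rec.user, []).append(rec)
    flows: Counter = Counter()
    for trace in by_user.values():
        trace.sort(key=lambda r: r.timestamp)  # time-order each user's trace
        for prev, curr in zip(trace, trace[1:]):
            if prev.cell != curr.cell and curr.timestamp - prev.timestamp <= max_gap_s:
                flows[(prev.cell, curr.cell)] += 1
    return flows


if __name__ == "__main__":
    sample = [Record("u1", 0, "A"), Record("u1", 600, "B"),
              Record("u2", 100, "A"), Record("u2", 900, "B")]
    print(od_flows(sample))
```

The gap threshold is the main design choice: too long and unrelated sightings are chained into fake trips, too short and sparse phone activity hides real ones.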
Predicting Success
“Football is a simple game: 22 men chase a ball for 90
minutes and at the end, the Germans always win”
-- Gary Lineker (after the Italy 1990 World Cup semi-final)
Managing Data: what does it mean?
Managing data does not only mean:
• supporting discovery
• providing access, verifying the quality of data, and cleaning errors, outliers and anomalies
• transforming data into a format suitable for specific data analytical tools
It must also include support for:
• legal interoperability
– copyright management,
– licensing of single and derivative products,
– terms of use
• fine-grained policies
– attribution,
– citation policy,
– provenance management
• ethical issues
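Verifying quality and cleaning errors and outliers can be sketched as a small filter that rejects incomplete records and flags numeric outliers. The outlier test here is Tukey's interquartile-range rule, a common textbook choice swapped in for illustration rather than a SoBigData procedure; the record layout and the `trip_id`/`speed_kmh` fields are hypothetical.

```python
import statistics
from typing import Any

Row = dict[str, Any]


def clean(records: list[Row], required: tuple[str, ...], field: str,
          k: float = 1.5) -> tuple[list[Row], list[Row]]:
    """Split records into (kept, rejected).

    Rejected: records missing a required field or `field`, and records whose
    `field` value lies outside the fences [Q1 - k*IQR, Q3 + k*IQR].
    """
    complete: list[Row] = []
    rejected: list[Row] = []
    for r in records:
        if all(r.get(f) is not None for f in (*required, field)):
            complete.append(r)
        else:
            rejected.append(r)
    if len(complete) < 2:  # quartiles need at least two values
        return complete, rejected
    q1, _, q3 = statistics.quantiles([r[field] for r in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept: list[Row] = []
    for r in complete:
        (kept if low <= r[field] <= high else rejected).append(r)
    return kept, rejected


if __name__ == "__main__":
    trips = [{"trip_id": i, "speed_kmh": s}
             for i, s in enumerate([30, 32, 35, 31, 33, 400])]
    trips.append({"trip_id": 6, "speed_kmh": None})
    kept, rejected = clean(trips, required=("trip_id",), field="speed_kmh")
    print(len(kept), [r["trip_id"] for r in rejected])
```

Returning the rejected records instead of silently dropping them keeps the cleaning step auditable, which fits the provenance management listed above.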
Metadata in the SoBigData RI experience
• Huge datasets often describe human activities, which raises
privacy and ethical issues
• As a Research Infrastructure, we consider FAIRness one of our main targets
– The success of the RI is directly connected to datasets being
Findable, Accessible, Interoperable and Reusable
– Intellectual property has to be taken into account
– A highly structured metadata schema allows the RI to
automatically grant or deny access to a dataset, and to
enforce the acceptance of terms of use or the signing of NDAs…
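Automatically granting or denying access from the metadata alone can be sketched as a small decision function that reads the dataset's record and the requesting user's state. Everything here is hypothetical: the `Access` levels, the field names and the rule that personal data always require accepted terms of use are assumptions for illustration, not the SoBigData schema or its access-control code.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Hypothetical access levels a metadata record could declare."""
    OPEN = "open"
    TERMS_OF_USE = "terms_of_use"
    NDA = "nda"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class DatasetMeta:
    title: str
    access: Access
    contains_personal_data: bool
    allowed_groups: frozenset[str] = frozenset()  # only used for RESTRICTED


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str]
    accepted_terms: bool = False
    signed_nda: bool = False


def decide(meta: DatasetMeta, user: User) -> tuple[bool, str]:
    """Grant or deny access using only what the metadata record declares."""
    if meta.access is Access.RESTRICTED and not (user.groups & meta.allowed_groups):
        return False, "restricted: user is not in an authorised group"
    if meta.access is Access.NDA and not user.signed_nda:
        return False, "an NDA must be signed first"
    if meta.access is Access.TERMS_OF_USE and not user.accepted_terms:
        return False, "the terms of use must be accepted first"
    # Assumed policy: personal data always require accepted terms of use.
    if meta.contains_personal_data and not user.accepted_terms:
        return False, "personal data: the terms of use must be accepted first"
    return True, "granted"
```

Because every rule reads a metadata field, tightening a dataset's policy means editing its record, not the code.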
SoBigData metadata structure
• A highly structured and detailed metadata schema
has been designed to provide information
about:
– Description of the dataset (to make it Findable)
– How the dataset has been produced
– Intellectual Property
– Privacy issues
– Who can access the data and how (terms of use,
NDA…)
• The schema is mainly based on the DataCite standard
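As a sketch of what building on DataCite implies in practice, the check below rejects a record lacking one of the mandatory properties of the DataCite Metadata Schema (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType) and builds a citation in a simplified form of the pattern DataCite recommends, "Creator (PublicationYear): Title. Publisher. Identifier", which supports a citation policy. The snake_case keys, the example record and the placeholder identifier are hypothetical; SoBigData's own privacy, IPR and access fields are left out.

```python
# Simplified key names for the mandatory DataCite properties (assumed layout).
MANDATORY = ("identifier", "creators", "title", "publisher",
             "publication_year", "resource_type")


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [prop for prop in MANDATORY if not record.get(prop)]


def citation(record: dict) -> str:
    """Build 'Creator (PublicationYear): Title. Publisher. Identifier'."""
    missing = missing_mandatory(record)
    if missing:
        raise ValueError(f"missing mandatory properties: {missing}")
    creators = "; ".join(record["creators"])
    return (f"{creators} ({record['publication_year']}): {record['title']}. "
            f"{record['publisher']}. {record['identifier']}")


if __name__ == "__main__":
    example = {  # hypothetical record; the identifier is a placeholder
        "identifier": "10.5072/example",
        "creators": ["Rossi, M.", "Bianchi, L."],
        "title": "Example mobility dataset",
        "publisher": "SoBigData",
        "publication_year": 2016,
        "resource_type": "Dataset",
    }
    print(citation(example))
```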
The ethics of SoBigData
• Gathering large quantities of data has serious consequences
that SoBigData is trying to address. These consequences range
from personal harm to issues of autonomy, injustice and
inequality.
• To deal with these problems, SoBigData adheres to a
value-sensitive design approach. This approach uses design
solutions to overcome ethical dilemmas, in this case those
between the utility of the data gathered and the protection
of the individuals subject to the research.
• For the ideals of SoBigData to succeed, scientific methods
also need to be developed to embed moral principles in
practice.
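One concrete way to make the utility-versus-protection trade-off measurable is k-anonymity: a table is k-anonymous when every combination of quasi-identifiers (attributes such as age band or ZIP code that can be linked to outside data) is shared by at least k rows. This is a standard technique swapped in here for illustration, not a method the slides attribute to SoBigData; the column names and rows are hypothetical.

```python
from collections import Counter


def _key(row: dict, quasi_identifiers: tuple[str, ...]) -> tuple:
    """The combination of quasi-identifier values that describes a row."""
    return tuple(row[q] for q in quasi_identifiers)


def k_anonymity(rows: list[dict], quasi_identifiers: tuple[str, ...]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    groups = Counter(_key(r, quasi_identifiers) for r in rows)
    return min(groups.values()) if groups else 0


def risky_rows(rows: list[dict], quasi_identifiers: tuple[str, ...], k: int) -> list[dict]:
    """Rows whose quasi-identifier combination is shared by fewer than k rows."""
    groups = Counter(_key(r, quasi_identifiers) for r in rows)
    return [r for r in rows if groups[_key(r, quasi_identifiers)] < k]


if __name__ == "__main__":
    qi = ("age", "zip")
    rows = [
        {"age": "30-39", "zip": "56100", "diagnosis": "flu"},
        {"age": "30-39", "zip": "56100", "diagnosis": "asthma"},
        {"age": "40-49", "zip": "56121", "diagnosis": "flu"},
    ]
    print(k_anonymity(rows, qi))
    # Generalising the quasi-identifiers raises k but makes the data coarser.
    coarse = [{**r, "age": "30-49", "zip": r["zip"][:3] + "**"} for r in rows]
    print(k_anonymity(coarse, qi))
```

The generalisation step in the demo is the trade-off in miniature: protection goes up exactly as the age and location detail available for analysis goes down.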
Ethics: the challenge for SoBigData
• How do we create an infrastructure in which such data
and methods can be disseminated and improved
upon?
1. A Massive Open Online Course (MOOC) that instructs all
prospective researchers about the legal and ethical
dangers of big data research and the steps they can take to
minimise them;
2. A set of workflows that outline the steps researchers can
take when designing their approach;
3. Information pop-ups which redirect researchers to state-of-
the-art ethical methods.
Metadata definition: Ethics
Metadata definition: Intellectual Property
Master in Big Data Analytics & Social Mining
http://www.sobigdata.eu/master/bigdata
Education
• Big Data Sensing
• Big Data Mining
• Big Data Story Telling
• Big Data Technology
• Big Data for Social Good
• Big Data Ethics
Students: their studies
[Chart: students' previous studies, 2015 and 2016]
Gender distribution
[Chart: number of male (M) and female (F) students, 2014-2015 and 2015-2016]
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 

Recently uploaded (20)

Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

Educating Data Scientists: the SoBigData master experience

  • 1. Social Mining & Big Data Ecosystem (www.sobigdata.eu). Educating Data Scientists: the SoBigData master experience. Fosca Giannotti, Valerio Grossi, ISTI-CNR Pisa. H2020-INFRAIA-2014-2015, Grant Agreement N. 654024
  • 2. Modern science is data-intensive, multidisciplinary, collaborative and global – Efficiency of data management (NoSQL paradigms and cloud computing play an important role here) and of curation, search, sharing and transfer. – Managing the complexity of the analytical process is a key issue (scalable distributed analytical methods and Visual Analytics are crucial here). Firenze, 14 Nov 2016
  • 4. Interdisciplinary and collaborative • for sharing data/models/processes and results of experiments (different levels of interoperability and semantic enrichment) • to realize experiments by combining resources (data, methods and results) belonging to different communities. – This calls for tools that facilitate the governance of complex analytical processes in a workflow style or through mega-modeling. – This also calls for sophisticated search that supports resource discovery.
  • 5. Data scientist A new kind of professional has emerged, the data scientist, who combines the skills of a software programmer, a statistician and a storyteller/artist to extract the nuggets of gold hidden under mountains of data.
  • 6. Four core points of a data scientist • Data Procurement and Curation • Making sense of Data • Story-telling • Addressing technical correctness and legal and ethical issues at every step
  • 7. SoBigData is… A Multidisciplinary European Infrastructure for Big Data and Social Data Mining providing an integrated ecosystem for ethically sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”.
  • 8. Social Mining: answering questions such as • Who will win the US elections? What are voters' current voting intentions? How reliable are these estimates? • What are the indicators of social well-being (beyond GDP), and how can they be computed and monitored? • How effectively does social participation in digital community services help the aging population? • What is the link between media ownership and media content? Is there bias in news reporting? And in content reviews? • Is an infectious disease emerging? What is its diffusion model?
  • 10. Estimating traffic fluxes on the road network with mobile phone data
  • 11. Predicting Success “Football is a simple game: 22 men chase a ball for 90 minutes and at the end, the Germans always win.” -- Gary Lineker (after the 1990 World Cup in Italy)
  • 12. Managing Data: what does it mean? Managing data does not only mean supporting discovery, providing access, verifying the quality of data, cleaning errors, outliers and anomalies, and transforming data into a format suitable for specific analytical tools. It must also include support for • legal interoperability – copyright management – licensing of single and derivative products – terms of use • fine-grained policies – attribution – citation policy – provenance management • ethical issues
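To make the "verify the quality of data / clean outliers" step concrete, here is a minimal sketch in Python. It assumes a single numeric attribute and uses Tukey's interquartile-range fences with the conventional multiplier k = 1.5; neither the method nor the function name `iqr_outliers` comes from the slides, and a real curation pipeline would combine several such checks.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Split values into (clean, outliers) using Tukey's fences.

    A value is an outlier if it lies outside [Q1 - k*IQR, Q3 + k*IQR].
    statistics.quantiles needs at least two data points.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    clean = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return clean, outliers
```

Flagged values are returned rather than silently dropped, so a curator can decide whether each one is an error or a genuine anomaly worth keeping.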
  • 13. Metadata in the SoBigData RI experience • Huge datasets often describe human activities, which raises privacy and ethical issues • As a Research Infrastructure (RI), we treat FAIRness as one of our main targets – The success of the RI is directly connected to datasets being Findable, Accessible, Interoperable and Reusable – Intellectual property has to be considered – A highly structured metadata schema allows the RI to automatically grant or deny access to a dataset and to enforce the acceptance of terms of use or the signing of NDAs…
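The idea of a metadata schema that lets the infrastructure grant or deny access automatically can be sketched as follows. This is a hypothetical illustration: the `Access` levels, the `DatasetMetadata` and `Researcher` types and the `decide` function are assumptions, not the actual SoBigData schema or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    """Hypothetical access levels recorded in a dataset's metadata."""
    OPEN = "open"
    TERMS_OF_USE = "terms_of_use"
    NDA = "nda"
    CLOSED = "closed"


class Decision(Enum):
    GRANT = "grant"
    REQUIRE_TERMS = "require_terms"  # user must first accept the terms of use
    REQUIRE_NDA = "require_nda"      # user must first sign an NDA
    DENY = "deny"


@dataclass(frozen=True)
class DatasetMetadata:
    dataset_id: str
    access: Access


@dataclass
class Researcher:
    name: str
    accepted_terms: set[str] = field(default_factory=set)  # dataset ids
    signed_ndas: set[str] = field(default_factory=set)     # dataset ids


def decide(meta: DatasetMetadata, user: Researcher) -> Decision:
    """Grant or deny access using only what the metadata declares."""
    if meta.access is Access.CLOSED:
        return Decision.DENY
    if meta.access is Access.NDA and meta.dataset_id not in user.signed_ndas:
        return Decision.REQUIRE_NDA
    if meta.access is Access.TERMS_OF_USE and meta.dataset_id not in user.accepted_terms:
        return Decision.REQUIRE_TERMS
    return Decision.GRANT
```

Keeping the decision a pure function of the metadata is what would let such a rule be applied automatically to every newly registered dataset.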
  • 14. SoBigData metadata structure • A highly structured and detailed metadata schema has been designed to provide information about: – Description of the dataset (to make it Findable) – How the dataset has been produced – Intellectual property – Privacy issues – Who can access the data and how (terms of use, NDA…) • Mainly based on the DataCite standard
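A minimal sketch of a record organised along the five groups listed above, with a check for the six mandatory properties of the DataCite Metadata Schema (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The Python key names and the free-form dictionaries used for the other groups are assumptions for illustration; the slides do not publish the actual SoBigData schema.

```python
from dataclasses import dataclass, field

# The six mandatory properties of the DataCite Metadata Schema (4.x),
# written as snake_case keys (the key spelling is an assumption).
DATACITE_MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


@dataclass
class DatasetRecord:
    description: dict                                          # DataCite-style, makes it Findable
    provenance: dict = field(default_factory=dict)             # how the dataset was produced
    intellectual_property: dict = field(default_factory=dict)  # licence, rights holders
    privacy: dict = field(default_factory=dict)                # personal data, anonymisation
    access: dict = field(default_factory=dict)                 # who can access and how

    def missing_mandatory(self) -> list[str]:
        """Return the DataCite mandatory keys that are absent or empty."""
        return [k for k in DATACITE_MANDATORY if not self.description.get(k)]
```

A repository could refuse to publish a record until `missing_mandatory()` returns an empty list; the privacy and access groups would then feed an access-control rule.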
  • 15. The ethics of SoBigData • Gathering large quantities of data has serious consequences that SoBigData is trying to address. These consequences range from personal harm to issues of autonomy, injustice and inequality. • To deal with these problems, SoBigData adopts a value-sensitive design approach. This approach uses design solutions to overcome ethical dilemmas, in this case those between the utility of the data gathered and the protection of the individuals subject to the research. • For the ideals of SoBigData to succeed, scientific methods also need to be developed to embed moral principles in practice.
  • 16. Ethics: the challenge for SoBigData • How do we create an infrastructure in which such data and methods can be disseminated and improved upon? 1. A Massive Open Online Course (MOOC) that instructs all prospective researchers about the legal and ethical dangers of big data research and the steps they can take to minimise them; 2. A set of workflows that outline the steps researchers can take when designing their approach; 3. Information pop-ups that redirect researchers to state-of-the-art ethical methods.
  • 17. Metadata definition: Ethics
  • 18. Metadata definition: Intellectual Property
  • 19. Master in Big Data Analytics & Social Mining http://www.sobigdata.eu/master/bigdata
  • 21. Education • Big Data Sensing • Big Data Mining • Big Data Story Telling • Big Data Technology • Big Data for Social Good • Big Data Ethics