(Big) Data (Science) Skills
Big Data Value Association Summit in Madrid
17/06/2015
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
License
• This work is licensed under
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“(Big) Data (Science) Skills” by O. Corcho”
• Non-commercial
• Share-Alike
Data Scientist: Technical and Soft Skills needed
• One of the two or three pictures expected from a talk on skills…
• I could start by going through
• Each of these topics
• Discussing the specific skills needed
• However…
Sorry, looking for the reference to add here
What is Big Data?
Source: http://www.philipchircop.com/post/25783275888/seeing-the-full-elephant-its-a-tree-its-a
Big Data and the theory of ecological niches
Characteristics of an ecological niche
• A niche is defined by a spectrum of resource usage
• Species differ from each other in how efficient they are in
using resources that change continuously
• Characteristics of a niche
• Amplitude (range in which resources are used)
• Generalist species (they can use a wide range of
resources)
• Specialist species (they require a very specific
combination of resources)
• Overlap (similarity among niches in their usage of resources)
• Competitive exclusion principle (Gause, 1934)
• If two species coexist in a stable environment, they do so by
differentiating their effective ecological niches.
Source: Javier Seoane. Ecología. Unidad Temática 21. Teoría del nicho ecológico
WHAT’S THE RELATIONSHIP
TO BIG DATA?
Well, that’s interesting, but…
Big Data Niche 1. HPC and e-Infrastructure Experts
Background: Computer Science (Systems)
System Administration
Terms used in their native language:
Blades, Infiniband, OpenMPI,
racks, HDF, TBs, Gflops
Their daily life:
Check system logs
Make sure that queues are active
Install a new rack
What’s Big Data for them?
A “commercial” term for something
that they have done for a long time
They really know how to configure
and monitor a Hadoop cluster
They would love to see those who talk
about Big Data run fluid dynamics
simulations
Big Data Niche 2. Data Storage and Access Experts
Background: Computer Science
Database administration
Terms used in their native language:
SQL, NoSQL, Column store
Transactions, Hive, TBs/PBs/…,
TPS (Transactions per s)
Their daily life:
Optimise several queries
Run a new benchmark
Design an optimiser/physical operator
What’s Big Data for them?
A new opportunity to work on
optimisation algorithms
They know how to configure a database
They often laugh at those who deploy
a NoSQL solution for a problem
that can be solved with a
relational database
Big Data Niche 3. Machine Learning Experts
Background: Mathematics, Statistics,
Physics, Computer Science
Terms used in their native language:
Complexity, algorithm, p-value,
convergence, precision, recall
ROC curves, bayesian networks, R
Their daily life:
Read about a new problem
Write down a few formulae on the
whiteboard (or even a blackboard)
Prove that the algorithm terminates
What’s Big Data for them?
The same problems applied to data of
larger size, with new challenges
Problems are not solved just by using
Hadoop or a powerful NoSQL DB
Astonished by those who still mix up
correlation and causality
Big Data Niche 4. Slow-data Experts
Background: Computer Science, Statistics,
Library Sciences, Linguistics
Terms used in their native language:
Information model, vocabulary,
ontology, data quality, curation
Their daily life:
Receive a database schema
Talk to data producers and (re)users
Obtain consensus and transform data
What’s Big Data for them?
The difficulty lies in the variety of
data formats and structures
We may integrate data from varied
sources, although this is not
always possible
When you manage to integrate
heterogeneous data, you can achieve
better results
Big Data Niche 5. (Big Data) Consultants
Background: Computer Science, Economics, …
Terms used in their native language:
Business model, business opportunity,
Big Data, Data Value Chain,
Hadoop, Spark, R, TBs, GFlops
Their daily life:
Read a Gartner Big Data report
Talk to potential customers
Transfer needs to technicians
What’s Big Data for them?
It’s the 4Vs, plus a few more
I have a PPT presentation with a
Big Data infrastructure,
architecture,
and previous projects, which I will
use to sell a project to my
customers
Are we missing any ecological niche?
• We have already seen several ecological
niches…
• They all coexist
• Some of them overlap
Is there anyone who has not yet been
considered?
The evolution of a new species: the Data Scientist
Background: Computer Science + Statistics +
Mathematics + Economics + …
Terms used in their new exotic language:
HPC, databases, algorithms,
harmonisation, integration,
Hadoop, Spark, R, TBs, GFlops
Their daily life:
Learn about a new infrastructure
Code scripts to be run on Spark
Interpret results
Install a new framework
Read a few scientific papers
Make shiny presentations
Describe in their blog the activities
that they do, so that Big Data is
better known and understood
…
© Volker Markl: “Data Scientist” – “Jack of All Trades!”
Application
Data
Science
Control Flow
Iterative Algorithms
Error Estimation
Active Sampling
Sketches
Curse of Dimensionality
Decoupling
Convergence
Monte Carlo
Mathematical Programming
Linear Algebra
Stochastic Gradient Descent
Regression
Statistics
Hashing
Parallelization
Query Optimization
Fault Tolerance
Relational Algebra / SQL
Scalability
Data Analysis Language
Compiler
Memory Management
Memory Hierarchy
Data Flow
Hardware Adaptation
Indexing
Resource Management
NF2/XQuery
Data Warehouse/OLAP
Domain Expertise (e.g., Industry 4.0, Medicine, Physics, Engineering, Energy, Logistics)
Real-Time
Data Scientists and Pi-shaped people
• Let’s now go into
the expected
discussion
Sorry, looking for the reference to add here
Will all species survive?
• If Big Data defines an ecosystem…
• Which species will survive?
• Will Data Scientists wipe out the other species?
• Or will they be able to live in perfect symbiosis?
What is the ideal training required
for the individuals of these
species so that they can survive?
Data Science starter kits. Are they effective?
Masters in Data Science, Big Data and the like (I)
Expert in Big Data
Expert in Data Science
Masters in Data Science, Big Data and the like (II)
Masters in Data Science, Big Data and the like (III)
Year 1
• Data handling
• Data analysis
• Advanced data analysis and data
management
• Visualization
• Applications
Year 2
Are we doing it right in terms of training?
• Probably it is all about lack of maturity in the area, but
syllabi do not seem to be perfectly compatible…
• It is not easy to believe that we can create Data
Scientists in only one year
• Should we train people to know a bit about everything?
• Or should we separate more clearly the species in our
ecosystem and specialise them better for their work?
How do we manage to keep a
healthy and stable ecosystem?
Shameless self-promotion
• Strategies for success in the
Digital-Data Revolution
• Separation of concerns
• Intellectual ramps
• Data-intensive knowledge
discovery
• Components and usage
patterns
• Data-intensive engineering
• Development vs enactment
• Data-intensive application
experiences
• In Science
• In Business
Can we learn from lessons
learned in Data-Intensive
Science?
Separation of concerns: three clear profiles
• Domain experts (WHAT)
• They know the problems they want to
solve
• They know the application domain
• They can create (scientific) workflows
• Data-intensive analysts (WHAT)
• They know a lot about (Big) data
analysis
• They may not know about the
infrastructure behind the scenes
• They do not necessarily know all the
details of the application domain
• Data-intensive engineers (HOW)
• They know a lot about distributed
computing/infrastructure/HPC/clouds/etc.
• They receive the description of an
algorithm and can make it more
efficient (parallelisation)
Separation of concerns: Differentiated tasks
[<select =
"1<= day(inp.first.start)<=5",
project="inp">,
<select =
"6<= day(inp.first.start)<=10",
project="inp">,
<select =
"11<= day(inp.first.start)<=15",
project="inp">,
... ]
[Figure: DISPEL workflow diagram. Components shown: Programmable Filter Project (labels: outputs, inp, rules, distrib); four parallel Sort components (labels: outp, data, rule; rule value "second.fURI ASC..."); four parallel Tuple Burst components (labels: outp, input, structcols, inputs; structcols value ["first,second"]); four parallel De List components; CorrFarm (label: inp).]
User and application diversity
System complexity
Iterative "what" process development
Mapping, optimisation, deployment and execution
Accommodating and facilitating
• Several application domains
• Several tool sets
• Several process representations
• Several working practices
DISPEL representation
Composing and providing
• Many autonomous resources
• One enactment mechanism
• A single platform
Gateway
Tool level
Enactment level
Component library
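The select rules listed on the "Differentiated tasks" slide split the input by day-of-month range, and each partition is then sorted on its own. The following is only a minimal Python sketch of that idea, not DISPEL and not the workflow from the slide: the record shape (a start date plus a URI), the names `DAY_RANGES`, `partition_by_day` and `sort_partitions`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-of-month ranges, mirroring the first three select rules on the slide.
DAY_RANGES = [(1, 5), (6, 10), (11, 15)]


def partition_by_day(records, ranges=DAY_RANGES):
    """Route each (start_date, uri) record to the first range that contains its day."""
    partitions = defaultdict(list)
    for start, uri in records:
        for low, high in ranges:
            if low <= start.day <= high:
                partitions[(low, high)].append((start, uri))
                break  # a record that matches no range is simply dropped
    return partitions


def sort_partitions(partitions):
    """Sort every partition by URI in ascending order, independently of the others."""
    return {key: sorted(rows, key=lambda row: row[1]) for key, rows in partitions.items()}


# Made-up sample records: (start date, URI).
records = [
    (date(2015, 6, 3), "file:///b"),
    (date(2015, 6, 12), "file:///c"),
    (date(2015, 6, 4), "file:///a"),
    (date(2015, 6, 20), "file:///d"),
]
result = sort_partitions(partition_by_day(records))
```

Because each partition is sorted independently, the per-partition work could be handed to separate workers; deciding how to distribute it is the kind of task the deck assigns to data-intensive engineers (HOW) rather than to the analyst who wrote the rules.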
Conclusions
• We all know that there are big opportunities in Big Data
• But we need to be more productive. For that we need to:
• Create real multidisciplinary teams with at least three roles
(application developers, data-intensive analysts and data-intensive
engineers)
• Understand that simply by using Hadoop, Spark or R we are not
necessarily doing Big Data
• Just as coding in Java does not necessarily mean that we
understand object-oriented programming
• Understand that we have to interpret results adequately, from a
scientific point of view
• Understand the importance of homogenising datasets, in order to
facilitate their integration (slow-data)
• Continue working on delivering tools that can be used to develop
Big Data applications more productively
• Should we also be funding this?
(Big) Data (Science) Skills
Big Data Value Association Summit in Madrid
17/06/2015
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho

More Related Content

What's hot

Drupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open dataDrupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open data
DrupalDay
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
dgarijo
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
Peter Haase
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
SSSW
 
D.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital PreservationD.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital Preservation
PRELIDA Project
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
dgarijo
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Sören Auer
 
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
Dr. Haxel Consult
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
Dr. Haxel Consult
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked data
Reza Ramezani
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollink
SSSW
 
Open source analytics
Open source analyticsOpen source analytics
Open source analytics
Ajay Ohri
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
Sören Auer
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
Sören Auer
 
II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...
Dr. Haxel Consult
 
CORFU-MTSR 2013
CORFU-MTSR 2013CORFU-MTSR 2013
Linked Open Data and Ontotext Projects
Linked Open Data and Ontotext ProjectsLinked Open Data and Ontotext Projects
Linked Open Data and Ontotext Projects
Vladimir Alexiev, PhD, PMP
 
The RDFIndex-MTSR 2013
The RDFIndex-MTSR 2013The RDFIndex-MTSR 2013
The RDFIndex-MTSR 2013
CARLOS III UNIVERSITY OF MADRID
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
Dr. Haxel Consult
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communication
Sören Auer
 

What's hot (20)

Drupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open dataDrupal Day 2011 - Thinking spatially with your open data
Drupal Day 2011 - Thinking spatially with your open data
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
D.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital PreservationD.3.1: State of the Art - Linked Data and Digital Preservation
D.3.1: State of the Art - Linked Data and Digital Preservation
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...
 
II-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent OfficeII-SDV 2017: Towards Semantic Search at the European Patent Office
II-SDV 2017: Towards Semantic Search at the European Patent Office
 
Question answering in linked data
Question answering in linked dataQuestion answering in linked data
Question answering in linked data
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollink
 
Open source analytics
Open source analyticsOpen source analytics
Open source analytics
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...II-PIC 2017: Gain insight into technical, legal and business information thro...
II-PIC 2017: Gain insight into technical, legal and business information thro...
 
CORFU-MTSR 2013
CORFU-MTSR 2013CORFU-MTSR 2013
CORFU-MTSR 2013
 
Linked Open Data and Ontotext Projects
Linked Open Data and Ontotext ProjectsLinked Open Data and Ontotext Projects
Linked Open Data and Ontotext Projects
 
The RDFIndex-MTSR 2013
The RDFIndex-MTSR 2013The RDFIndex-MTSR 2013
The RDFIndex-MTSR 2013
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communication
 

Viewers also liked

Matching Workforce Skills with Employer Needs Now & into the Future
Matching Workforce Skills with Employer Needs Now & into the FutureMatching Workforce Skills with Employer Needs Now & into the Future
Matching Workforce Skills with Employer Needs Now & into the Future
nado-web
 
e-skills reshaping the future of learning
e-skills reshaping the future of learninge-skills reshaping the future of learning
e-skills reshaping the future of learning
@cristobalcobo
 
Day of data: skills for the future
Day of data: skills for the futureDay of data: skills for the future
Day of data: skills for the future
Steven Miller
 
Navigating the Changing Economic and Demographic Realities of the 21st Century
Navigating the Changing Economic and Demographic Realities of the 21st Century Navigating the Changing Economic and Demographic Realities of the 21st Century
Navigating the Changing Economic and Demographic Realities of the 21st Century
nado-web
 
How to hack into the big data team
How to hack into the big data teamHow to hack into the big data team
How to hack into the big data team
Data Science Thailand
 
99 Facts on the Future of Business in the Digital Economy
99 Facts on the Future of Business in the Digital Economy99 Facts on the Future of Business in the Digital Economy
99 Facts on the Future of Business in the Digital Economy
removed_98c8d4827eb0208c4db118838b8f6010
 
Official Slideshare for What's the Future of Business by Brian Solis #WTF
Official Slideshare for What's the Future of Business by Brian Solis #WTFOfficial Slideshare for What's the Future of Business by Brian Solis #WTF
Official Slideshare for What's the Future of Business by Brian Solis #WTF
Brian Solis
 

Viewers also liked (7)

Matching Workforce Skills with Employer Needs Now & into the Future
Matching Workforce Skills with Employer Needs Now & into the FutureMatching Workforce Skills with Employer Needs Now & into the Future
Matching Workforce Skills with Employer Needs Now & into the Future
 
e-skills reshaping the future of learning
e-skills reshaping the future of learninge-skills reshaping the future of learning
e-skills reshaping the future of learning
 
Day of data: skills for the future
Day of data: skills for the futureDay of data: skills for the future
Day of data: skills for the future
 
Navigating the Changing Economic and Demographic Realities of the 21st Century
Navigating the Changing Economic and Demographic Realities of the 21st Century Navigating the Changing Economic and Demographic Realities of the 21st Century
Navigating the Changing Economic and Demographic Realities of the 21st Century
 
How to hack into the big data team
How to hack into the big data teamHow to hack into the big data team
How to hack into the big data team
 
99 Facts on the Future of Business in the Digital Economy
99 Facts on the Future of Business in the Digital Economy99 Facts on the Future of Business in the Digital Economy
99 Facts on the Future of Business in the Digital Economy
 
Official Slideshare for What's the Future of Business by Brian Solis #WTF
Official Slideshare for What's the Future of Business by Brian Solis #WTFOfficial Slideshare for What's the Future of Business by Brian Solis #WTF
Official Slideshare for What's the Future of Business by Brian Solis #WTF
 

Similar to (Big) Data (Science) Skills

Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
Dr. Ananth Krishnamoorthy
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
Abdelkader OUARED
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
Dr. Ananth Krishnamoorthy
 
On Big Data
On Big DataOn Big Data
On Big Data
arttan2001
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
Mihai Criveti
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
WU (Vienna University of Economics and Business)
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Tomasz Bednarz
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
Attila Barta
 
Data science meetup - Spiros Antonatos
Data science meetup - Spiros AntonatosData science meetup - Spiros Antonatos
Data science meetup - Spiros Antonatos
Spiros Antonatos
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
Gérard Dupont
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Ilkay Altintas, Ph.D.
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
kammeyer
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
Big Data Week
 
Big data and you
Big data and you Big data and you
Big data and you
IBM
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
Raminder Singh
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
sarith divakar
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Philip Filleul
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
Mihai Criveti
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
S P Sajjan
 

Similar to (Big) Data (Science) Skills (20)

Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
On Big Data
On Big DataOn Big Data
On Big Data
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Data science meetup - Spiros Antonatos
Data science meetup - Spiros AntonatosData science meetup - Spiros Antonatos
Data science meetup - Spiros Antonatos
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAIMAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
 
Big data and you
Big data and you Big data and you
Big data and you
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 

More from Oscar Corcho

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Oscar Corcho
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
Oscar Corcho
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
Oscar Corcho
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Oscar Corcho
 
Ontology Engineering at Scale for Open City Data Sharing
(Big) Data (Science) Skills

  • 1. (Big) Data (Science) Skills Big Data Value Association Summit in Madrid 17/06/2015 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
  • 2. License • This work is licensed under the license CC BY-NC-SA 4.0 International • http://purl.org/NET/rdflicense/cc-by-nc-sa4.0 • You are free: • to Share — to copy, distribute and transmit the work • to Remix — to adapt the work • Under the following conditions • Attribution — You must attribute the work by inserting • “[source Oscar Corcho]” at the footer of each reused slide • a credits slide stating: “These slides are partially based on “(Big) Data (Science) Skills” by O. Corcho” • Non-commercial • Share-Alike
  • 3. Data Scientist: Technical and Soft Skills needed • One of the two or three pictures expected from a talk on skills… • I could start going through • Each of these topics • Discussing the specific skills needed • However… Sorry, looking for the reference to add here
  • 4. What is Big Data? Source: http://www.philipchircop.com/post/25783275888/seeing-the-full-elephant-its-a-tree-its-a
  • 5. Big Data and the theory of ecological niches
  • 6. Characteristics of an ecological niche • A niche is defined by a spectrum of resource usage • Species differ from each other in how efficiently they use resources that change continuously • Characteristics of a niche • Amplitude (range in which resources are used) • Generalist species (they can use a wide range of resources) • Specialist species (they require a very specific combination of resources) • Overlap (similarity among niches in their usage of resources) • Competitive exclusion principle (Gause, 1934) • If two species coexist in a stable environment, they do so by differentiating their effective ecological niches. Source: Javier Seoane. Ecología. Unidad Temática 21. Teoría del nicho ecológico
  • 7. WHAT’S THE RELATIONSHIP TO BIG DATA? Well, that’s interesting, but…
  • 8. Big Data Niche 1. HPC and e-Infrastructure Experts Background: Computer Science (Systems), System Administration Terms used in their native language: Blades, Infiniband, OpenMPI, racks, HDF, TBs, Gflops Their daily life: Check system logs Make sure that queues are active Install a new rack What’s Big Data for them? A “commercial” term for something that they have done for a long time They really know how to configure and monitor a Hadoop cluster They would love to see those who talk about Big Data run fluid dynamics processes
  • 9. Big Data Niche 2. Data Storage and Access Experts Background: Computer Science, Database administration Terms used in their native language: SQL, NoSQL, Column store, Transactions, Hive, TBs/PBs/…, TPS (transactions per second) Their daily life: Optimise several queries Run a new benchmark Design an optimiser/physical operator What’s Big Data for them? A new opportunity to work on optimisation algorithms They know how to configure a database They often laugh at those who deploy a NoSQL solution for a problem that can be solved with a relational database
  • 10. Big Data Niche 3. Machine Learning Experts Background: Mathematics, Statistics, Physics, Computer Science Terms used in their native language: Complexity, algorithm, p-value, convergence, precision, recall, ROC curves, Bayesian networks, R Their daily life: Read about a new problem Write down a few formulae on the whiteboard (or even a blackboard) Prove that the algorithm terminates What’s Big Data for them? The same problems applied to data of larger size, with new challenges Problems are not only solved in Hadoop or a powerful NoSQL DB Astonished by those who still mix up correlation and causality
  • 11. Big Data Niche 4. Slow-data Experts Background: Computer Science, Statistics, Library Sciences, Linguistics Terms used in their native language: Information model, vocabulary, ontology, data quality, curation Their daily life: Receive a database schema Talk to data producers and (re)users Obtain consensus and transform data What’s Big Data for them? The difficulty lies in the variety of data formats and structures We may integrate data from varied sources, although this is not always possible When you manage to integrate heterogeneous data, you can achieve better results
  • 12. Big Data Niche 5. (Big Data) Consultants Background: Computer Science, Economics, … Terms used in their native language: Business model, business opportunity, Big Data, Data Value Chain, Hadoop, Spark, R, TBs, GFlops Their daily life: Read a Gartner Big Data report Talk to potential customers Transfer needs to technicians What’s Big Data for them? It’s the 4Vs, plus a few more I have a PPT presentation with a Big Data infrastructure, architecture, and previous projects, which I will use to sell a project to my customers
  • 13. Are we missing any ecological niche? • We have already seen a few ecological niches… • They all coexist • Some of them overlap Is there anyone who has not yet been considered?
  • 14. The evolution of a new species: the Data Scientist Background: Computer Science + Statistics + Mathematics + Economics + … Terms used in their new exotic language: HPC, databases, algorithms, harmonisation, integration, Hadoop, Spark, R, TBs, GFlops Their daily life: Learn about a new infrastructure Code scripts to be run on Spark Interpret results Install a new framework Read a few scientific papers Make shiny presentations Describe in their blog the activities that they do, so that Big Data is better known and understood …
  • 15. © Volker Markl: “Data Scientist” – “Jack of All Trades!” Application Data Science Control Flow Iterative Algorithms Error Estimation Active Sampling Sketches Curse of Dimensionality Decoupling Convergence Monte Carlo Mathematical Programming Linear Algebra Stochastic Gradient Descent Regression Statistics Hashing Parallelization Query Optimization Fault Tolerance Relational Algebra / SQL Scalability Data Analysis Language Compiler Memory Management Memory Hierarchy Data Flow Hardware Adaptation Indexing Resource Management NF2 /XQuery Data Warehouse/OLAP Domain Expertise (e.g., Industry 4.0, Medicine, Physics, Engineering, Energy, Logistics) Real-Time
  • 16. Data Scientists and Pi-shaped people • Let’s now go into the expected discussion Sorry, looking for the reference to add here
  • 17. Will all species survive? • If Big Data defines an ecosystem… • Which species will survive? • Will Data Scientists wipe out the other species? • Or will they be able to live in perfect symbiosis? What is the ideal training required for the individuals of these species so that they can survive?
  • 18. Data Science starter kits. Are they effective?
  • 19. Masters in Data Science, Big Data and the like (I) Expert in Big Data Expert in Data Science
  • 20. Masters in Data Science, Big Data and the like (II)
  • 21. Masters in Data Science, Big Data and the like (III) Year 1 • Data handling • Data analysis • Advanced data analysis and data management • Visualization • Applications Year 2
  • 22. Are we doing it right in terms of training? • Probably it is all about lack of maturity in the area, but syllabi do not seem to be perfectly compatible… • It is not easy to believe that we can create Data Scientists in only one year • Should we train people to know a bit about everything? • Or should we separate more clearly the species in our ecosystem and specialise them better for their work? How do we manage to keep a healthy and stable ecosystem?
  • 23. Shameless self-promotion • Strategies for success in the Digital-Data Revolution • Separation of concerns • Intellectual ramps • Data-intensive knowledge discovery • Components and usage patterns • Data-intensive engineering • Development vs enactment • Data-intensive application experiences • In Science • In Business Can we learn from lessons learned in Data-Intensive Science?
  • 24. Separation of concerns: three clear profiles • Domain experts (WHAT) • They know the problems they want to solve • They know the application domain • They can create (scientific) workflows • Data-intensive analysts (WHAT) • They know a lot about (Big) data analysis • They may not know about the infrastructure behind the scenes • They do not necessarily know all the details of the application domain • Data-intensive engineers (HOW) • They know a lot about distributed computing/infrastructure/HPC/clouds/etc. • They receive the description of an algorithm and they can make it more efficient (parallelisation)
  • 25. Separation of concerns: Differentiated tasks [Figure: architecture diagram with an example workflow built from Programmable Filter Project, Sort, Tuple Burst, De List and CorrFarm elements. Labels: User and application diversity; System complexity; Tool level; Gateway; Enactment level; Component library; DISPEL representation] • Iterative "what" process development • Mapping, optimisation, deployment and execution • Accommodating and facilitating: several application domains, several tool sets, several process representations, several working practices • Composing and providing: many autonomous resources, one enactment mechanism, a single platform
  • 26. Conclusions • We all know that there are big opportunities in Big Data • But we need to be more productive. For that we need to: • Create real multidisciplinary teams with at least three roles (application developers, data-intensive analysts and data-intensive engineers) • Understand that simply by using Hadoop, Spark or R we are not necessarily doing Big Data • Just as coding in Java does not necessarily mean understanding object-oriented programming • Understand that we have to interpret results adequately, from a scientific point of view • Understand the importance of homogenising datasets, in order to facilitate their integration (slow-data) • Continue working on delivering tools that can be used to develop Big Data applications more productively • Should we also be funding this?
  • 27. (Big) Data (Science) Skills Big Data Value Association Summit in Madrid 17/06/2015 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho