Dr. Andrés Gómez
agomez@cesga.es
Feb. 2017
Data Science: Support Infrastructures
CESGA Mission
“Contribute to the advancement of Science and
Technical Knowledge, by means of research and
application of high performance computing and
communications, as well as other information
technologies resources, in collaboration with other
institutions, for the profit of society”
CESGA activities
PT Academic Network to GÉANT
Our Customers
- Universities (mainly from Galicia)
- R&D&I centres (mainly from Galicia)
- CSIC (across Spain)
- Other institutions from Spain and Europe:
  - Hospitals (R&D only)
  - Companies (mainly SMEs)
  - Other non-profit R&D&I organizations
- Free-of-charge access for Europeans through:
  - RES open calls
  - PRACE open calls
CESGA Computing Infrastructure
- FINIS TERRAE II (HPC): 7,712 cores
- SVG (HTC and Cloud): ~3,300 cores
- Cloud for Industry: 240 cores
- Big Data: 456 cores
- Remote Visualisation: 80 cores
- Online disk: 1,200 TB
- Storage: 2,200 TB
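As a quick arithmetic check on the figures above, the listed core counts can be summed. This is a minimal sketch: the SVG figure is approximate on the slide (~3,300), so the total is approximate as well, and the dictionary keys are simply labels copied from the slide.

```python
# Core counts as listed on the infrastructure slide.
# The SVG value is approximate ("~3,300"), so the total is approximate too.
CORES = {
    "FINIS TERRAE II (HPC)": 7712,
    "SVG (HTC and Cloud)": 3300,
    "Cloud for Industry": 240,
    "Big Data": 456,
    "Remote Visualisation": 80,
}


def total_cores(cores: dict[str, int]) -> int:
    """Return the sum of all listed core counts."""
    return sum(cores.values())


if __name__ == "__main__":
    print(f"Approximate total: {total_cores(CORES):,} cores")  # Approximate total: 11,788 cores
```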
Infrastructures for Data Science
What is Big Data?
Why now:
- Producing data is very cheap (sensors, people, ...)
- Storage is also cheap
- Much of the data is unstructured and high-dimensional
Big Data consists of extensive datasets - primarily in the
characteristics of volume, variety, velocity, and/or variability - that
require a scalable architecture for efficient storage, manipulation,
and analysis
NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. NIST
Special Publication (Vol. 1). Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
Big Data Challenges: the V's
- Volume
- Velocity
- Variety
- Veracity
- Value (added value or knowledge)
- Variability
Adapted from: Demchenko, Y., Grosso, P., & Membrey, P. (2013). Addressing Big Data Issues in Scientific Data Infrastructure. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 48–55). IEEE. http://doi.org/10.1109/CTS.2013.6567203
What is Data Science?
Data science is the extraction of actionable knowledge directly
from data through a process of discovery, or hypothesis
formulation and hypothesis testing.
NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. NIST
Special Publication (Vol. 1). Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
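The "hypothesis formulation and hypothesis testing" route in this definition can be illustrated with a two-sample permutation test. This is a minimal, standard-library sketch chosen for illustration, not a method taken from the slides or from NIST; the function name and parameters are assumptions, and any input values are toy numbers.

```python
import random
from statistics import mean


def permutation_test(a: list[float], b: list[float],
                     n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means.

    Null hypothesis: a and b come from the same distribution, so the group
    labels are exchangeable. Returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels under the null
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

A small p-value (for example below 0.05) is evidence against the null hypothesis; a permutation test avoids distributional assumptions at the cost of extra computation, which scales naturally with the computing resources described in these slides.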
Data Scientist: a champion!
Collaboration is better.
Architecture
NIST Big Data Public Working Group (NBD-PWG). (2015). NIST Big Data Interoperability Framework: Volume 6, Reference Architecture. NIST Special Publication 1500-6. Gaithersburg, MD. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-6.pdf
Big Data Requirements
- Very large storage (TB, PB, EB, ...)
- Very fast parallel I/O (GB/s)
- Computing capacity (move the processing to the data)
- Parallel processing
- Interactive, streaming and batch processing
- Visualisation (the first step of data analysis)
- Advanced data analytics and ML packages
- Remote access
- Etc.
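The "move the processing to the data" and "parallel processing" requirements can be sketched as follows: each data partition is reduced to a small partial aggregate where it lives, and only those aggregates are combined. This is an illustrative, single-machine sketch under stated assumptions: the `Summary` type and function names are hypothetical, partitions are assumed non-empty, and threads stand in for the separate nodes a real cluster would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Small partial aggregate; only this leaves a partition, not the raw data."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "Summary") -> "Summary":
        # Counts, sums, mins and maxes can be combined in any order.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarize(partition: list[float]) -> Summary:
    """Process one partition where it lives (assumes the partition is non-empty)."""
    return Summary(len(partition), sum(partition), min(partition), max(partition))


def parallel_summary(partitions: list[list[float]], workers: int = 4) -> Summary:
    """Summarize partitions concurrently, then merge the small partial results."""
    # Threads keep the sketch portable; CPU-bound work in CPython would normally
    # use a process pool or, on a cluster, one task per data node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, partitions))
    return reduce(Summary.merge, partials)
```

The design point is that network traffic is proportional to the number of partitions, not to the size of the data, which is the same idea the Hadoop and Spark stacks below apply at scale.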
Heterogeneous needs and user profiles require heterogeneous infrastructure and access modes.
CESGA Solution: Static
Based on Hortonworks HDP (Hortonworks Data Platform). Components, from the hardware up:
- Hardware platform for Big Data
- HDFS
- YARN
- MapReduce
- HBase, Spark, Hive
- Jupyter / Hue / Zeppelin / R
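The MapReduce layer in this stack follows a map, shuffle, reduce pattern. The classic word-count example below mimics those phases in plain Python on one machine; it is a sketch of the programming model only and does not use Hadoop's or Spark's APIs. The function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
```

For example, `word_count(["big data big", "data science"])` returns `{"big": 2, "data": 2, "science": 1}`. On a Hadoop cluster, the mappers run on the nodes holding the HDFS blocks, and YARN schedules the map and reduce tasks.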
CESGA Solution: Dynamic
Create your own cluster for data science. Components, from the hardware up:
- Hardware platform for Big Data
- Docker
- Mesos
- Your own configured cluster: Cassandra, Spark, SciDB
- PaaS API
- Web interface
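The slide does not document the PaaS API, so the sketch below is entirely hypothetical: it models a user's cluster request as a small data structure and checks it against the 456 cores the infrastructure slide lists for the Big Data platform. The class name, field names, supported-service set and capacity check are illustrative assumptions, not CESGA's actual interface.

```python
from dataclasses import dataclass

# Hypothetical values: services named on the slide; capacity from the infrastructure slide.
SUPPORTED_SERVICES = {"cassandra", "spark", "scidb"}
PLATFORM_CORES = 456


@dataclass
class ClusterRequest:
    """Hypothetical description of a user-defined cluster (not a real CESGA API)."""
    name: str
    service: str
    nodes: int
    cores_per_node: int

    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request looks feasible."""
        problems = []
        if self.service.lower() not in SUPPORTED_SERVICES:
            problems.append(f"unsupported service: {self.service}")
        if self.nodes < 1 or self.cores_per_node < 1:
            problems.append("nodes and cores_per_node must be positive")
        if self.total_cores() > PLATFORM_CORES:
            problems.append(
                f"requests {self.total_cores()} cores; platform has {PLATFORM_CORES}"
            )
        return problems
```

Validating a request before any containers are launched is one plausible way a PaaS layer could reject infeasible clusters early; the real service may work differently.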
CESGA Solution: HPC
For when data processing needs large computing capacity. Components:
- Hardware platform for HPC, plus GPUs
- High-performance storage: Lustre
- High-speed communications: InfiniBand (IB)
- Theano, TensorFlow, R, Caffe
- SLURM
- Web interface, remote desktop, SSH
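Work on the HPC platform is submitted through SLURM as batch scripts. The helper below builds the text of such a script using standard `sbatch` directives (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`, `--gres=gpu:N`). It is a sketch under assumptions: the function name and the default `python train.py` command are hypothetical placeholders, and site-specific options such as partitions or accounts are deliberately left out because the slides do not give them.

```python
def make_sbatch_script(job_name: str, nodes: int, tasks_per_node: int,
                       time_limit: str, gpus_per_node: int = 0,
                       command: str = "python train.py") -> str:
    """Build the text of a SLURM batch script (submit it with `sbatch <file>`).

    `command` is a hypothetical placeholder; partitions and accounts are
    site-specific and intentionally omitted.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",  # e.g. "01:00:00" (HH:MM:SS)
    ]
    if gpus_per_node > 0:
        # Generic resource request: GPUs per allocated node.
        lines.append(f"#SBATCH --gres=gpu:{gpus_per_node}")
    lines.append("")
    lines.append(f"srun {command}")  # launch the tasks inside the allocation
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file and running `sbatch` on it would queue the job; a deep-learning job using TensorFlow or Caffe would typically set `gpus_per_node` to request GPUs.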
CESGA and the Data Scientist
- CESGA has no in-house data scientists
- CESGA offers this service through collaborations
- Open to collaborations in Portugal
Infraestructuras data science_portugal_ipca_industry_4.0_v2

Infraestructuras data science_portugal_ipca_industry_4.0_v2

  • 1. Dr. Andrés Gómez agomez@cesga.es Feb. 2017 Data Science - Infraestruturas de suporte (Data Science – Support Infrastructures)
  • 2.
  • 3. CESGA Mission: “Contribute to the advancement of Science and Technical Knowledge, by means of research and application of high performance computing and communications, as well as other information technologies resources, in collaboration with other institutions, for the profit of society”
  • 6. Our Customers: universities (mainly from Galicia); R&D&I centres (mainly from Galicia); CSIC (across Spain); other institutions from Spain and Europe: hospitals (R&D only), companies (mainly SMEs) and other non-profit R&D&I organizations; fee-free access for Europeans through RES open calls and PRACE open calls.
  • 7. CESGA Computing Infrastructure: 2,200 TB; FINIS TERRAE II (HPC): 7,712 cores; SVG (HTC and Cloud): ~3,300 cores; Online Disk: 1,200 TB; Cloud for Industry: 240 cores; Big Data: 456 cores; Remote Visualisation: 80 cores.
  • 9. What is Big Data? Why now: producing data is very cheap (sensors, people, ...); storage is also cheap; data is unstructured and high-dimensional. “Big Data consists of extensive datasets, primarily in the characteristics of volume, variety, velocity, and/or variability, that require a scalable architecture for efficient storage, manipulation, and analysis.” NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. NIST Special Publication 1500-1. Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
  • 10. The V's of Big Data Challenges: Volume, Velocity, Variety, Veracity, Variability, and Value (added value or knowledge). Adapted from: Demchenko, Y., Grosso, P., & Membrey, P. (2013). Addressing Big Data Issues in Scientific Data Infrastructure. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 48–55). IEEE. http://doi.org/10.1109/CTS.2013.6567203
  • 11. What is Data Science? “Data science is the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formulation and hypothesis testing.” Data Scientist: a champion! Collaboration is better. Source: NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions. NIST Special Publication 1500-1. Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
  • 12. NIST Big Data Reference Architecture (figure). Source: NIST Big Data Public Working Group (NBD-PWG). (2015). NIST Big Data Interoperability Framework: Volume 6, Reference Architecture. NIST Special Publication 1500-6. Gaithersburg, MD. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-6.pdf
  • 13. Big Data Requirements: very large storage (TB, PB, EB, ...); parallel, very fast I/O (GB/s); computing capacity (move processing to the data); parallel processing: interactive, streamed and batch; visualisation (the first step of data analysis); advanced data analytics and ML packages; remote access; etc.
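The "move processing to the data" requirement is usually met by running each task on a node that already stores a copy of its input. The following is a minimal Python sketch of that idea under simplifying assumptions: the function `assign_tasks`, the block and node names, and the least-loaded tie-breaking rule are hypothetical, and this is not how YARN or any particular scheduler is implemented.

```python
def assign_tasks(block_locations, nodes):
    """Assign one task per input block, preferring a node that stores the block.

    block_locations: hypothetical map of block id -> nodes holding a replica
    (HDFS, for example, keeps several replicas of each block).
    nodes: the compute nodes available for this job.

    Among the replica holders, the least-loaded node is chosen; if no holder
    is available, the task falls back to the least-loaded node overall and
    must read its block remotely.
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    local = 0
    for block in sorted(block_locations):
        holders = [n for n in block_locations[block] if n in load]
        candidates = holders if holders else list(load)
        # (load, name) makes ties deterministic.
        node = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
        if holders:
            local += 1  # task runs where its data already is
    return assignment, local
```

For example, with blocks `b1` to `b3` replicated on nodes `n1` to `n3` and block `b4` stored only on an unavailable node `n4`, three of the four tasks run data-local and only `b4` has to be read over the network.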
  • 14. Heterogeneous needs and user profiles require heterogeneous infrastructure and access modes.
  • 15. CESGA Solution: Static, based on Hortonworks HDP. Layers: hardware platform for Big Data; HDFS; YARN; MapReduce, HBase, Spark, Hive; Jupyter/Hue/Zeppelin/R.
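MapReduce, one of the engines in the static stack, splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. A single-process Python sketch of the classic word count shows the three steps. The function names are illustrative; a real Hadoop or Spark job distributes each phase across the cluster instead of running it in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Calling `word_count(["big data needs big storage", "data science needs data"])` returns `{"big": 2, "data": 3, "needs": 2, "storage": 1, "science": 1}`.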
  • 16. CESGA Solution: Dynamic. Create your own cluster for Data Science. Layers: hardware platform for Big Data; Docker; Mesos; your configured cluster (Cassandra, Spark, SciDB); access through a PaaS API or a web interface.
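In the dynamic solution a user-defined cluster runs as containers on a shared pool managed by Mesos. Apache Mesos uses two-level scheduling: the master offers each agent's free resources to a framework, and the framework's scheduler decides which offers to accept. The sketch below is a simplified, hypothetical model of that second level. The classes `Offer` and `Container`, the function `place`, the largest-first greedy policy, and the example services are illustrative assumptions, not CESGA's configuration or the Mesos API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str      # hypothetical agent name
    cpus: float
    mem_gb: float


@dataclass
class Container:
    service: str    # e.g. "cassandra", "spark-worker" (illustrative)
    cpus: float
    mem_gb: float


def place(containers, offers):
    """Greedily accept offered resources for each requested container.

    Returns {container index: agent}. Raises RuntimeError if a container
    fits in none of the remaining offers.
    """
    free = {o.agent: [o.cpus, o.mem_gb] for o in offers}
    placement = {}
    # Place the largest containers first to reduce fragmentation.
    order = sorted(range(len(containers)), key=lambda i: (-containers[i].mem_gb, i))
    for i in order:
        c = containers[i]
        for agent, (cpus, mem) in free.items():
            if c.cpus <= cpus and c.mem_gb <= mem:
                free[agent][0] -= c.cpus
                free[agent][1] -= c.mem_gb
                placement[i] = agent
                break
        else:
            raise RuntimeError(f"no offer fits {c.service}")
    return placement
```

With two agents offering 8 CPUs and 16 GB each, a hypothetical cluster of one Cassandra node, one Spark master and two Spark workers fits, while a single 16-CPU container does not and raises an error.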
  • 17. CESGA Solution: HPC, for when data processing needs large computing capacity. Layers: hardware platform for HPC + GPUs; high performance storage (Lustre); high speed communications (InfiniBand); Theano, TensorFlow, R, Caffe; SLURM; access through a web interface/remote desktop or SSH.
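On an HPC system managed by SLURM, a common way to process many independent data files is a job array, where each array task handles one shard of the input. A minimal Python sketch, assuming a contiguous array with step 1 (for example submitted with `sbatch --array=0-3`): `my_shard`, `slurm_task` and the round-robin split are illustrative choices, while `SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN` and `SLURM_ARRAY_TASK_MAX` are environment variables SLURM sets for array tasks.

```python
import os


def my_shard(items, task_id, num_tasks):
    """Return the items this array task should process (round-robin split)."""
    if not 0 <= task_id < num_tasks:
        raise ValueError("task_id must be in [0, num_tasks)")
    return items[task_id::num_tasks]


def slurm_task():
    """Return (zero-based task index, number of tasks) for this array task.

    Assumes a contiguous array range with step 1. Outside SLURM the
    variables are missing and this falls back to a single task (0 of 1).
    """
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    first = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    last = int(os.environ.get("SLURM_ARRAY_TASK_MAX", "0"))
    return task_id - first, last - first + 1
```

Each task would then call `my_shard(files, *slurm_task())` on the same sorted file list, so the shards are disjoint and together cover every file exactly once.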
  • 18. CESGA Data Scientist: CESGA has no Data Scientist; it offers this service in collaboration and is open to collaborations in Portugal.