More ways of symbol grounding for knowledge graphs? – Paul Groth
This document discusses various ways to ground the symbols used in knowledge graphs. It describes the traditional "symbol grounding problem" where symbols are defined based only on other symbols. It then outlines several approaches to grounding symbols in non-symbolic ways, such as by linking them to perceptual modalities like images, audio, and simulation. It also discusses grounding symbols via embeddings, relationships to physical entities, and operational semantics. The document argues that richer grounding could help integrate these notions and enhance interoperability, exchange, identity, and reasoning over knowledge graphs.
This document discusses issues, opportunities, and challenges related to big data. It provides an overview of big data characteristics like volume, variety, velocity, and veracity. It also describes Hadoop and HDFS for distributed storage and processing of big data. The document outlines issues in big data like storage, management, and processing challenges due to scale. Opportunities in big data analytics are also presented. Finally, challenges like heterogeneity, scale, timeliness, and ownership are discussed along with approaches like Hadoop, Spark, NoSQL databases, and Presto for tackling big data problems.
This presentation sets out some of the challenges around citing and identifying datasets and introduces DataCite, the international data citation initiative. DataCite was founded on 1 December 2009 to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.
This presentation was given by Adam Farquhar at the STM Publishers Association Innovation Conference on 4-Dec-2009.
DataCite – Bridging the gap and helping to find, access and reuse data – Herbert Gruttemeier, INIST-CNRS (OpenAIRE)
OpenAIRE Interoperability Workshop (8 Feb. 2013).
EUDAT Service Suite Overview - EUDAT Summer School (Shaun de Witt, CCFE) – EUDAT
Shaun will give an overview of the EUDAT service suite, explaining the key function and role of the different B2 services and how they interconnect. Examples will be given of how each service has been used by communities, to explain the nature and scale of the service provision. We show how the B2 Service Suite can be linked to the data lifecycle and the role of each component in data management planning. By the end of this talk, users should have a good overview of each of the B2 services, how they do (or will) fit together, and how they can be used as part of a coherent data management plan.
Visit https://eudat.eu/eudat-summer-school
Big data Mining Using Very-Large-Scale Data Processing Platforms – IJERA Editor
Big Data consists of large-volume, complex, growing data sets with multiple, heterogeneous sources. With the tremendous development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological and biomedical sciences. MapReduce is a programming model with the parallel processing ability needed to analyse data at this scale: it allows easy development of scalable parallel applications that process big data on large clusters of commodity machines. Google's MapReduce, or its open-source equivalent Hadoop, is a powerful tool for building such applications.
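As a concrete illustration of the programming model described above, here is a minimal, self-contained Python sketch of the map, shuffle, and reduce phases for a word count, the canonical MapReduce example. It runs in a single process; a framework such as Hadoop distributes the same three phases across a cluster. The documents are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

documents = [
    "big data needs big tools",
    "map reduce maps then reduces",
]

# Map phase: emit (word, 1) pairs for every word in every document.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle phase: group values by key, as the framework would do across nodes.
groups = defaultdict(list)
for key, value in chain.from_iterable(map_phase(d) for d in documents):
    groups[key].append(value)

# Reduce phase: aggregate the grouped values for each key.
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # e.g. {'big': 2, 'data': 1, ...}
```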
CyVerse Austria provides computational infrastructure and data management services through open source code from CyVerse US. It is a cooperation initiative between the University of Graz, Medical University of Graz, and Graz University of Technology to conduct mutual research and bundle existing resources in molecular biomedicine, neurosciences, pharmaceutical/medical technology, biotechnology, and quantitative biomedicine/modeling. CyVerse was originally developed in 2008 by the University of Arizona and receives funding from the National Science Foundation, maintaining over 50,000 active users from thousands of academic institutions.
Today libraries face new and growing challenges when enabling access to information. The growing amount of information, combined with new non-textual media types, demands constant adaptation of established workflows and standard definitions. Knowledge, as published through scientific literature, is the last step in a process originating from primary scientific data. These data are analysed, synthesised and interpreted, and the outcome of this process is published as a scientific article. Access to the original data as the foundation of knowledge has become an important issue throughout the world, and different projects have started to find solutions.
Nevertheless, science itself is international: scientists are involved in global unions and projects, they share their scientific information with colleagues all over the world, and they use national as well as foreign information providers.
When facing the challenge of increasing access to research data, a possible approach is global cooperation for data access via national representatives:
* a global cooperation, because scientists work globally, scientific data are created and accessed globally.
* with national representatives, because most scientists are embedded in their national funding structures and research organisations.
DataCite was officially launched on December 1st 2009 in London and has 12 information institutions and libraries from nine countries as members. By assigning DOI names to data sets, data becomes citable and can easily be linked to from scientific publications.
Data integration with text is an important aspect of scientific collaboration. DataCite takes global leadership in promoting the use of persistent identifiers for datasets, to satisfy the needs of scientists. Through its members, it establishes and promotes common methods, best practices, and guidance. The member organisations work independently with data centres and other holders of research data sets in their own domains. Based on the work of the German National Library of Science and Technology (TIB) as the first DOI registration agency for data, DataCite has registered over 850,000 research objects with DOI names, thus starting to bridge the gap between data centres, publishers and libraries.
This presentation will introduce the work of DataCite and give examples how scientific data can be included in library catalogues and linked to from scholarly publications.
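As a small, hedged illustration of how a DOI makes data citable: the sketch below uses DOI content negotiation, a documented feature of the doi.org resolver (see citation.crosscite.org), to turn a DOI into a formatted citation string. The DOI used is the NIST report cited later on this page; style support can vary by registration agency.

```python
# pip install requests
import requests

doi = "10.6028/NIST.SP.1500-1"  # a DOI that appears later on this page

# Ask the DOI resolver for a formatted citation instead of a redirect
# to the landing page.
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
    timeout=30,
)
resp.raise_for_status()
print(resp.text)  # an APA-style citation string
```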
David Carter publications and associated research grants 2015 – Dave Carter
David Carter is an E.C. researcher at the Digital Learning Research Group in Plymouth. He has published several papers on topics like modeling police futures, constructing management simulation models, and using groups to support judgment parameter estimation. He has also received research grants totaling over £3 million from various sources like the ESRC, Joint Opto-Electronic Research Scheme, and the Home Office to lead projects on developing composite structures with optical fibers, contributing to a polymer optical fiber project, coordinating technology improvements for a command and staff trainer system, providing analytical support for business process development in policing, and contributing to an online role play trainer as a non-delivery partner.
The document discusses the need for an NIH Data Commons to address challenges with data sharing and storage. It describes how factors like increasing data volumes, availability of cloud technologies, and emphasis on FAIR data principles are driving the need for a centralized data platform. The proposed NIH Data Commons would provide findable, accessible, interoperable and reusable data through cloud-based services and tools. It would enable data-driven science by facilitating discovery, access and analysis of biomedical data across different sources. Plans are outlined to develop and test an initial Data Commons pilot using existing genomic and other biomedical datasets.
The Data Lifecycle - EUDAT Summer School (Yann Le Franc) – EUDAT
Yann will introduce the notion of data life cycles (DLCs) as an overarching framework for the workshop. This presentation will explain the key activities and roles identified by EUDAT and undertaken by researchers and data service providers in the process of creating, analysing, managing, sharing and archiving research data. It will highlight how the EUDAT service suite addresses this data lifecycle to support researchers with their key data requirements. He will then present the current research work undertaken in EUDAT to model community-specific DLCs, their relation to the concept of provenance, and the prototype services currently being developed to bridge the identified gaps in DLC coverage.
Visit https://eudat.eu/eudat-summer-school
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S... – Edward Curry
Digital transformation is driving a new wave of large-scale datafication in every aspect of our world. Today our society creates data ecosystems where data moves among actors within complex information supply chains that can form around an organization, community, sector, or smart environment. These ecosystems of data can be exploited to transform our world and present new challenges and opportunities in the design of intelligent systems. This talk presents my recent work on using the dataspace paradigm as a best-effort approach to data management within data ecosystems. The talk explores the theoretical foundations and principles of dataspaces and details a set of specialized best-effort techniques and models to enable loose administrative proximity and semantic integration of heterogeneous data sources. Finally, I share my perspectives on future dataspace research challenges, including multimedia data, data governance and the role of dataspaces to enable large-scale data sharing within Europe to power data-driven AI.
The Australian National Data Service (ANDS) aims to establish an Australian Research Data Commons by providing services for researchers to manage and share research data. Key services discussed include Identify My Data, which allows researchers to allocate persistent identifiers to datasets, and Register My Data, which registers public descriptions of data collections. ANDS uses persistent identifiers and the Handle system to ensure datasets remain identifiable even if their location changes, and is joining DataCite to offer digital object identifiers for published data.
This document provides an introduction to a course on big data analytics. It discusses the characteristics of big data, including large scale, variety of data types and formats, and fast data generation speeds. It defines big data as data that requires new techniques to manage and analyze due to its scale, diversity and complexity. The document outlines some of the key challenges in handling big data and introduces Hadoop and MapReduce as technologies for managing large datasets in a scalable way. It provides an overview of what topics will be covered in the course, including programming models for Hadoop, analytics tools, and state-of-the-art research on big data technologies and optimizations.
Big Stream Processing Systems, Big Graphs – Petr Novotný
Big Data is a recent phenomenon. Everyone talks about it, but do you really know what Big Data is? Join our four-part series about Big Data and you will get answers to your questions!
We will cover Introduction to Big Data and available platforms which we can use to deal with Big Data. And in the end, we are going to give you an insight into the possible future of dealing with Big Data.
After the two previous episodes you know the basics about Big Data. Yet it can get more complicated than that, usually when you have to deal with data that is generated in real time. In that case, you are dealing with a Big Stream.
This episode of our series will be focused on processing systems capable of dealing with Big Streams. But analysing data without a graphical representation will not be very convenient for us, and this is where we have to use a platform capable of visualising Big Graphs. All these topics will be covered in today's presentation.
#CHEDTEB
www.chedteb.eu
Data repositories are the core components of an Open Data Ecosystem. To gain a comprehensive model of the data ecosystem supporting tools and services, FAIR principles, joint storage of open data and clinical data and the integration of analysis tools should be considered. The aim was to create a data ecosystem model suitable for the sharing of open data together with sensitive data. For this purpose several tools and services were included in our data ecosystem model: Research Data Marts, I2b2 / tranSMART, CKAN, Dataverse, figshare, OSF (Open Science Framework), ... This multitude of services supports research data repositories. Different types of repositories are connected and supplement each other in the storage, release and sharing of data with different degrees of protection and data ownership. Tools to analyze, browse and visualize data are integrated in the data flow between repositories. Results of our ecosystem analysis:
It doesn't matter where one stores data, because everything is connected for data sharing: institutional repositories with dataverses, data marts, general repositories, domain-specific repositories, figshare, etc. Data governance and privacy protection are integrated at the early stage of data generation.
This document discusses data, data curation, and data visualization. It begins by providing background on the speaker and their experience. It then covers topics like how much data is generated daily on the internet, by organizations like Twitter and Facebook. It discusses what data is, challenges of data curation, and tools that can be used for data curation. It also touches on semantic web, open data protocols, and examples of great data visualization. It emphasizes thinking about how to best share and visualize data for users in an understandable way.
This document defines big data and discusses techniques for integrating large and complex datasets. It describes big data as collections that are too large for traditional database tools to handle. It outlines the "3Vs" of big data: volume, velocity, and variety. It also discusses challenges like heterogeneous structures, dynamic and continuous changes to data sources. The document summarizes techniques for big data integration including schema mapping, record linkage, data fusion, MapReduce, and adaptive blocking that help address these challenges at scale.
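To make one of these techniques concrete, here is a toy Python sketch of blocking for record linkage: rather than comparing every pair of records, which is quadratic in the dataset size, records are grouped by a cheap blocking key and only pairs within the same block are compared. The records and the key choice are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "name": "Smith, John"},
    {"id": 2, "name": "Smyth, Jon"},
    {"id": 3, "name": "Brown, Ann"},
]

# Blocking key: the first letter of the surname. Real systems use richer
# keys (phonetic codes, sorted n-grams) or learn them adaptively.
blocks = defaultdict(list)
for rec in records:
    blocks[rec["name"][0].lower()].append(rec)

# Only records sharing a block become candidate pairs for comparison.
candidate_pairs = [
    (a["id"], b["id"])
    for block in blocks.values()
    for a, b in combinations(block, 2)
]
print(candidate_pairs)  # [(1, 2)]: only the plausible match is compared
```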
This document discusses big data and NoSQL databases. It defines big data as data with high volume, velocity, and variety that is difficult for traditional databases to handle. NoSQL databases are presented as an alternative designed for big data by allowing flexible schemas and easy scaling across data centers. The document uses Apache Cassandra as an example of a NoSQL database that can serve as a primary data store, handle real-time and batch analytics, and accommodate structured and unstructured data.
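As a hedged sketch of the Cassandra usage described above, the snippet below uses the DataStax Python driver (cassandra-driver) to create a keyspace and a table and to run simple writes and reads. The contact point, keyspace, and schema are placeholders; a real deployment would pick replication settings to match its data centers.

```python
# pip install cassandra-driver
from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # placeholder contact point
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.readings (
        sensor_id text,
        ts timestamp,
        value double,
        PRIMARY KEY (sensor_id, ts)
    )
""")

# Writes and reads go through the same session; the driver routes requests.
session.execute(
    "INSERT INTO demo.readings (sensor_id, ts, value) VALUES (%s, %s, %s)",
    ("sensor-1", datetime.now(timezone.utc), 21.5),
)
for row in session.execute(
    "SELECT ts, value FROM demo.readings WHERE sensor_id = %s", ("sensor-1",)
):
    print(row.ts, row.value)

cluster.shutdown()
```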
1) Data life cycles describe the stages data passes through from creation to obsolescence, including creating, processing, analyzing, preserving, accessing, and reusing data.
2) The document proposes modeling data life cycles and their relations to EUDAT services using the W3C PROV standard to track provenance.
3) A proof-of-concept service is being built to allow graphical representation of data life cycles, create life cycle plans and templates, and capture provenance during execution by filling templates.
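A minimal sketch of this PROV modelling idea using the Python prov package: a cleaning activity uses a raw dataset entity and generates a cleaned one, mirroring a single step of a data life cycle. The namespace and names are hypothetical, and the EUDAT proof-of-concept service itself is not reproduced here.

```python
# pip install prov
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")  # hypothetical namespace

# One life-cycle step: a cleaning activity uses the raw dataset
# and generates the cleaned dataset.
raw = doc.entity("ex:raw-dataset")
clean = doc.entity("ex:cleaned-dataset")
cleaning = doc.activity("ex:data-cleaning")

doc.used(cleaning, raw)
doc.wasGeneratedBy(clean, cleaning)
doc.wasDerivedFrom(clean, raw)

print(doc.get_provn())  # serialise the provenance graph in PROV-N notation
```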
Linking HPC to Data Management - EUDAT Summer School (Giuseppe Fiameni, CINECA) – EUDAT
EUDAT and PRACE joined forces to help research communities gain access to high-quality managed e-Infrastructures whose resources can be connected together to enable cross-utilization use cases and make them accessible without any technical barrier. The capability to couple data and compute resources together is considered one of the key factors to accelerate scientific innovation and advance research frontiers. The goal of this session was to present the EUDAT services and the results of the collaboration activity achieved so far, and to deliver a hands-on session on how to write a Data Management Plan, or DMP. The DMP is a useful instrument for researchers to reflect on and communicate about the way they will deal with their data. It prompts them to think about how they will generate, analyse and share data during their research project and afterwards.
Visit: https://www.eudat.eu/eudat-summer-school
How cloud computing can accelerate your research. Presentation given at Moscow State University on 19th May 2015.
Apply for Azure for Research Awards at http://research.microsoft.com/en-US/projects/azure/awards.aspx
1. Big Data solutions are useful for web analytics problems that can be parallelized, but may not be as effective for more complex computations.
2. When starting with a new Big Data system, businesses should determine if it can do what existing solutions cannot, if existing solutions can be improved, and if it can integrate with current systems.
3. Context is important when choosing a Big Data solution, as different business needs may require different approaches.
Research data sharing enables validation and new analyses of results, ensures efficient use of public funds, and counters misconduct. Funding agencies can encourage open data practices by requiring long-term storage, promoting data publication, and helping make data findable through catalogs. They should work with research communities to understand infrastructure needs, partner with libraries on preservation, and consider discipline-specific approaches rather than one-size-fits-all solutions.
eROSA Stakeholder WS1: Big Data and Open Science in agricultural and environm... – e-ROSA
This document discusses challenges and opportunities around big data and open science in agricultural and environmental research. It provides a historic perspective on the evolution of data and modeling capabilities over time. While new technologies promise to make data access and analysis easier, realities often involve continuing to use existing approaches and hybrid solutions. The document recommends a focus on improving methodologies to semantically link diverse data sources. Adopting open science practices will require changes to research culture as well as technologies. The workshop aims to discuss needed services and integration across generic and domain-specific research infrastructures to advance open science in agriculture.
Data Science and AI in Biomedicine: The World has Changed – Philip Bourne
This document discusses the changing landscape of data science and AI in biomedicine. Some key points:
- We are at a tipping point where data science is becoming a driver of biomedical research rather than just a tool. Biomedical researchers need to become data scientists.
- Data science is interdisciplinary and touches every field due to the rise of digital data. It requires openness, translation of findings, and consideration of responsibilities like algorithmic bias.
- Advances like AlphaFold2 show the power of large collaborative efforts combining data, computing resources, engineering, and domain expertise. This points to the need for public-private partnerships and new models of open data sharing.
- The definition of
Mapping (big) data science (15 Dec 2014) – 대학(원)생 Han Woo PARK
This document discusses big data mapping and issues. It begins with definitions and characteristics of big data, including volume, velocity, variety, variability and complexity. It then covers the background of data science and trends in big data research and development. Finally, it addresses social issues and implications related to big data, including potential divides between developed and developing countries, academic and commercial researchers, and those with and without computational skills.
New forms of data for the social sciences: Smarter cities, more efficient organisations, and healthier communities. Wednesday 3rd November 2015, UCL, London, United Kingdom
UK e-Infrastructure: Widening Access, Increasing Participation – Neil Chue Hong
A talk given at the ICHEC Annual Seminar by Neil Chue Hong, reflecting on the rise of Grid and Web 2.0, and how this might enable increased participation and use of computing infrastructure for e-Science and research.
This document discusses managing research data for open science based on the UK experience. It outlines key aspects of open science such as making research more open, global, collaborative and closer to society. The document discusses mandates for open research data from funding bodies in the UK and EU, including stipulations in Horizon 2020 and requirements from EPSRC. It defines what constitutes research data and examines challenges around research data management, including technology issues, people issues, policy issues and resources. The importance of data skills training for researchers and data professionals is also covered.
This document outlines a data science enablement roadmap created by the Advanced Center of Excellence at Modern Renaissance Corporation. The roadmap consists of 1 introductory course and 3 advanced courses that can earn a student a master's level certificate in data science. The introductory course provides a broad overview of topics like algorithms, statistics, machine learning, and big data platforms. The advanced courses focus on specific skills like machine learning with R, modern data platforms using Hadoop, and advanced big data analytics techniques. The goal is to give students a versatile, practical skill set for a career in data science or big data engineering.
Big data and the dark arts - Jisc Digital Media 2015 – Jisc
There still remains a certain misunderstanding of the very definition of "big data" and perceived hype around the term. This workshop clarified the concepts and gave examples of relevant big data projects.
Big data provides opportunities for social science research by enabling new ways to answer existing questions and allowing entirely new questions to be asked. Large and diverse datasets can be analyzed from various sources like social media, sensors, and citizen science. This allows researchers to study big populations and questions in real time. Challenges include interdisciplinary collaboration, ensuring data and tools are open and reusable, and developing infrastructure to support analysis of large and diverse datasets.
PDT: Personal Data from Things, and its provenance – Paolo Missier
This document discusses various aspects of the Internet of Things (IoT), including potential architectures and stacks, connectivity and evolution. It examines use cases at different scales, from individual sensors to smart cities. The role of metadata and data provenance is explored for IoT applications involving science, personal data from sensors, and devices that make autonomous decisions. Issues of data ownership, privacy and user control are important considerations for personal data generated by IoT devices. The relationship between IoT and machine-to-machine communication is also briefly discussed.
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a... – University of Bologna
The volume, variety, and high availability of data backing decision support systems have impacted on business intelligence, the discipline providing strategies to transform raw data into decision-making insights. Such transformation is usually abstracted in the “knowledge pyramid,” where data collected from the real world are processed into meaningful patterns. In this context, volume, variety, and data availability have opened for challenges in augmenting the knowledge pyramid. On the one hand, the volume and variety of unconventional data (i.e., unstructured non-relational data generated by heterogeneous sources such as sensor networks) demand novel and type-specific data management, integration, and analysis techniques. On the other hand, the high availability of unconventional data is increasingly attracting data scientists with high competence in the business domain but low competence in computer science and data engineering; enabling effective participation requires the investigation of new paradigms to drive and ease knowledge extraction. The goal of this thesis is to augment the knowledge pyramid from two points of view, namely, by including unconventional data and by providing advanced analytics. As to unconventional data, we focus on mobility data and on the privacy issues related to them by providing (de-)anonymization models. As to analytics, we introduce a higher abstraction level than writing formal queries. Specifically, we design advanced techniques that allow data scientists to explore data either by expressing intentions or by interacting with smart assistants in hand-free scenarios.
e-infrastructures supporting open knowledge circulation - OpenAIRE France – Jean-François Lutz
This document discusses e-infrastructures that support open access to scientific knowledge and data. It notes that science is becoming more collaborative globally and data-driven. E-infrastructures provide crucial enabling technologies for open data sharing, scientific workflows, and virtual collaborations. Future steps include further promoting open access policies and ensuring the long-term preservation and reuse of publicly-funded research outputs and data.
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj... – DataScienceConferenc1
Data science is not only about numbers and how to crunch them; it is also about how to communicate project results to various audiences. Scientific journals and conferences are an excellent venue for reaching a wider audience and gathering valuable comments. The talk will answer the questions: How do you structure a scientific paper in data science? What are the relevant venues for showcasing your work to gain the most relevant reach? To demystify the process of scientific writing, a case study will be presented. Messy process: the story of the birth of one data science paper.
Research Methodology (how to choose Datasets).pptx – Zainab Alhassani
This document provides summaries of several freely available datasets and data repositories for researchers. It describes BuzzFeed News, which shares datasets, analysis, tools and guides used in its articles on GitHub. It also describes Metatext, which aims to democratize access to AI through curated datasets for classification tasks. Papers with Code is described as sharing machine learning papers, code, datasets and evaluation tables to support NLP and ML. Datahub.io focuses on stock market and property data that is frequently updated. Finally, Google Dataset Search is presented as a search engine for datasets to make them universally accessible.
The document describes a vision for the future of research in 2020 and 2030. By 2020, research is increasingly global and interdisciplinary, with open access the default. Peer review remains important but is supplemented by new methods. Openness and collaboration are recognized as critical. By 2030, digital technologies complement researchers and process vast amounts of data. Discovery tools understand relationships between datasets. Researchers pose questions to digital assistants which mine information to provide answers.
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research... – Carole Goble
Invited talk, PHIL_OS, March 30-31 2023, Exeter
https://opensciencestudies.eu/whither-open-science. Includes hidden slides.
FAIR and Open Science needs Digital Research Infrastructure, which is a federated system of systems and needs funding models that are fit for purpose
Culture change needed for paying for Open Science’s infrastructure and funding support for data driven research needs more reality and less rhetoric
Data science is an interdisciplinary field that uses scientific methods to extract knowledge and insights from data. It unifies statistics, data analysis, machine learning and related methods. Data science is the future of artificial intelligence and can add value to businesses by turning ideas seen in movies into reality. It involves working with large data sets and machine learning. Data science is primarily used for decisions, predictions, and machine learning by uncovering findings from data. Data science and technology delivers methods for solving data-intensive problems ranging from research to software deployment. Feature engineering is selecting or generating useful columns for modeling. Data cleaning takes up most of a data scientist's time along with exploratory analysis, visualization, machine learning, and communication. Data science education
Similar to Infraestructuras data science_portugal_ipca_industry_4.0_v2
HPC on Cloud for SMEs. The case of bolt tightening. – Andrés Gómez
This document discusses using high performance computing (HPC) resources in the cloud to help small and medium enterprises (SMEs) perform simulations. It describes a case study where HPC resources were used to simulate the bolt tightening process for an SME called Texas Controls. The simulations used Code_Aster software to model the materials, design, sequence and tightening parameters. A Taguchi method was employed to automatically generate 16 parametric simulation jobs. Results were analyzed to determine the optimal tightening strategy. Remote visualization and a graphical user interface were provided to make the HPC resources accessible to the SME. The model was also validated against real sensor data to verify accuracy.
A Web-platform for radiotherapy, a new workflow concept and an information sh... – Andrés Gómez
The ARTFIBio project has the objective of creating an information network to develop predictive individualized models of tumor response to radiotherapy, able to define more effective adaptive treatments.
This presentation shows the web interface that has been developed within the ARTFIBio project to share information among the participants in the project and, in the future, among other researchers in the radiotherapy area.
More info: artfibio@cesga.es
Federated HPC Clouds Applied to Radiation Therapy – Andrés Gómez
Presentation delivered in the Research Track at ISC CLOUD'13 at Heidelberg (Germany) on Sep. 24th 2013.
It describes the Virtual Cluster Architecture developed during the BonFIRE project and the reasons for building it. Some proof-of-concept experiments are also presented.
Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real C... – Andrés Gómez
This document discusses lessons learned from porting two applications, CalcuNetW and GammaMaps, to the Intel Xeon Phi coprocessor. CalcuNetW calculates measurements in complex networks using MKL libraries, while GammaMaps performs dose calculations for radiation therapy using OpenMP pragmas. With minimal modifications using only pragmas, both applications were able to run natively and offload work to the Xeon Phi. Results showed the Xeon Phi providing similar performance to a single Xeon CPU core but with poor I/O performance. Further optimization work is required to fully leverage the Xeon Phi's capabilities.
Software libre y modelos de programación en la investigación con supercomputa... – Andrés Gómez
Presentation given at the II Free Software for Education Congress in July 2013, presenting the results of a survey of CESGA users about their computational needs and the programming tools they use.
Role of public supercomputing centers in the promotion of HPC on Cloud: the C... – Andrés Gómez
The Galicia Supercomputing Centre (CESGA) provides high performance computing resources and services to research institutions in Galicia, Spain. It aims to promote computational science research and technology transfer. CESGA has over 16,000 GFLOPS of computing power and seeks to make HPC resources more accessible to small and medium enterprises through training programs and a cloud infrastructure called CloudPYME. This project aims to validate open source simulation software, provide training, deploy cloud services, and support 10 SMEs in order to increase adoption of HPC and sustainability of the resources.
VCOC BonFIRE presentation at FIRE Engineering Workshop 2012 – Andrés Gómez
Results of the VCOC experiment in the BonFIRE European Project (http://www.bonfire-project.eu). It shows a general fault-tolerant architecture for use in distributed Cloud environments and the usage of application performance indicators to trigger cluster elasticity. More information at www.cesga.es.
End-to-end pipeline agility - Berlin Buzzwords 2024 – Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
Open Source Contributions to Postgres: The Basics (POSETTE 2024) – ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... – sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with "Financial Odyssey," our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
3. CESGA Mission
"Contribute to the advancement of Science and Technical Knowledge, by means of research and application of high performance computing and communications, as well as other information technologies resources, in collaboration with other institutions, for the benefit of society."
6. Our Customers
- Universities (mainly from Galicia)
- R&D&I centres (mainly from Galicia)
- CSIC (around Spain)
- Other institutions from Spain and Europe:
  - Hospitals (R&D only)
  - Companies (mainly SMEs)
  - Other non-profit R&D&I organizations
- Non-fee access for Europeans through:
  - RES open calls
  - PRACE open calls
7. CESGA Computing Infrastructure
- FINIS TERRAE II (HPC): 7,712 cores
- SVG (HTC and Cloud): ~3,300 cores
- Cloud for Industry: 240 cores
- Big Data: 456 cores
- Remote Visualisation: 80 cores
- Online Disk: 1,200 TB
- Storage: 2,200 TB
9. What is Big Data?
Why now:
- Producing data is very cheap (sensors, people, …)
- Storage is also cheap
- Unstructured and high-dimensional data
"Big Data consists of extensive datasets - primarily in the characteristics of volume, variety, velocity, and/or variability - that require a scalable architecture for efficient storage, manipulation, and analysis."
NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions (NIST Special Publication 1500-1). Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
10. The V's: Big Data Challenges
- Volume
- Velocity
- Variety
- Veracity
- Variability
- Value (added value or knowledge)
Adapted from: Demchenko, Y., Grosso, P., & Membrey, P. (2013). Addressing Big Data Issues in Scientific Data Infrastructure. In 2013 International Conference on Collaboration Technologies and Systems (CTS) (pp. 48-55). IEEE. http://doi.org/10.1109/CTS.2013.6567203
11. What is Data Science?
"Data science is the extraction of actionable knowledge directly from data through a process of discovery, or hypothesis formulation and hypothesis testing."
NIST Big Data Public Working Group. (2015). NIST Big Data Interoperability Framework: Volume 1, Definitions (NIST Special Publication 1500-1). Gaithersburg, MD. Retrieved from http://dx.doi.org/10.6028/NIST.SP.1500-1
Data Scientist: a champion! Collaboration is better.
12. Architecture
NIST Big Data Public Working Group (NBD-PWG). (2015). NIST Big Data Interoperability Framework: Volume 6, Reference Architecture (NIST Special Publication 1500-6). Gaithersburg, MD. Retrieved from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-6.pdf
13. Big Data Requirements
- Very large storage (TB, PB, EB, …)
- Parallel, very fast I/O (GB/s)
- Computing capacity (move the processing to the data)
- Parallel processing
- Interactive, streamed, and batch processing
- Visualisation (the first step of data analysis)
- Advanced data analytics and ML packages
- Remote access
- Etc.
16. CESGA Solution: Dynamic
Create your own cluster for Data Science:
- Hardware platform for Big Data
- Docker containers orchestrated with Mesos
- Your own cluster configuration, running frameworks such as Spark, Cassandra, or SciDB
- Access through a PaaS API and a web interface
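A minimal PySpark sketch of the kind of job such a self-service cluster is meant to run. The master URL, file name, and column names are placeholders; on the CESGA platform they would come from the provisioned cluster rather than local mode.

```python
# pip install pyspark
from pyspark.sql import SparkSession

# local[*] keeps the sketch runnable anywhere; on a provisioned cluster the
# master URL would point at the cluster manager instead.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# readings.csv and its columns are placeholders for a real dataset.
df = spark.read.csv("readings.csv", header=True, inferSchema=True)
df.groupBy("sensor_id").avg("value").show()

spark.stop()
```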
17. CESGA Solution: HPC
When data processing needs large computing:
- Hardware platform for HPC, plus GPUs
- High-performance storage: Lustre
- High-speed communication: InfiniBand
- Frameworks: R, Theano, TensorFlow, Caffe
- Scheduler: SLURM
- Access: web interface / remote desktop / SSH
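A small TensorFlow sketch of the kind of check and computation a job on this platform might run. It uses the TensorFlow 2 API, which postdates this deck, and device placement falls back to the CPU when the scheduler has not allocated a GPU.

```python
# pip install tensorflow
import tensorflow as tf

# Report the GPUs visible to this process (e.g. those allocated by SLURM).
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# A small computation; TensorFlow places it on a GPU automatically if one
# is available, otherwise it runs on the CPU.
a = tf.random.normal((1024, 1024))
b = tf.random.normal((1024, 1024))
print(float(tf.reduce_sum(tf.matmul(a, b))))
```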
18. CESGA Data Scientist
- CESGA has no data scientist of its own
- CESGA offers this service in collaboration
- Open to collaborations in Portugal