Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Data	Analytics and	Downscaling for
Climate Research in	a	Big Data	World
Status and	Perspectives
Christian	Pagé
High-Level	...
CERFACS/Toulouse:	High	Performance	Computing	Research	Centre
u Develop	scientific	and	technical	researchesin	order	to	impr...
Outline
u Motivations:	Scientific,	Technical,	Societal
u Current	Situation	and	Issues:	we	have	to	improve
u Background:	th...
Motivations:	Scientific,	Technical,	Societal
Scientific
§ Perform	efficient	Data	Analysis
• Large	number	of	realizations	(...
Motivations:	Scientific,	Technical,	Societal
Scientific
– Every developmentwork in climate modelling comprises comparison
...
Motivations:	Scientific,	Technical,	Societal
Technical
§ Process	large	data	volumes,	ideally	near(er)	the	data	storage
• D...
Motivations:	Scientific,	Technical,	Societal
Societal
§ Provide	climate	projections	data	to	climate	change	impact	
researc...
Current	situation
Climate	Research	Community
u Data	available	for	scientific	analysis:	a	very	large	
trend
§ Limitations	i...
Current	situation
Practical	Example:	Climate	Community
Federation	
Service
• Temperature at 850 hPa field (Aggregated file...
Background:	ESGF/ENES
7.05.2014EGU
!
IS-ENES ESGF Portals
• BADC (UK)
• DKRZ (Germany)
• IPSL (France)
• SMHI (Sweden)
• C...
Background:	Climate	Data	Life	Cycle
Background
ESGF	Future	Computing	Nodes:	API
u Goal:	perform	data	analysis	near	the	data	storage
§ Better	data	access
§ Mov...
Background:	EUDAT
B2ACCESS
B2Handle
Background:	EUDAT	CDI
14
Community	Repositories
(thematic	data	centres)
EUDAT	generic	data	
service	provider	
storage,	wor...
Background:	EGI
24	countries
1	coordinating	organization	– EGI.eu
Solutions:	Building	Blocks
u ESGF
§ Federation	of	Peer-to-Peer	Data	Nodes
§ Computing	Nodes	API	(Using	ISO-OGC	WPS)
§ Open...
Solutions:	Putting	it	all	together
u Bridge EUDAT	/	EGI	/	BDE	/	ESGF	/	IS-ENES
§ EUDAT	B2ACCESS	AAI	ó ESGF	OpenID
§ EUDAT	...
Support	||	Support	&&	Support
u Support	is	key for	users
§ Heterogeneous	base	of	users
• Climate	Science	Researchers	(seni...
• Infrastructure to access relevant climate data
– Global & Regional Climate Model projections
– Observations: satellite, ...
Questions!	J
• Infrastructure to access relevant climate data
– Global & Regional Climate Model projections
– Observations: satellite, ...
Solutions:	Putting	it	all	together
User User ...
EUDAT	B2ACCESS
AAI
AAIEGI
Upcoming SlideShare
Loading in …5
×

Data analytics and downscaling for climate research in a big data world

797 views

Published on

Presented by Christian Pagé (CERFACS) during the 2nd BDE SC5 workshop, 11 October 2016, in Brussels, Belgium

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Data analytics and downscaling for climate research in a big data world

  1. 1. Data Analytics and Downscaling for Climate Research in a Big Data World Status and Perspectives Christian Pagé High-Level Research Engineer, Data Scientist and Project Manager Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS)
  2. 2. CERFACS/Toulouse: High Performance Computing Research Centre u Develop scientific and technical researchesin order to improve advanced computing methods u Access to computers with new architecture u Transfer this scientific knowledge and technical methods for application to big industrial sectors u Train high qualified people u 2015: The tenth computer set at CERFACS since 1996 occupies the 388° place in the top 500 delivering a peak power of 242 Tflop/s
  3. 3. Outline u Motivations: Scientific, Technical, Societal u Current Situation and Issues: we have to improve u Background: the Big Players u Solutions: working on it § Building Blocks § Putting it all together u Support || Support && Support u Summary and Perspectives
  4. 4. Motivations: Scientific, Technical, Societal Scientific § Perform efficient Data Analysis • Large number of realizations (ensemble of scenarios) • Uncertainties range estimation • Process Higher spatial and temporal resolution • Easily share intermediate results with collaborators § Achieve a more robust and flexible Data Life Cycle • More robust experiments setup – Explore several experiment configurations to answer scientific questions • Reproducible experiments
  5. 5. Motivations: Scientific, Technical, Societal Scientific – Every developmentwork in climate modelling comprises comparison of realizations • I introduced this small change.... • What happened to my model? • Does it work? • Does it work in the expected way? • Are there consequences I did not expect? • ...
  6. 6. Motivations: Scientific, Technical, Societal Technical § Process large data volumes, ideally near(er) the data storage • Data Analytics • Data Life Cycle § Streamline the data processing workflow § Proper metadata description of the data objects § Properly track provenance information § Interconnect e-infrastructures and research infrastructures services, interfaces & platforms • EUDAT <=> ESGF
  7. 7. Motivations: Scientific, Technical, Societal Societal § Provide climate projections data to climate change impact researchers, facilitators, practitioners • Ease access with better intuitive interfaces • Provide more common data formats • Generate tailored products from data processing workflows http://climate4impact.eu
  8. 8. Current situation Climate Research Community u Data available for scientific analysis: a very large trend § Limitations in data access means limitations in data analytics and scientific results u Download locally then Analyze: a workflow that cannot be sustained § Climate researchers § Impact researchers
  9. 9. Current situation Practical Example: Climate Community Federation Service • Temperature at 850 hPa field (Aggregated files 30 levels) • 10 climate models • 1960-1990 & 2040-2070 = 60 years = 21 915 days • Daily fields = 1 field per day • Global spatial scale 100 km resolution TOTAL: 6 754 500 fields to download ~100 Kb per 2D field = 626 Gb After the analysis post-processing • Anomaly of the average of the two periods over a specific country for each climate model • Result: 10 times 2D fields over a small domain • Estimated datasize after post-processing: 1 Mb Data reduction...
  10. 10. Background: ESGF/ENES 7.05.2014EGU ! IS-ENES ESGF Portals • BADC (UK) • DKRZ (Germany) • IPSL (France) • SMHI (Sweden) • CMCC (Italy) • DMI (Denmark) ESGF Data Nodes 2015: • 40 worldwide • 18 in Europe (coordinated in IS-ENES) IS-ENES climate4impact Portal • KNMI (Netherlands) • Interlinked with Uni. Cantabria downscaling portal (Spain) CLIPC Portal • Climate Information Portal for CopernicusAck: Michael Lautenschlager, DKRZ
  11. 11. Background: Climate Data Life Cycle
  12. 12. Background ESGF Future Computing Nodes: API u Goal: perform data analysis near the data storage § Better data access § Move away from the download/analyze workflow
  13. 13. Background: EUDAT B2ACCESS B2Handle
  14. 14. Background: EUDAT CDI 14 Community Repositories (thematic data centres) EUDAT generic data service provider storage, workflows, processing, archive
  15. 15. Background: EGI 24 countries 1 coordinating organization – EGI.eu
  16. 16. Solutions: Building Blocks u ESGF § Federation of Peer-to-Peer Data Nodes § Computing Nodes API (Using ISO-OGC WPS) § OpenID Authentication/Authorization u EUDAT § GEF API for deploying calculations (workflows) § B2 Services for orchestration and storage u IS-ENES § Data Analytics and Statistical Downscaling Services => climate4impact.eu platform u Big Data Europe (BDE) Software & Platform
  17. 17. Solutions: Putting it all together u Bridge EUDAT / EGI / BDE / ESGF / IS-ENES § EUDAT B2ACCESS AAI ó ESGF OpenID § EUDAT GEF API ó • ESGF Computing API WPS • EGI Federated Cloud • Big Data Europe Platform Technologies • IS-ENES Data Analytics and Statistical Downscaling Services => climate4impact.eu platform
  18. 18. Support || Support && Support u Support is key for users § Heterogeneous base of users • Climate Science Researchers (seniors, post-docs, PhD students, ...) • Climate change impact modelers • Facilitators (climate change impact studies) • Scientific researchers from other sciences u All kind of supports § Contextual/Dynamic Help § Use Cases / Storylines § Background Information § "Hotline"
  19. 19. • Infrastructure to access relevant climate data – Global & Regional Climate Model projections – Observations: satellite, reanalyses, surface • Community Services with standard interfaces – On-demand downscaling services – On-demand product calculations » Climate indices and indicators » Bias-correction – On-demand calculations » Reduce datasize to be transferred over the network » Ease access to calculations to end users – Support to heterogeneous users • EUDAT B2 Services + IS-ENES/ESGF + EGI + BDE => Bridge e-infrastructure to research infrastructures – Provide Data Analytics & Tailoring to users – Enhance and Ease Data Sharing & Discovery – Provide support for "Long Tail of Science" (LToS) • Big Data Techniques Exploration – Data Mining for Geophysical Data – Storage Precision Summary and Perspectives
  20. 20. Questions! J
  21. 21. • Infrastructure to access relevant climate data – Global & Regional Climate Model projections – Observations: satellite, reanalyses, surface • Services with standard interfaces – On-demand downscaling services – On-demand product calculations » Climate indices and indicators » Bias-correction – On-demand calculations » Reduce datasize to be transferred over the network • ESGF Computing Node Calculation Services » Tailor data to users (data extraction, subsetting, averaging, ...) • Big Data Techniques Exploration – Data Mining for Geophysical Data – Storage Precision Summary and Perspectives
  22. 22. Solutions: Putting it all together User User ... EUDAT B2ACCESS AAI AAIEGI

×