This document summarizes work on auto-tuning facilities for supercomputers in operation. It discusses the FIBER approach, which performs auto-tuning at install time, before execution, and at run time to minimize software stack requirements. It applies ppOpen-AT, an auto-tuning language, to an explicit finite difference method application called Seism3D. Various loop optimizations are explored, including loop splitting and loop fusion into one-dimensional and two-dimensional loops, to improve parallelism and prefetching opportunities. Performance is then evaluated on the target application and optimizations.
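The loop splitting and fusion mentioned above can be sketched in a few lines. The arrays and loop bodies below are hypothetical stand-ins, not taken from Seism3D; they only illustrate the transformation the auto-tuner chooses between:

```python
# Sketch of loop fission vs. fusion: two separate passes over the data
# versus one fused pass. Fusion improves cache locality; splitting (the
# reverse) can expose vectorization or prefetching opportunities.

def split_loops(a, b):
    n = len(a)
    tmp = [0.0] * n
    # First loop: scale a.
    for i in range(n):
        tmp[i] = 2.0 * a[i]
    # Second loop: add b.
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + b[i]
    return out

def fused_loop(a, b):
    n = len(a)
    out = [0.0] * n
    # One pass: each element is touched once, avoiding the tmp array.
    for i in range(n):
        out[i] = 2.0 * a[i] + b[i]
    return out
```

Which variant wins depends on the target machine, which is exactly why an auto-tuner times both rather than committing to one.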
Overview of ppOpen-AT/Static for ppOpen-APPL/FDM ver. 0.2.0, by Takahiro Katagiri
This material gives an overview of ppOpen-AT/Static for ppOpen-APPL/FDM ver. 0.2.0, numerical simulation software for seismic wave analysis with an automatic performance tuning (AT) function. The software is developed and supported by the ppOpen-HPC project. The effect of AT is shown on several recent computing environments, such as multi-core (Ivy Bridge) and many-core (Xeon Phi) processors.
By Tobias Grosser, Scalable Parallel Computing Laboratory
The COSMO climate and weather model delivers daily forecasts for Switzerland and many other nations. As a traditional HPC application it was developed with SIMD CPUs in mind, and large manual efforts were required to enable the 2016 move to GPU acceleration. As today's high-performance computer systems increasingly rely on accelerators to reach peak performance, and manual translation to accelerators is both costly and difficult to maintain, we propose a fully automatic accelerator compiler that translates scientific Fortran codes to CUDA for GPU-accelerated systems. Several challenges had to be overcome to make this a reality: 1) improved scalability, 2) automatic data placement using unified memory, 3) loop rescheduling to expose coarse-grained parallelism, 4) inter-procedural loop optimization, and 5) plenty of performance tuning. Our evaluation shows that end-to-end automatic accelerator compilation is possible for non-trivial portions of the COSMO climate model, despite the lack of complete static information. Non-trivial loop optimizations previously implemented manually are performed fully automatically, and memory management happens fully transparently using unified memory. Our preliminary results show notable performance improvements over sequential CPU code (execution time reduced from 40s to 8s), and we are currently working on closing the remaining gap to hand-tuned GPU code. This talk is a status update on our most recent efforts and is also intended to gather feedback on future research plans towards automatically mapping COSMO to FPGAs.
Tobias Grosser Bio
Tobias Grosser is a senior researcher in the Scalable Parallel Computing Laboratory (SPCL) of Torsten Hoefler at the Computer Science Department of ETH Zürich. Supported by a Google PhD Fellowship, he received his doctoral degree from Université Pierre et Marie Curie under the supervision of Albert Cohen. Tobias' research takes place at the border of low-level compilers and high-level program transformations, with the goal of enabling complex but highly beneficial program transformations in a production compiler environment. With the Polly loop optimizer he develops a loop transformation framework which today is a community project supported through the Polly Labs research laboratory. Tobias also developed advanced tiling schemes for the efficient execution of iterated stencils. Today Tobias leads the heterogeneous compute efforts in the Swiss University funded ComPASC project and is about to start a three-year NSF Ambizione project on advancing automatic compilation and heterogenization techniques at ETH Zürich.
Email
bgerofi@riken.jp
For more info on the Linaro High Performance Computing (HPC) group, visit https://www.linaro.org/sig/hpc/
TF 2.0 is designed to improve usability and productivity. As an enthusiastic TF user, I am very excited. Personally, I think the most important question about usability is: how does TF provide a user-friendly API? Setting aside the other aspects of TF 2.0, this post is a quick review from an API usage perspective.
A minimal introduction to Python non-uniform fast Fourier transform (pynufft), by Jyh-Miin Lin
Welcome to pynufft's Documentation!
Pynufft, a Python non-uniform fast Fourier transform, was designed and developed for image reconstruction in Python. It is written in pure Python and builds on numerical libraries such as NumPy and SciPy (with matplotlib for displaying examples). CUDA computing is experimentally supported.
Pynufft can be installed from PyPI (`pip install pynufft`). The source can be obtained from https://github.com/jyhmiinlin/pynufft.
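As a rough illustration of what a NUFFT computes (a minimal pure-stdlib sketch, not the pynufft API), a naive non-uniform discrete Fourier transform evaluates the spectrum of uniformly sampled data at arbitrary frequencies:

```python
import cmath

def nudft(x, om):
    """Naive type-2 non-uniform DFT: evaluate the spectrum of the
    uniform samples x at arbitrary angular frequencies om (radians
    per sample). This O(N*M) double loop is what a NUFFT library
    approximates in roughly O(N log N) time via interpolation
    (gridding) onto an oversampled FFT grid."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-1j * w * n) for n in range(N)) for w in om]
```

When the frequencies happen to lie on the uniform grid `2*pi*k/N`, this reduces to the ordinary DFT; the point of a NUFFT is that `om` need not be uniform, as in radial or spiral MRI sampling.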
PAPR reduction for OFDM/OQAM signals via alternative signal method, by eSAT Journals
Abstract
We address the PAPR reduction problem for OFDM/OQAM systems. PAPR reduction is a serious problem for implementations of both OFDM and OFDM/OQAM systems due to their high PAPR. The OFDM/OQAM signal is generated by summing over M time-shifted OFDM/OQAM symbols, where successive symbols are interdependent. The alternative-signal (AS) method directly leads to the independent AS (AS-I) and joint AS (AS-J) algorithms. The AS-I algorithm reduces the PAPR symbol by symbol with low complexity, while AS-J applies optimal joint PAPR reduction across the M OFDM/OQAM symbols at much higher complexity. In this paper, a sequential optimization procedure, denoted AS-S, is proposed to balance computational complexity and system performance; the AS-S algorithm adopts a sequential optimization over time, with computational complexity increasing linearly with M. Simulation results are provided comparing the performance of the AS-I, AS-J, and AS-S algorithms.
Keywords—peak-to-average power ratio (PAPR), orthogonal frequency division multiplexing with offset quadrature amplitude modulation (OFDM/OQAM), alternative signal (AS), cyclic prefix (CP).
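For readers unfamiliar with the metric, PAPR can be computed directly from a symbol's time-domain samples. This is a minimal sketch for a plain OFDM symbol (the function and setup are illustrative, not taken from the paper, which concerns the OQAM variant):

```python
import cmath
import math

def papr_db(symbols):
    """PAPR of a discrete-time OFDM symbol: synthesize the time-domain
    signal via an inverse DFT of the subcarrier symbols, then return
    peak power over mean power, in dB."""
    M = len(symbols)
    time = [sum(symbols[k] * cmath.exp(2j * math.pi * k * n / M)
                for k in range(M)) / M
            for n in range(M)]
    powers = [abs(s) ** 2 for s in time]
    return 10 * math.log10(max(powers) / (sum(powers) / M))
```

The worst case arises when all subcarriers add coherently: loading the same symbol on every subcarrier concentrates all energy in one sample, giving a PAPR of 10*log10(M) dB, which is what AS-style methods try to avoid.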
An evaluation of the LLVM compiler for SVE with fairly complicated loops, by Linaro
By Hiroshi Nakashima, Kyoto University / RIKEN AICS
As part of the evaluation of Post-K's compilers, we have been investigating compiled codes of vectorizable kernel loops in a particle-in-cell simulation program. This talk will show how the latest version of the LLVM compiler (v1.4) handles these loops, together with a qualitative and quantitative comparison with code generated by Intel's compiler for KNL.
Hiroshi Nakashima Bio
Hiroshi Nakashima is currently a professor at Kyoto University's supercomputer center (ACCMS), working on R&D of HPC programming and supercomputer system architecture, as well as a visiting senior researcher at RIKEN AICS for the evaluation of the Post-K computer and its compilers.
Email
h.nakashima@media.kyoto-u.ac.jp
For more info on the Linaro High Performance Computing (HPC) group, visit https://www.linaro.org/sig/hpc/
OTOY founder and CEO, Jules Urbach, announces a colossal update to the OctaneRender ecosystem, including the pricing and availability of its highly anticipated OctaneRender 3 software and OctaneRender Cloud rendering service, and a detailed roadmap outlining the future of Octane’s development towards a 4.0 release in 2017 with full integration of OTOY’s advanced real-time path tracing engine, Brigade.
September 29 to October 4, 2013, Dagstuhl Seminar 13401, Automatic Application Tuning for HPC Architectures; Session: Infrastructures, 10:30-11:00, Tuesday, October 1, 2013.
Towards Automatic Code Selection with ppOpen-AT: A Case of FDM - Variants of ..., by Takahiro Katagiri
In this study, we show a new auto-tuning (AT) capability: selection among code variants based on totally different implementations of numerical computations. The selection function of the AT is carefully designed using ppOpen-AT, a computer language for adding AT functions to simulation codes in actual use in the ppOpen-HPC project. The AT is evaluated with ppOpen-APPL/FDM (Seism_3D), a seismic wave simulation code based on the finite difference method (FDM). Performance evaluation on an advanced many-core processor, the Xeon Phi, shows crucial speedups from the selection AT. Moreover, the best code variant varied with the parallel execution configuration, i.e., the number of MPI processes and OpenMP threads in hybrid MPI/OpenMP.
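The code-selection style of auto-tuning described above can be sketched as an empirical search over candidate implementations. This hypothetical helper only illustrates the idea; ppOpen-AT itself generates and selects variants from directives embedded in the Fortran source rather than timing Python callables:

```python
import time

def autotune(variants, args, trials=3):
    """Empirical code selection: time each candidate implementation
    on the given arguments and return the fastest one. Real AT
    frameworks do this per target machine and per parallel
    configuration (MPI processes x OpenMP threads)."""
    best, best_t = None, float("inf")
    for f in variants:
        t0 = time.perf_counter()
        for _ in range(trials):
            f(*args)
        elapsed = (time.perf_counter() - t0) / trials
        if elapsed < best_t:
            best, best_t = f, elapsed
    return best
```

As the abstract notes, the winning variant can change with the number of MPI processes and OpenMP threads, which is why the selection is re-run per configuration rather than fixed at development time.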
In this research, we show the effect of auto-tuning (AT) with a code-selection function on computational kernels for scientific and engineering computations. ppOpen-AT, a computer language for specifying AT functions on arbitrary parts of a program, is used to describe the code selection. The AT is evaluated on advanced CPU architectures, the Intel Xeon Phi and the Intel Ivy Bridge. Results of a preliminary experiment with a code based on the finite difference method (FDM) indicate that the effect of AT is crucial compared to a conventional AT framework without code selection.
This report proposes Static Code Generation Auto-tuning (SCG-AT), an AT software organization in which auto-tuning (AT) uses only code generated statically before execution, without dynamic code generation and compilation during code optimization. To evaluate AT with SCG-AT, we implemented hierarchical AT processing. In ppOpen-APPL/FDM, a seismic wave simulation based on the finite difference method, we implemented code selection between the conventional code for vector machines and newly developed code for scalar machines. We evaluated the code-selection AT with SCG-AT on three completely different machines: the Xeon Phi, Ivy Bridge, and FX10. The evaluation shows that on the Xeon Phi and Ivy Bridge, selecting the scalar-machine code achieves speedups that cannot be achieved with conventional AT approaches.
------
Notice for the use of this material: the copyright of this material is retained by the Information Processing Society of Japan (IPSJ). This material is published on this web site with the agreement of the author(s) and the IPSJ. Please comply with the Copyright Law of Japan and the Code of Ethics of the IPSJ if you wish to reproduce, make derivative works of, distribute, or make available to the public any part or the whole of this material. All Rights Reserved, Copyright (C) Information Processing Society of Japan.
Extreme-Scale Parallel Symmetric Eigensolver for Very Small-Size Matrices Usi..., by Takahiro Katagiri
We have developed a parallel eigensolver for very small-size matrices. Unlike conventional solvers, our design policy focuses on non-blocking computation and reduced communication. A communication-avoiding approach for Householder pivot vectors is used to implement part of the Householder inverse transformation. In addition, we implement techniques for reducing communication by using non-blocking communication in the tridiagonalization part. Performance of the solver on the full Fujitsu FX10 system (76,800 cores) is also presented.
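The Householder reflectors at the heart of tridiagonalization can be sketched in a few lines. This minimal pure-Python version illustrates only the numerical building block, not the communication-avoiding parallel distribution of pivot vectors described in the abstract:

```python
import math

def householder_vector(x):
    """Householder reflector v (scaled so v[0] = 1) such that
    H x = (I - 2 v v^T / v^T v) x has zeros below its first entry,
    with the first entry mapped to alpha = -sign(x[0]) * ||x||.
    Repeatedly applying such reflectors from both sides reduces a
    symmetric matrix to tridiagonal form."""
    alpha = -math.copysign(math.sqrt(sum(xi * xi for xi in x)), x[0])
    v = [x[0] - alpha] + list(x[1:])
    v0 = v[0]
    return [vi / v0 for vi in v], alpha

def apply_reflector(v, x):
    """Apply H = I - 2 v v^T / (v^T v) to a vector x."""
    vtv = sum(vi * vi for vi in v)
    vtx = sum(vi * xi for vi, xi in zip(v, x))
    return [xi - 2.0 * vtx / vtv * vi for vi, xi in zip(v, x)]
```

In a distributed solver, forming and broadcasting these pivot vectors is exactly where the communication cost concentrates, which motivates the communication-avoiding treatment in the abstract.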
Impact of Auto-tuning of Kernel Loop Transformation by using ppOpen-AT, by Takahiro Katagiri
SPNS2013, December 5th-6th, 2013, Conference Room, 3F, Bldg. 1, Earthquake Research Institute (ERI), The University of Tokyo; December 6th, 2013, ppOpen-HPC and Automatic Tuning (Chair: Hideyuki Jitsumoto), 13:30-14:00.
Auto-Tuning of Hierarchical Computations with ppOpen-AT, by Takahiro Katagiri
We are now developing ppOpen-AT, a directive-based auto-tuning (AT) language for specifying fundamental AT functions, i.e., varying parameter values, loop transformations, and code selection. Considering the expected hardware of the post-Moore era, we focus on optimizing computations with a deep hierarchy of 3D stacked memory. ppOpen-AT provides code selection to optimize code with respect to the layers of the memory. Performance evaluation of AT with an FDM code will be shown using the Xeon Phi.
Pragmatic Optimization in Modern Programming - Mastering Compiler Optimizations, by Marina Kolpakova
Explains compiler optimizations, with a taxonomy and examples. The examples mostly target the ARM armv7-a and armv8-a architectures, but most of the optimizations are machine independent.
Performance evaluations of Grigoryan FFT and Cooley-Tukey FFT onto Xilinx Virt..., by csandit
A large family of signal processing techniques consists of Fourier-transforming a signal, manipulating the Fourier-transformed data in a simple way, and reversing the transformation. Fourier frequency analysis is widely used in equalization of audio recordings, X-ray crystallography, artefact removal in neurological signal and image processing, voice activity detection in brain-stem speech evoked potentials, speech-processing spectrograms used to identify phonetic sounds, and so on. The discrete Fourier transform (DFT) is a principal mathematical method for frequency analysis, and different ways of splitting the DFT give rise to various fast algorithms. In this paper, we present implementations of two fast DFT algorithms and evaluate their performance: the popular radix-2 Cooley-Tukey fast Fourier transform (FFT) [1], and the Grigoryan FFT based on splitting by the paired transform [2]. We evaluate the performance of these algorithms by implementing them on Xilinx Virtex-II Pro [3] and Virtex-5 [4] FPGAs, developing our own FFT processor architectures. Finally, we show that the Grigoryan FFT runs faster than the Cooley-Tukey FFT and is consequently useful for higher sampling rates, which are a challenge in DSP applications.
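For reference, the radix-2 Cooley-Tukey splitting mentioned in the abstract can be sketched in a few lines (the Grigoryan paired-transform splitting is a different decomposition and is not reproduced here):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of
    two. The DFT of size n is split into DFTs of the even- and
    odd-indexed samples, combined with twiddle factors, giving
    O(n log n) work instead of the O(n^2) of a direct DFT."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Hardware implementations like those in the paper unroll this recursion into butterfly stages, which is where the choice of splitting (Cooley-Tukey vs. paired transform) changes the datapath and achievable clock rate.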
Profiling PyTorch for Efficiency & Sustainability, by geetachauhan
From my talk at the Data & AI Summit: the latest update on the PyTorch Profiler and how you can use it for efficiency optimizations. The talk also dives into the future and what we need to do together as an industry to move toward sustainable AI.
Architecture Aware Algorithms and Software for Peta and Exascale, by inside-BigData.com
Jack Dongarra from the University of Tennessee presented these slides at Ken Kennedy Institute of Information Technology on Feb 13, 2014.
Listen to the podcast review of this talk: http://insidehpc.com/2014/02/13/week-hpc-jack-dongarra-talks-algorithms-exascale/
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches, by Marina Kolpakova
The slides give an idea of how to look pragmatically at software optimization, and how to order optimization approaches from this pragmatic point of view.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Securing your Kubernetes cluster: a step-by-step guide to success!, by KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024, by Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales, by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 4, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG is All You Need? LLM & Knowledge Graph, by Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Elevating Tactical DDD Patterns Through Object Calisthenics, by Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Communications Mining Series - Zero to Hero - Session 1, by DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024, by Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor..., by Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Towards Auto-tuning Facilities into Supercomputers in Operation - The FIBER approach and minimizing software-stack requirements -
1. Towards Auto-tuning Facilities
into Supercomputers in Operation
- The FIBER approach and
minimizing software-stack requirements -
Takahiro Katagiri (片桐 孝洋)
Information Technology Center,
The University of Tokyo
(東京大学 情報基盤センター)
1
2014 ATAT in HPSC, National Taiwan University,
March 15, 2014 (Saturday), Performance 10:10-10:30
Joint work with: Satoshi Ohshima(大島 聡史)
Masaharu Matsumoto(松本 正晴)
2. Overview
1. Background and ppOpen-HPC
Project
2. ppOpen-AT Basics
3. Adaptation to an FDM
Application
4. Performance Evaluation
5. Conclusion
2
3. Overview
1. Background and ppOpen-HPC
Project
2. ppOpen-AT Basics
3. Adaptation to an FDM
Application
4. Performance Evaluation
5. Conclusion
3
4. Background
High-Thread Parallelism (HTP)
◦ Multi-core and many-core processors are pervasive.
Multi-core CPUs: 8-16 cores, 16-64 threads with Hyper-Threading (HT) or Simultaneous Multithreading (SMT).
Many-core CPU: Xeon Phi – 60 cores, 240 threads with HT.
◦ Utilizing all available threads is important.
4
Performance Portability (PP)
◦ Keeping high performance across multiple computer environments.
Not only multiple CPUs, but also multiple compilers.
Run-time information, such as loop length and number of threads, is important.
◦ Auto-tuning (AT) is one candidate technology for establishing PP across multiple computer environments.
5. ppOpen-HPC Project
Middleware for HPC and Its AT
◦ Supported by JST, CREST, from FY2011 to FY2016.
◦ PI: Professor Kengo Nakajima (U. Tokyo)
ppOpen-HPC
◦ An open-source infrastructure for reliable simulation codes on post-peta (pp) scale parallel computers.
◦ Consists of various types of libraries, covering 5 kinds of discretization methods for scientific computations.
ppOpen-AT
◦ An auto-tuning language for ppOpen-HPC codes.
◦ Builds on knowledge from a previous project, ABCLibScript.
◦ An auto-tuning language based on AT directives. 5
6. Software Architecture of ppOpen-HPC (diagram)
Layers, from the top down: User's Program; ppOpen-APPL (FEM, FDM, FVM, BEM, DEM); ppOpen-MATH (MG, GRAPH, VIS, MP); ppOpen-AT (STATIC, DYNAMIC); ppOpen-SYS (COMM, FT).
The Auto-Tuning Facility performs code generation for optimization candidates, search for the best candidate, and automatic execution of the optimization; a Resource Allocation Facility specifies the best execution allocations.
Target hardware: many-core CPUs, GPUs, low-power CPUs, and vector CPUs.
6
7. Overview
1. Background and ppOpen-HPC
Project
2. ppOpen-AT Basics
3. Adaptation to an FDM
Application
4. Performance Evaluation
5. Conclusion
7
9. A Scenario for Software Developers Using ppOpen-AT
9
The software developer describes AT using ppOpen-AT directives in the program, covering optimizations that cannot be established by compilers. Invoking the dedicated preprocessor turns this description into a program with AT functions, and finally into executable code with optimization candidates and an AT function.
■ Description by the software developer:
#pragma oat install unroll (i,j,k) region start
#pragma oat varied (i,j,k) from 1 to 8
for(i = 0 ; i < n ; i++){
  for(j = 0 ; j < n ; j++){
    for(k = 0 ; k < n ; k++){
      A[i][j]=A[i][j]+B[i][k]*C[k][j]; }}}
#pragma oat install unroll (i,j,k) region end
■ Automatically generated functions: optimization candidates, performance monitor, parameter search, performance modeling.
Targets: optimizations for source codes, computer resources, and power consumption.
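The unroll region above can be made concrete with a small sketch. This is illustrative Python only (the code ppOpen-AT actually generates is Fortran/C source, and `matmul_baseline`/`matmul_unroll2` are names invented here): the baseline triple loop and one unroll-by-2 candidate for the k loop of the kind a preprocessor could emit.

```python
def matmul_baseline(A, B, C, n):
    """Original three-nested loop: A[i][j] += B[i][k] * C[k][j]."""
    for i in range(n):
        for j in range(n):
            for k in range(n):
                A[i][j] += B[i][k] * C[k][j]

def matmul_unroll2(A, B, C, n):
    """Unroll-by-2 candidate for the k loop (n assumed even here).
    The AT runtime would time variants like this for depths 1..8
    and select the fastest for the target machine."""
    for i in range(n):
        for j in range(n):
            s = A[i][j]
            for k in range(0, n, 2):
                s += B[i][k] * C[k][j] + B[i][k + 1] * C[k + 1][j]
            A[i][j] = s
```

Both variants compute the same result; only the instruction-level structure (and hence register pressure and scheduling) differs, which is exactly the dimension the AT search explores.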
10. Compiler Optimization and AT
1. Loop length is unknown at compile time.
The optimal loop split and loop fusion can only be chosen at run time.
Run-time compilation is still a research topic.
2. Loop splitting with data dependencies.
Some loop splits require extra computation or memory space.
Some compilers provide a directive for this, but the directive is not standardized.
Code optimization is also not standardized between compilers.
3. Restrictions from operation of supercomputers.
Some supercomputer environments cannot supply the required software stack, or the software stack cannot be utilized due to operational restrictions.
Some systems are out of scope due to hardware restrictions.
Ex) CAPS on the K computer.
Operation costs (budgets), vendor strategy, etc. 10
11. Overview
1. Background and ppOpen-HPC
Project
2. ppOpen-AT Basics
3. Adaptation to an FDM
Application
4. Performance Evaluation
5. Conclusion
11
13. Target Application
Seism3D: simulation software for seismic wave analysis.
Strategic simulation software in Japan.
Developed by Professor Furumura at the University of Tokyo.
◦ The code has been re-constructed as ppOpen-APPL/FDM.
Finite Difference Method (FDM)
3D simulation
◦ 3D arrays are allocated.
Data type: single precision (real*4)
13
Source: http://www.eri.u-tokyo.ac.jp/furumura/tsunami/tsunami.html
14. The Heaviest Loop (20%+ to Total Time)
14
!$omp parallel do private(k,j,i,RL1,RM1,RM2,RLRM2,DXVX1,DYVY1,DZVZ1,D3V3,DXVYDYVX1,
DXVZDZVX1,DYVZDZVY1)
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX
RL1 = LAM (I,J,K)
RM1 = RIG (I,J,K)
RM2 = RM1 + RM1; RLRM2 = RL1+RM2;
DXVX1 = DXVX(I,J,K); DYVY1 = DYVY(I,J,K)
DZVZ1 = DZVZ(I,J,K); D3V3 = DXVX1 + DYVY1 + DZVZ1
SXX (I,J,K) = SXX (I,J,K) + (RLRM2*(D3V3)-RM2*(DZVZ1+DYVY1) ) * DT
SYY (I,J,K) = SYY (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DZVZ1) ) * DT
SZZ (I,J,K) = SZZ (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DYVY1) ) * DT
DXVYDYVX1 = DXVY(I,J,K)+DYVX(I,J,K)
DXVZDZVX1 = DXVZ(I,J,K)+DZVX(I,J,K);
DYVZDZVY1 = DYVZ(I,J,K)+DZVY(I,J,K)
SXY (I,J,K) = SXY (I,J,K) + RM1 * DXVYDYVX1 * DT
SXZ (I,J,K) = SXZ (I,J,K) + RM1 * DXVZDZVX1 * DT
SYZ (I,J,K) = SYZ (I,J,K) + RM1 * DYVZDZVY1 * DT
END DO
END DO
END DO
!$omp end parallel do
A Flow Dependency
15. Optimization Possibilities
Loop splitting
◦ To reduce spill code.
◦ To maximize register usage.
Loop fusion (loop collapse)
◦ 3-nested loop -> one of the following two forms.
◦ One loop (full collapse)
To increase outer-loop parallelism for thread parallelism.
◦ Two-nested loop
To increase outer-loop parallelism for thread parallelism.
To retain pre-fetching on the inner loop.
15
16. Loop fusion –
One dimensional (a loop collapse)
16
!$omp parallel do private(k,j,i,RL1,RM1,RM2,RLRM2,DXVX1,DYVY1,DZVZ1,D3V3,DXVYDYVX1,
DXVZDZVX1,DYVZDZVY1)
DO KK = 1, NZ * NY * NX
K = (KK-1)/(NY*NX) + 1
J = mod((KK-1)/NX,NY) + 1
I = mod(KK-1,NX) + 1
RL1 = LAM (I,J,K)
RM1 = RIG (I,J,K)
RM2 = RM1 + RM1; RLRM2 = RL1+RM2;
DXVX1 = DXVX(I,J,K); DYVY1 = DYVY(I,J,K)
DZVZ1 = DZVZ(I,J,K); D3V3 = DXVX1 + DYVY1 + DZVZ1
SXX (I,J,K) = SXX (I,J,K) + (RLRM2*(D3V3)-RM2*(DZVZ1+DYVY1) ) * DT
SYY (I,J,K) = SYY (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DZVZ1) ) * DT
SZZ (I,J,K) = SZZ (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DYVY1) ) * DT
DXVYDYVX1 = DXVY(I,J,K)+DYVX(I,J,K)
DXVZDZVX1 = DXVZ(I,J,K)+DZVX(I,J,K);
DYVZDZVY1 = DYVZ(I,J,K)+DZVY(I,J,K)
SXY (I,J,K) = SXY (I,J,K) + RM1 * DXVYDYVX1 * DT
SXZ (I,J,K) = SXZ (I,J,K) + RM1 * DXVZDZVX1 * DT
SYZ (I,J,K) = SYZ (I,J,K) + RM1 * DYVZDZVY1 * DT
END DO
!$omp end parallel do
Merit: the loop length becomes huge.
This is good for OpenMP thread parallelism.
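The correctness of this collapse rests on the index-recovery arithmetic. A quick Python check (a sketch only; Fortran integer division on non-negative operands behaves like Python's `//`) that KK enumerates exactly the original (K, J, I) triples, in the original loop order with I innermost:

```python
# Small stand-in sizes for NZ, NY, NX; the recovery formulas are
# the ones from the collapsed loop above, translated to Python.
NZ, NY, NX = 3, 4, 5

recovered = []
for KK in range(1, NZ * NY * NX + 1):
    K = (KK - 1) // (NY * NX) + 1        # slowest-varying index
    J = ((KK - 1) // NX) % NY + 1
    I = (KK - 1) % NX + 1                # fastest-varying index
    recovered.append((K, J, I))

# The iteration space of the original DO K / DO J / DO I nest:
original = [(K, J, I)
            for K in range(1, NZ + 1)
            for J in range(1, NY + 1)
            for I in range(1, NX + 1)]

assert recovered == original  # same triples, same order
```

Because the ordering is preserved, the collapsed loop touches memory in the same sequence as the original nest; only the OpenMP work distribution changes.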
17. Loop fusion – Two dimensional
17
!$omp parallel do private(k,j,i,RL1,RM1,RM2,RLRM2,DXVX1,DYVY1,DZVZ1,D3V3,DXVYDYVX1,
DXVZDZVX1,DYVZDZVY1)
DO KK = 1, NZ * NY
K = (KK-1)/NY + 1
J = mod(KK-1,NY) + 1
DO I = 1, NX
RL1 = LAM (I,J,K)
RM1 = RIG (I,J,K)
RM2 = RM1 + RM1; RLRM2 = RL1+RM2;
DXVX1 = DXVX(I,J,K); DYVY1 = DYVY(I,J,K)
DZVZ1 = DZVZ(I,J,K); D3V3 = DXVX1 + DYVY1 + DZVZ1
SXX (I,J,K) = SXX (I,J,K) + (RLRM2*(D3V3)-RM2*(DZVZ1+DYVY1) ) * DT
SYY (I,J,K) = SYY (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DZVZ1) ) * DT
SZZ (I,J,K) = SZZ (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DYVY1) ) * DT
DXVYDYVX1 = DXVY(I,J,K)+DYVX(I,J,K)
DXVZDZVX1 = DXVZ(I,J,K)+DZVX(I,J,K);
DYVZDZVY1 = DYVZ(I,J,K)+DZVY(I,J,K)
SXY (I,J,K) = SXY (I,J,K) + RM1 * DXVYDYVX1 * DT
SXZ (I,J,K) = SXZ (I,J,K) + RM1 * DXVZDZVX1 * DT
SYZ (I,J,K) = SYZ (I,J,K) + RM1 * DYVZDZVY1 * DT
ENDDO
END DO
!$omp end parallel do
Merit: the loop length is huge.
This is good for OpenMP thread parallelism.
The remaining I-loop preserves an opportunity for pre-fetching.
18. 18
!$omp parallel do private(k,j,i,RL1,RM1,RM2,RLRM2,DXVX1,DYVY1,DZVZ1,D3V3,DXVYDYVX1,
DXVZDZVX1,DYVZDZVY1)
DO K = 1, NZ
DO J = 1, NY
DO I = 1, NX
RL1 = LAM (I,J,K)
RM1 = RIG (I,J,K)
RM2 = RM1 + RM1; RLRM2 = RL1+RM2;
DXVX1 = DXVX(I,J,K); DYVY1 = DYVY(I,J,K)
DZVZ1 = DZVZ(I,J,K); D3V3 = DXVX1 + DYVY1 + DZVZ1
SXX (I,J,K) = SXX (I,J,K) + (RLRM2*(D3V3)-RM2*(DZVZ1+DYVY1) ) * DT
SYY (I,J,K) = SYY (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DZVZ1) ) * DT
SZZ (I,J,K) = SZZ (I,J,K) + (RLRM2*(D3V3)-RM2*(DXVX1+DYVY1) ) * DT
ENDDO
DO I = 1, NX
RM1 = RIG (I,J,K)
DXVYDYVX1 = DXVY(I,J,K)+DYVX(I,J,K)
DXVZDZVX1 = DXVZ(I,J,K)+DZVX(I,J,K);
DYVZDZVY1 = DYVZ(I,J,K)+DZVY(I,J,K)
SXY (I,J,K) = SXY (I,J,K) + RM1 * DXVYDYVX1 * DT
SXZ (I,J,K) = SXZ (I,J,K) + RM1 * DXVZDZVX1 * DT
SYZ (I,J,K) = SYZ (I,J,K) + RM1 * DYVZDZVY1 * DT
END DO
END DO
END DO
Re-computation
(a copy of the RM1 load)
is needed.
⇒ Compilers do not
apply this split
without a directive.
Perfect Splitting: Two 3-nested Loops
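Why the duplicated load is harmless can be seen in a small sketch. This is simplified Python, not the real kernel (1-D arrays and stand-in update formulas): the split version re-loads RM1 in the second loop and reproduces the fused result exactly.

```python
# "Perfect splitting" sketch: one loop body becomes two loops, with
# RM1 = RIG[i] re-loaded (the "re-computation") in the second loop.
def stress_fused(RIG, LAM, SXX, SXY, DT):
    for i in range(len(RIG)):
        RM1 = RIG[i]
        RLRM2 = LAM[i] + 2.0 * RM1
        SXX[i] += RLRM2 * DT     # "normal stress" part
        SXY[i] += RM1 * DT       # "shear stress" part

def stress_split(RIG, LAM, SXX, SXY, DT):
    for i in range(len(RIG)):    # loop 1: normal stresses
        RM1 = RIG[i]
        RLRM2 = LAM[i] + 2.0 * RM1
        SXX[i] += RLRM2 * DT
    for i in range(len(RIG)):    # loop 2: RM1 is re-loaded here --
        RM1 = RIG[i]             # the duplicated work a compiler
        SXY[i] += RM1 * DT       # will not introduce on its own
```

Each array element still receives the same operations in the same order, so the results match bit for bit; the trade-off is the extra RIG traffic in exchange for lower register pressure in each loop.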
20. Candidates of Auto-generated Codes
#1 [Baseline] : Original three-nested loop.
#2 [Split] : Loop split for the k-loop
(separated two three-nested loops).
#3 [Split] : Loop split for the j-loop.
#4 [Split] : Loop split for the i-loop.
#5 [Fusion] : Loop fusion for the k-loop and j-loop
(a two-nested loop).
#6 [Split and Fusion] : Loop fusions for the k-loop
and j-loop for the loops in #2.
#7 [Fusion] : Loop fusions for the k-loop, j-loop,
and i-loop (loop collapse).
#8 [Split and Fusion] : Loop fusions for the k-loop, j-loop,
and i-loop for the loops in #2
(loop collapses for the separated two-loops).
20
21. Overview
1. Background and ppOpen-HPC
Project
2. ppOpen-AT Basics
3. Adaptation to an FDM
Application
4. Performance Evaluation
5. Conclusion
21
23. An Example of a Seism3D Simulation
The year-2000 earthquake in the western part of Tottori prefecture, Japan. ([1], pp.14)
The region of 820 km x 410 km x 128 km is discretized with 0.4 km spacing:
NX x NY x NZ = 2050 x 1025 x 320 (≒ 6.4 : 3.2 : 1).
[1] T. Furumura, "Large-scale Parallel FDM Simulation for Seismic Waves and Strong Shaking", Supercomputing News, Information Technology Center, The University of Tokyo, Vol.11, Special Edition 1, 2009. In Japanese.
Figure: Seismic wave propagation in the western Tottori earthquake.
(a) Measured waves; (b) Simulation results. (Reference: [1], pp.13)
24. Test Condition
Software versions
◦ ppOpen-APPL/FDM version 0.2
◦ ppOpen-AT version 0.2
Target kernels in ppOpen-APPL/FDM
◦ Top 10 kernels (all three-nested loops)
Update_stress
Update_vel
Update_spong
Other 7 kernels in the finite difference computations.
AT timing
◦ Before execute-time:
after the user fixes the problem size and the number of threads,
AT is applied at the first call of the library routine.
All AT candidates are evaluated (brute-force search).
◦ Only 8+3+6+7*3 = 38 candidates.
#Repeats for each kernel in AT mode
◦ 100 times
24
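The brute-force search itself is simple: once the problem size and thread count are fixed, run every candidate for the chosen number of repeats and keep the fastest. A hypothetical Python sketch of such a search loop (`autotune` is a name invented here; the real implementation lives in the ppOpen-AT runtime and times generated Fortran kernels):

```python
import time

def autotune(candidates, args, repeats=100):
    """Brute-force AT search: time each candidate kernel over
    `repeats` runs and return the name of the fastest one."""
    best_name, best_time = None, float("inf")
    for name, kernel in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(*args)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

With only 38 candidates and 100 repeats per kernel, the cost of this exhaustive search stays small against a production run, which is what makes before-execute-time AT practical here.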
25. The Xeon Phi Cluster System
Intel Xeon (Ivy Bridge) : HOST CPU
OS:Red Hat Enterprise Linux Server release 6.2
#Nodes:32 (Available: 14 nodes)
CPU:Intel Xeon E5‐2670 V2 @ 2.50GHz,2 sockets×10 cores
Hyper Threading:ON
Theoretical Peak Performance for 1 node of CPU:400 GFLOPS
Memory size on 1 node:64 GB
Interconnect:Infiniband
Compiler:Intel Fortran version 14.0.0.080 Build 20130728
Compiler Option:‐ipo20 ‐O3 ‐warn all ‐openmp ‐mcmodel=medium ‐shared‐intel
KMP_AFFINITY=granularity=fine, compact (all threads are on socket)
Intel Xeon Phi co‐processor (Xeon Phi) : Accelerator
CPU:Xeon Phi 5110P (B1 stepping) 1.053 GHz,60 core
Memory size:8 GB
Theoretical Peak Performance :1 TFLOPS ( = 1.053 GHz x 16 FLOPS x 60 core)
Connected one board on each node of the Cluster
Native mode
Compiler:Intel Fortran version 14.0.0.080 Build 20130728
Compiler Option:‐ipo20 ‐O3 ‐warn all ‐openmp ‐mcmodel=medium ‐shared‐intel ‐mmic ‐align array64byte
KMP_AFFINITY=granularity=fine, balanced (threads are equally distributed over the cores)
27. Execution Details
• ppOpen‐APPL/FDM ver.0.2
• ppOpen‐AT ver.0.2
• Target Problem Size
– NX * NY * NZ = 256 x 96 x 100 / node
– NX * NY * NZ = 32 * 16 * 20 / core (!= per MPI Process)
• Native mode for MIC
• Target MPI Processes and Threads on the Xeon Phi
– 1 node of the Xeon Phi with 4 HT (Hyper Threading)
– PXTY : X MPI Processes and Y Threads per process
– P240T1 : pure MPI with 4HT per core
– P120T2
– P60T4
– P16T15
– P8T30 : minimum hybrid MPI‐OpenMP execution for
ppOpen‐APPL/FDM, since it requires at least 8 MPI processes.
• The number of iterations for the kernels: 100
28. AT Effect (update_stress, Xeon Phi)
KMP_AFFINITY=balanced, ‐align array64byte; the "With AT" runs use the new auto-generated kernels.
Execution time [seconds] and speedup over "Without AT":

Config   Without AT   With AT   Speedup   Best SW
P240T1   2.11         1.29      1.63      6
P120T2   2.32         1.70      1.36      5
P60T4    2.33         1.74      1.34      5
P16T15   2.96         1.91      1.55      5
P8T30    3.14         1.97      1.59      6
29. Conclusion
Loop fusion to obtain high parallelism
is one of the key techniques for current
multi-core and many-core architectures.
◦ Execution with 240 threads/MPI process
on the Xeon Phi.
◦ Strong scaling with more than 10,000 cores
on the FX10.
To apply AT on supercomputers
in operation, minimizing the required
software stack is a practical way
to establish AT.
30. ppOpen-AT is free software!
ppOpen-AT version 0.2 is
available!
The license is MIT.
Please access the following page:
http://ppopenhpc.cc.u-tokyo.ac.jp/
30