This document summarizes the parallel performance of HiFUN, a computational fluid dynamics (CFD) solver, on computing platforms that use the Intel MPI Library. It finds that HiFUN is highly scalable: a load of a few thousand cells per core is enough to sustain over 85% parallel efficiency. Tests on grids of 12.7 million to 63.5 million cells, on platforms with up to 10,248 cores, show near-ideal speedup and good parallel efficiency. HiFUN also exhibits strong algorithmic scalability, with convergence independent of core count. These properties make HiFUN well suited to industrial CFD simulations.
Aerospace Supercomputing Demonstrates the Parallelism Advantage
High Resolution Flow Solver on Unstructured Meshes (HiFUN) Offers Extreme Scalable Performance
Overview
Simulation and Innovation Engineering Solutions (SandI) Pvt. Ltd. (www.sandi.co.in)
is a technology-driven company incubated from the Indian Institute of Science
(www.iisc.ernet.in), one of India’s premier research institutes. While the main focus of
the company is on promotion of the CFD flow solver HiFUN (High Resolution Flow Solver
on Unstructured Meshes), SandI is also involved in providing high-end CFD services to
the aerospace industry. One of the primary strengths of SandI is that it is continuously
supported by research and development initiatives from the Computational Aerodynamic
Laboratory (CAd Lab) in the Department of Aerospace Engineering at IISc. This enables
SandI to evolve current CFD tools and processes, while at the same time meeting ever-
increasing customer needs and demands.
HiFUN Supports Complex Simulations and Delivers Usable Data
The primary product of SandI, the state-of-the-art, general-purpose CFD solver HiFUN,
is robust, fast, and accurate, providing aerodynamic design data in a time-frame that is
most attractive to designers. The usefulness of HiFUN stems from its ability to handle
complex geometries and flow physics arising in a typical industrial environment. The
use of an unstructured data structure capable of handling arbitrary polyhedral volumes
gives HiFUN the ability to simulate complex geometries with relative ease, while
the use of a matrix-free implicit procedure, which results in rapid convergence to steady
state, makes the solver both efficient and robust. The accuracy of HiFUN has been
amply demonstrated through participation in various international CFD code evaluation
exercises such as the AIAA Drag Prediction Workshop (http://aaac.larc.nasa.gov/tsab/cfdlarc/aiaa-dpw)
and AIAA High Lift Prediction Workshop (http://hiliftpw.larc.nasa.gov).
In the High Lift workshop in Chicago, U.S.—where 18 organizations from eight countries
participated—HiFUN was judged one of the very good CFD solvers. The other important
strength of HiFUN is its ability to scale over several thousand processor cores in a
typical massively parallel supercomputing environment. This feature is a boon to the
designer—who can expect to have a turnaround time independent of the problem size.
With these features, HiFUN has been successfully used in simulations for a wide range of
flow problems, from low subsonic speeds to hypersonic speeds (http://www.sandi.co.in).
“The ability to simulate complex geometries with relative ease and the use
of a matrix-free implicit procedure resulting in rapid convergence to steady
state makes the solver both efficient and robust.”
– Dr. Nikhil V Shende
Director
S & I Engineering Solutions Pvt. Ltd.
case study
Intel® Software Development Tools
Intel® Cluster Studio XE, Intel® Fortran Compiler,
and Intel® MPI Library
HiFUN and Parallel Performance
For a CFD solver like HiFUN, two important indicators of parallel performance are
parallel scalability and algorithmic scalability. For an iterative solver, parallel scalability
demands that the time taken by the solver per iteration decrease in inverse proportion
to the number of compute cores.
Parallel scalability depends on balancing
the computational load across the cores,
while at the same time ensuring minimum
data communication across them. In the
present study, the software METIS
(http://glaros.dtc.umn.edu/gkhome/views/metis)
is employed to obtain optimal load
balance, based on a multilevel,
multi-constraint graph partitioning algorithm.
The other important indicator of parallel
performance, the algorithmic scalability,
effectively means that numerical
performance of the code is independent
of the number of compute cores employed
for computations. The algorithmic
scalability of the solver depends on the
ability of underlying serial algorithms to
be amenable to efficient parallelization
and their actual implementation in the
solver framework. The use of a novel
four-layer data structure enables HiFUN
to achieve a high level of algorithmic
scalability. HiFUN employs standard-mode,
nonblocking MPI communication calls
to transfer data across the compute cores.
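The two competing goals of partitioning described above, equal load per core and minimum communication between cores, can be put into numbers with a short sketch. This is not METIS and not HiFUN code: the function `partition_quality`, the tiny 2 x 4 grid, and both partitions are hypothetical and serve only to show what a graph partitioner tries to optimize.

```python
from collections import Counter


def partition_quality(edges, part, num_parts):
    """Return (load_imbalance, edge_cut) for a cell-to-core assignment.

    edges: iterable of (cell_a, cell_b) pairs for cells that share a face.
    part:  dict mapping cell id -> core (partition) index.
    """
    loads = Counter(part.values())
    sizes = [loads.get(p, 0) for p in range(num_parts)]
    mean_load = sum(sizes) / num_parts
    # 1.0 means perfectly balanced; larger values mean the busiest
    # core holds more than its fair share of cells.
    load_imbalance = max(sizes) / mean_load
    # Every cut edge is a face whose data must cross between two cores.
    edge_cut = sum(1 for a, b in edges if part[a] != part[b])
    return load_imbalance, edge_cut


# Hypothetical 2 x 4 strip of cells, numbered row by row:
#   0 1 2 3
#   4 5 6 7
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

# Two ways of splitting the same 8 cells across 2 cores.
left_right = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
striped = {cell: cell % 2 for cell in range(8)}

good = partition_quality(edges, left_right, 2)
bad = partition_quality(edges, striped, 2)
print("left/right split:", good)
print("striped split:   ", bad)
```

Both partitions give each core four cells, so they are equally balanced, but the striped one cuts three times as many faces and would therefore need more data exchanged per iteration.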
The parallel performance of HiFUN is
studied by simulating subsonic flow
past the NASA Trapezoidal Wing (NASA Trap Wing:
http://hiliftpw.larc.nasa.gov/index-workshop1.html).
The Trap Wing is a typical
high-lift configuration offering adequate
geometric complexity. Simulating the
resulting complex flow is a challenge to
the CFD community. Naturally, the grid
for adequately resolving such a complex
flow is large and makes this problem an
ideal candidate for evaluating the parallel
performance of a CFD solver. For this
study, the free stream Mach number is
0.2, the angle of attack is 28 degrees, and
the free stream Reynolds number based
on mean aerodynamic chord of the
wing is 4.2 million. The computations are
performed on three hybrid unstructured
grids consisting of prismatic and
tetrahedral elements. Table 1 gives
the size of each grid in terms of number
of cells.
Figure 1 depicts an unstructured surface
grid on the NASA Trap Wing, and Figure 2
depicts a typical pressure distribution on
the wing.
Compute Platforms
The parallel performance of HiFUN using
grid UG1 is studied on Endeavor, a
360-node Intel® HPC cluster. At the time of the
study, each node of Endeavor consisted
of dual hexacore Intel® Xeon® X5670 (B1
stepping) processors running at 2.93 GHz with
24 GB RAM. The interconnect used for
connecting the nodes is InfiniBand QDR,
and message passing across the nodes is
achieved using Intel® MPI Library, version 4.0.3.
The parallel performance of HiFUN
using grids UG2 and FG is studied on the
compute platform Pleiades, available at NASA
(http://www.nas.nasa.gov/hecc/resources/pleiades.html).
This system
consists of 4480 nodes of Intel® Xeon®
X5670 processors running at 2.93 GHz and 128
nodes of Intel® Xeon® X5675 processors
running at 3.06 GHz. Each node of Pleiades
consists of dual hexacore processors
with 24 GB RAM. The nodes are connected
through InfiniBand QDR host channel
adapters, and message
passing across the nodes is achieved using
Intel® MPI Library, version 4.0.3.
The Intel MPI Library is a multifabric
message passing library that implements
the MPI, v2 (MPI-2) specification
(http://www.intel.com/go/mpi). It is a
commercially supported, high-performance
software product based on MPICH2 from
Argonne National Laboratory.
Results and Discussion
The parameters used to study parallel
performance of HiFUN are speedup and
parallel efficiency defined as follows:
Ideal speedup: The ratio of the
number of compute cores used for a
given run to the reference number of
compute cores.
Actual speedup: The ratio of the time
per iteration using the reference number
of cores to the time per iteration using
the number of compute cores for a given run.
Parallel efficiency: The ratio of actual
speedup to ideal speedup.
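The three definitions above can be written out as a short Python sketch. The function names and the timings in the example are hypothetical and chosen only for illustration; they are not measured HiFUN data.

```python
def ideal_speedup(cores: int, ref_cores: int) -> float:
    """Ratio of the cores used for a run to the reference number of cores."""
    return cores / ref_cores


def actual_speedup(time_ref: float, time_run: float) -> float:
    """Ratio of time per iteration on the reference cores to that of the run."""
    return time_ref / time_run


def parallel_efficiency(cores: int, ref_cores: int,
                        time_ref: float, time_run: float) -> float:
    """Ratio of actual speedup to ideal speedup."""
    return actual_speedup(time_ref, time_run) / ideal_speedup(cores, ref_cores)


# Hypothetical timings in seconds per iteration (assumed, not measured).
ref_cores, time_ref = 256, 8.0
cores, time_run = 1024, 2.5

print(f"ideal speedup       = {ideal_speedup(cores, ref_cores):.2f}")
print(f"actual speedup      = {actual_speedup(time_ref, time_run):.2f}")
print(f"parallel efficiency = "
      f"{parallel_efficiency(cores, ref_cores, time_ref, time_run):.0%}")
```

Because both speedups are measured against the same reference run, the efficiency at the reference core count is 100 percent by construction.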
A typical CFD problem is amenable to
coarse grain parallelism, given the large
quantum of computation compared to
the communication associated with a
core. However, for a given grid size,
as the number of cores increases,
the problem becomes more and more
communication dominant, effectively
reducing the parallel efficiency. Hence,
based on a problem size, the user should
choose the number of processor cores
that ensures parallel efficiency around
85 percent in order to achieve optimal
utilization of computing resources and
fast turnaround time. Often, the minimum
number of cells per core for ensuring an
acceptable threshold parallel efficiency
(say 85 percent), which we refer to as
the C-count, can be a good indicator
of the level of parallelism a CFD solver
offers. In fact, the C-count can be a very
useful indicator in determining the optimal
number of cores on a given machine
for different grid sizes. We use these
performance parameters to study the
scalability offered by the code HiFUN in
conjunction with Intel MPI Library.
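As a rough illustration of how the C-count can guide the choice of core count, the sketch below divides the grid size by the C-count to estimate the largest core count that still keeps the threshold efficiency. The grid sizes are those of Table 1. The C-count of 3300 cells per core is the value reported for grid UG1 on Endeavor; applying it to the other grids is an assumption made only for this example, since the C-count depends on the machine and the grid. The function name is hypothetical.

```python
def max_cores_for_threshold(total_cells: int, c_count: int) -> int:
    """Largest core count that keeps at least c_count cells on every core."""
    return total_cells // c_count


# Grid sizes from Table 1 (number of cells).
grids = {"UG1": 12_700_000, "UG2": 38_500_000, "FG": 63_500_000}

# Assumed C-count: the UG1/Endeavor value, reused here for illustration only.
assumed_c_count = 3300

for name, cells in grids.items():
    cores = max_cores_for_threshold(cells, assumed_c_count)
    print(f"{name}: about {cores} cores at {assumed_c_count} cells per core")
```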
Grid ID   Grid Type                                    Number of Cells
UG1       Hybrid unstructured: prisms + tetrahedrons   12.7 million
UG2       Hybrid unstructured: prisms + tetrahedrons   38.5 million
FG        Hybrid unstructured: prisms + tetrahedrons   63.5 million
Table 1. Grids used for the computations
Figure 1. Surface grid on NASA Trap Wing
Figure 2. Surface pressure distribution
Parallel Scalability Using Grid UG1
Figures 3 and 4 depict speedup and
parallel efficiency curves obtained using
grid UG1. From these figures it is evident
that the C-count for 85 percent parallel
efficiency achieved using the HiFUN
code is about 3300 cells per core on
the Endeavor system. This, indeed, is an
indicator of the high levels of scalability
HiFUN offers.
Parallel Scalability Using Grid UG2
Figures 5 and 6 depict the speedup
and parallel efficiency curves obtained
using grid UG2. From Figure 6, it can be
seen that HiFUN exhibits ideal parallel
performance for 2048 cores. It is also
interesting to note that, despite the
small size of grid UG2 relative to the
core count, the drop in parallel efficiency
to 57 percent for 10248 cores is not severe
and may be attributed to communication dominance.
Parallel Scalability Using Grid FG
Figures 7 and 8 depict speedup and
parallel efficiency curves obtained using
grid FG. From Figure 8, it can be seen that
HiFUN exhibits near ideal speedup for
4096 cores. It is also worth noting that for
7168 cores on the Pleiades platform, the
parallel efficiency is about 88 percent and
the C-count for this grid is about 8800
cells per core. It is interesting to observe
that even on 10248 cores, with a modest
grid size of about 63.5 million volumes,
the code HiFUN offers a very reasonable
parallel efficiency of about 75 percent.
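The cells-per-core figures quoted above follow from dividing the grid size by the core count. A minimal check in Python, using only the grid sizes and core counts stated in the text (the function name is hypothetical):

```python
def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of grid cells assigned to each core."""
    return total_cells / cores


# (grid, number of cells, core count) as quoted in the text.
cases = [
    ("UG2", 38_500_000, 10248),
    ("FG", 63_500_000, 7168),
    ("FG", 63_500_000, 10248),
]

for grid, cells, cores in cases:
    print(f"{grid} on {cores} cores: "
          f"{cells_per_core(cells, cores):.0f} cells per core")
```

For grid FG on 7168 cores this gives roughly 8860 cells per core, consistent with the value of about 8800 quoted above.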
Algorithmic Scalability Using Grid FG
Quite often, good parallel scalability can
be demonstrated by significantly cutting
down the communication loads, but this
adversely impacts the convergence performance
of the parallel solvers. Therefore, the
real test for a highly scalable code
is the demonstration of algorithmic
scalability. Here, in order to demonstrate
the algorithmic scalability of HiFUN,
computations are performed for the same
flow conditions on 2048, 7168, and 10248
processor cores. In all these computations,
the code HiFUN is executed until steady
state, indicated by density residue falling
by ten decades.
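One way to quantify this test is to count the iterations each run needs before the residual has dropped ten decades and then compare the counts across core counts. The sketch below assumes a residual history is available as a list of values per iteration. The histories used here are synthetic geometric decays with assumed rates; they are not HiFUN results, and the function names are hypothetical.

```python
def iterations_to_converge(residuals, decades=10.0):
    """Index of the first residual lying at least `decades` orders of
    magnitude below residuals[0], or None if that level is never reached."""
    target = residuals[0] * 10.0 ** (-decades)
    for iteration, value in enumerate(residuals):
        if value <= target:
            return iteration
    return None


def synthetic_history(rate, steps=3000, initial=1.0):
    """Geometric decay used only as stand-in data for this sketch."""
    return [initial * rate ** k for k in range(steps)]


# Assumed decay rates per core count; NOT HiFUN measurements.
assumed_rates = {2048: 0.9900, 7168: 0.9901, 10248: 0.9899}

counts = {cores: iterations_to_converge(synthetic_history(rate))
          for cores, rate in assumed_rates.items()}
spread = (max(counts.values()) - min(counts.values())) / min(counts.values())

for cores, n in counts.items():
    print(f"{cores:>6} cores: ten-decade drop after {n} iterations")
print(f"relative spread in iteration count: {spread:.1%}")
```

A small spread in the iteration count across core counts is what algorithmic scalability would look like under this measure.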
Figure 3. Speedup curve using grid UG1
Figure 4. Parallel efficiency using grid UG1
Figure 5. Speedup curve using grid UG2
Figure 6. Parallel efficiency using grid UG2
Figure 7. Speedup curve using grid FG
Figure 8. Parallel efficiency using grid FG
Figure 9. Comparison: Solution convergence
Figure 10. Comparison: Axial coefficients evolution