ACIC is a system that automatically searches a large candidate space for optimized I/O system configurations, tailored to each individual HPC application running on a given cloud platform.
This work was published at Supercomputing 2013 (SC13) in Denver. See the event page: http://sc13.supercomputing.org/schedule/event_detail.php?evid=pap127
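As a rough illustration of the idea (not ACIC's actual parameter space or cost model; the parameters and cost function below are hypothetical), an automatic configuration search reduces to minimizing a predicted cost over candidate I/O settings:

```python
from itertools import product

# Hypothetical candidate I/O parameters (illustrative, not ACIC's search space).
stripe_counts = [4, 8, 16]
stripe_sizes_mb = [1, 4, 16]
aggregators = [2, 4]

def predicted_io_time(stripes, size_mb, aggs):
    """Toy cost model standing in for a trained performance predictor."""
    return 100.0 / (stripes * size_mb) + 0.5 * aggs

# Exhaustively evaluate every candidate configuration and keep the cheapest.
best = min(product(stripe_counts, stripe_sizes_mb, aggregators),
           key=lambda cfg: predicted_io_time(*cfg))
print(best)  # -> (16, 16, 2)
```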
This work investigates the performance of Big Data applications in virtualized Hadoop environments. It presents an evaluation and comparison of applications running on a virtualized Hadoop cluster with separated data and computation layers against a standard Hadoop installation.
http://clds.sdsc.edu/wbdb2014.de/program
Optimizing High Performance Computing Applications for Energy (David Lecomber)
Energy and power usage in high performance computing and supercomputing is a major issue for system owners and users. We take a look at what developers and administrators can do to reduce application energy costs.
Slides from the workshop on parallel processing using GPU infrastructure
The country's first national workshop on cloud computing
Vahid Amiry
vahidamiry.ir
Amirkabir University of Technology, 1391 (2012-13)
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ... (Accumulo Summit)
Talk Abstract
Accumulo has a solid theoretical foundation, endowing it with huge scalability, high reliability, and the makings of class-leading performance for NoSQL operations. Several publications show Accumulo achieving multi-petabyte scalability and outperforming other databases in its class by orders of magnitude. However, there are challenges arising in practice that slow down that performance and introduce bottlenecks.
The root of Accumulo's distributed scale and performance, while maintaining consistency, lies in multi-level amplification. ZooKeeper bootstraps consistency with a highly durable quorum. The Accumulo root table uses buffering and caching to boost that performance for sorted key/value operations. With the metadata tablets and data tables, Accumulo continues to boost performance, dividing and conquering a highly scalable key/value space to leverage the resources of a large cluster. The challenge arises when metadata operations at the core of Accumulo bottleneck performance for the entire cluster.
In this talk we will describe the Accumulo metadata operations model in detail. With a couple of prototypical application scenarios, we will show a few areas that are current bottlenecks or that we can expect to be bottlenecks in the near future. We will also propose modifications to the current model and outline projects that the community can take on to keep Accumulo in the lead for performance and scalability.
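To make the multi-level lookup concrete, here is a toy sketch (not Accumulo's real API or data layout) of how a root/metadata/data-tablet hierarchy routes a key to the tablet that holds it:

```python
import bisect

# Toy model of a two-level tablet hierarchy: the root level points at
# metadata tablets, each of which points at a set of data tablets.
root_splits = ["g", "p"]                             # root-level split points
metadata_splits = [["c", "e"], ["j", "m"], ["s", "v"]]  # splits per metadata tablet

def find_tablet(key):
    """Return (metadata_tablet, data_tablet) indices holding `key`."""
    meta_idx = bisect.bisect_right(root_splits, key)         # root lookup
    data_idx = bisect.bisect_right(metadata_splits[meta_idx], key)  # metadata lookup
    return (meta_idx, data_idx)

print(find_tablet("k"))  # -> (1, 1)
```

Every client read thus amplifies into a short chain of sorted lookups, which is why a bottleneck in the metadata level slows the whole cluster.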
Speaker
Adam Fuchs
Chief Technology Officer, Sqrrl
As the Chief Technology Officer and co-founder of Sqrrl, Adam Fuchs is responsible for ensuring that Sqrrl is leading the world in Big Data Infrastructure technology. Previously at the National Security Agency, Adam was an innovator and technical director for several database projects, handling some of the world’s largest and most diverse data sets. He is a co-founder of the Apache Accumulo project. Adam has a BS in Computer Science from the University of Washington and has completed extensive graduate-level course work at the University of Maryland.
A brief introduction to the problems and prospects of OpenCL and distributed heterogeneous computation with Hadoop. Presented at Big Data Dive 2013 (Belarus Java User Group).
In KDD2011, Vijay Narayanan (Yahoo!) and Milind Bhandarkar (Greenplum Labs, EMC) conducted a tutorial on "Modeling with Hadoop". This is the second half of the tutorial.
This was a presentation on my book MapReduce Design Patterns, given to the Twin Cities Hadoop Users Group. Check it out if you are interested in seeing what my book is about.
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ... (Chester Chen)
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvements over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g., Spark/MLlib, PowerGraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems on most common machine learning tasks. For algorithms (e.g., graph algorithms) that do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and it empirically holds the record for distributed PageRank. Beyond rooflining, we believe there are great opportunities in deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but it is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS developed for marginal parameter estimation. We show that it has high parallelism and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
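The roofline bound behind this design style fits in a few lines: attainable throughput is the minimum of the compute peak and memory bandwidth times arithmetic intensity. A minimal sketch with made-up hardware numbers (not BIDMach's measured figures):

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, intensity_flops_per_byte):
    """Roofline model: a kernel is capped either by the compute peak or by
    memory bandwidth multiplied by its arithmetic intensity."""
    return min(peak_gflops, mem_bw_gbs * intensity_flops_per_byte)

# Illustrative GPU-like numbers: 4000 GFLOP/s peak, 300 GB/s bandwidth.
print(attainable_gflops(4000, 300, 2))   # low intensity: bandwidth-bound, 600
print(attainable_gflops(4000, 300, 50))  # high intensity: compute-bound, 4000
```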
Bio
John Canny is a professor of computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, Quantcast, and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
[Paper Reading] Orca: A Modular Query Optimizer Architecture for Big Data (PingCAP)
The performance of analytical query processing in data management systems depends primarily on the capabilities of the system's query optimizer. Increased data volumes and heightened interest in processing complex analytical queries have prompted Pivotal to build a new query optimizer.
In this paper we present the architecture of Orca, the new query optimizer for all Pivotal data management products, including Pivotal Greenplum Database and Pivotal HAWQ. Orca is a comprehensive development uniting state-of-the-art query optimization technology with our own original research, resulting in a modular and portable optimizer architecture.
In addition to describing the overall architecture, we highlight several unique features and present performance comparisons against other systems.
In this video from the HPC User Forum in Santa Fe, Yoonho Park from IBM presents: IBM Datacentric Servers & OpenPOWER.
"Big data analytics, machine learning and deep learning are among the most rapidly growing workloads in the data center. These workloads have the compute performance requirements of traditional technical computing or high performance computing, coupled with a much larger volume and velocity of data."
Watch the video: http://wp.me/p3RLHQ-gJv
Learn more: https://openpowerfoundation.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Combining Phase Identification and Statistic Modeling for Automated Parallel ... (Mingliang Liu)
Parallel application benchmarks are indispensable for evaluating and optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time- and labor-intensive to create. Real applications themselves, while offering the most accurate performance evaluation, are expensive to compile, port, and reconfigure, and are often plainly inaccessible due to security or ownership concerns. This work contributes APPrime, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters to create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPrime benchmarks: they retain the original applications' performance characteristics, in particular their relative performance across platforms. Moreover, the resulting benchmarks, already released online, are much more compact and easier to port than the original applications.
http://dl.acm.org/citation.cfm?id=2745876
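As a toy illustration of the phase-identification idea (APPrime's real input is communication-I/O traces; the trace below is fabricated), repeated phases in an event stream can be found with run-length encoding, after which a statistical step would summarize event parameters per phase:

```python
from itertools import groupby

# Fabricated per-step event types standing in for a real application trace.
trace = ["compute", "compute", "io", "compute", "compute", "io",
         "compute", "compute", "io"]

# Phase identification as run-length encoding: collapse consecutive
# identical events into (event, run_length) pairs, exposing the
# repeating compute/io pattern that a benchmark generator can replay.
runs = [(event, len(list(group))) for event, group in groupby(trace)]
print(runs)  # -> [('compute', 2), ('io', 1)] repeated three times
```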
Whoever denies the advent of the Mahdi has, as it were, denied the things revealed to Muhammad (peace be upon him)... (muzaffertahir9)
A moment of reflection for those Muslims who awaited Imam Mahdi in the fourteenth century (AH)
Thankfully, the so-called scholars have now admitted that the fourteenth century has ended. The entire Muslim world eagerly awaited Imam Mahdi in the fourteenth century, and some scholars used to say that the fourteenth century would not end until Imam Mahdi had appeared. If the Muslims were correct in their belief that Imam Mahdi would appear in the fourteenth century, and they certainly were, then it follows that the promised one did come in the fourteenth century; but the so-called scholars, skilled in concealing the truth, kept the Muslims from recognizing the true Mahdi.
I am indeed the Promised Messiah whose coming was promised for the latter days, when misguidance would spread. Jesus has certainly died, and the religion of the Trinity is false and void. [The objection runs:] "You are surely fabricating a lie against Allah in your claim of prophethood; prophethood ended with our Noble Prophet (peace be upon him), and now there is no book but the Quran, which is better than the earlier scriptures, and no law but the law of Muhammad." Yet I was given the name "prophet" by the blessed tongue of the Best of Mankind (peace be upon him); this is in a reflective (zilli) sense and is the fruit of the blessings of following him. I see no personal merit in myself; whatever I have found, I have found only through that holy person. And Allah the Exalted...
Workshop on the pedagogy and technology of advance formative assessment, part 2 (Alfredo Prieto Martín)
Describes apps and paper-based personal response systems for formative assessment, along with technologies and methodologies to encourage prior study and the flipped classroom.
History of graphic design: print • photo • digital • 3D (Productz)
Graphic design by Peter Craycroft, 1985-2016: the cutting edge of exhibit and retail design-build. In this brief visual history I first take the opportunity to acknowledge and thank the many wonderful people who have inspired me, counseled me, and worked so very hard beside me to create great work. www.productz.biz provides links to many amazing people pictured here. The images speak for themselves.
Global Services Location Index 2016 | A.T. Kearney (Kearney)
Now in its seventh edition, the A.T. Kearney Global Services Location Index tracks the contours of the offshoring landscape in 55 countries across three major categories: financial attractiveness, people skills and availability, and business environment. This year’s report finds a new business model threatening established concepts of offshoring and expanding the market: automation combined with business process as a service (BPaaS) has the potential to be an even more powerful force for disruptive change than automation alone.
How to optimize Hortonworks Apache Spark ML workloads on Power. The POWER8/9 architecture is the latest offering from IBM and the OpenPOWER Foundation, and a strong platform for optimizing Hortonworks Spark performance. During this presentation we will walk the audience through the steps required to optimize YARN, HDFS, and Spark on a Power cluster.
Steps required:
1) Classify the workload as CPU-, memory-, or I/O-intensive, or mixed
2) Characterize the "out-of-box" Hortonworks Spark workload to understand its CPU, memory, I/O, and network performance characteristics
3) Floor-plan cluster resources
4) Tune the "out-of-box" workload to navigate the "roofline" performance space in the dimensions named above
5) If the workload is memory-, I/O-, or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible to make it CPU-bound
6) Divide the search space into regions and perform an exhaustive search
7) Identify performance bottlenecks through resource monitoring, and tune the system, JVM, or application layer, profiling the application and hardware counters if required
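The search in steps 4-6 can be sketched as a grid search over candidate settings. The parameter values and cost function below are illustrative stand-ins, not recommended Spark-on-POWER settings, and the runtime function stands in for an actual benchmark run:

```python
from itertools import product

# Hypothetical tuning grid (illustrative values only).
grid = {
    "spark.executor.cores": [4, 8],
    "spark.executor.memory_gb": [16, 32],
    "spark.sql.shuffle.partitions": [200, 400],
}

def measured_runtime(cores, mem_gb, partitions):
    """Stand-in for actually running the workload with these settings."""
    return 1000.0 / (cores * mem_gb) + partitions * 0.01

# Exhaustive search over the whole grid (step 6), keeping the fastest config.
configs = list(product(*grid.values()))
best = min(configs, key=lambda cfg: measured_runtime(*cfg))
print(dict(zip(grid, best)))
```

In practice each evaluation is a real benchmark run, so the grid is first narrowed using the workload classification from steps 1-2.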
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma... (Databricks)
Building accurate machine learning models has been an art practiced by data scientists: algorithm selection, hyperparameter tuning, feature selection, and so on. Recently, efforts to break through these "black arts" have begun. We have developed a Spark-based automatic predictive modeling system that searches for the best algorithm, the best parameters, and the best features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. Our evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover a highly accurate predictive model in minutes on an Ultra High Density Server, which employs 272 CPU cores, 2 TB of memory, and 17 TB of SSD in a 3U chassis. We will also share open challenges in learning such a massive number of models on Spark, particularly from reliability and stability standpoints.
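A minimal sketch of the model-selection loop such a system automates (toy data and candidate models in plain Python, not the actual Spark-based implementation): fit candidate models, score each on held-out data, and keep the best.

```python
# Toy dataset: y = 2x + 1, split into training and holdout sets.
train = [(x, 2 * x + 1) for x in range(10)]
holdout = [(x, 2 * x + 1) for x in range(10, 15)]

# Fit a linear model y = slope * x + intercept by least squares.
xs = [x for x, _ in train]
mx = sum(xs) / len(xs)
my = sum(y for _, y in train) / len(train)
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Candidate "algorithms" to select among automatically.
candidates = {
    "mean": lambda x: my,                       # predict the training mean
    "linear": lambda x: slope * x + intercept,  # fitted linear model
}

def mse(model):
    """Holdout mean squared error: the selection criterion."""
    return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)

best_model = min(candidates, key=lambda name: mse(candidates[name]))
print(best_model)  # -> linear
```

The system in the talk runs this loop at scale, with real algorithms and feature sets as the candidates and Spark parallelizing the evaluations.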
AWS re:Invent 2016: High Performance Computing on AWS (CMP207) (Amazon Web Services)
High performance computing in the cloud is enabling high-scale compute- and graphics-intensive workloads across industries, ranging from aerospace, automotive, and manufacturing to life sciences, financial services, and energy. AWS provides application developers and end users with unprecedented computational power for massively parallel applications, in areas such as large-scale fluid and materials simulations, 3D content rendering, financial computing, and deep learning. This session provides an overview of HPC capabilities on AWS, describes the newest generations of accelerated computing instances (including P2), and highlights customer and partner use cases across industries.
Attendees learn about best practices for running HPC workflows in the cloud, including graphical pre- and post-processing, workflow automation, and optimization. Attendees also learn about new and emerging HPC use cases: in particular, deep learning training and inference, large-scale simulations, and high performance data analytics.
RAMSES: Robust Analytic Models for Science at Extreme Scales (Ian Foster)
RAMSES: A new project in data-driven analytical modeling of distributed systems
RAMSES is a new DOE-funded project on the end-to-end analytical performance modeling of science workflows in extreme-scale science environments. It aims to link multiple threads of inquiry that have not, until now, been adequately connected: namely, first-principles performance modeling within individual sub-disciplines (e.g., networks, storage systems, applications), and data-driven methods for evaluating, calibrating, and synthesizing models of complex phenomena. What makes this fusion necessary is the drive to explain, predict, and optimize not just individual system components but complex end-to-end workflows. In this talk, I will introduce the goals of the project and some aspects of our technical approach.
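The data-driven calibration idea can be illustrated in miniature: fit the free parameters of a first-principles model, here transfer time t = latency + size / bandwidth, to observations by least squares. The observations below are synthetic, not RAMSES measurements:

```python
# Synthetic (size_mb, seconds) observations generated from a transfer with
# 0.05 s latency and 100 MB/s bandwidth.
obs = [(size, 0.05 + size / 100.0) for size in [10, 50, 100, 200]]

# Closed-form least-squares fit of t = a + b * size, so that the calibrated
# latency is a and the calibrated bandwidth is 1 / b.
n = len(obs)
sx = sum(s for s, _ in obs)
st = sum(t for _, t in obs)
sxx = sum(s * s for s, _ in obs)
sxt = sum(s * t for s, t in obs)
b = (n * sxt - sx * st) / (n * sxx - sx * sx)
a = (st - b * sx) / n
print(round(a, 3), round(1 / b, 1))  # -> 0.05 100.0
```

End-to-end workflow models then compose several such calibrated component models (network, storage, application) into one predictive pipeline.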
Timely genome analysis requires a fresh approach to platform design for big data problems. Louisiana State University has tested enterprise cluster deployments of Redis with a unique solution that allows flash memory to act as extended RAM. Learn about how this solution allows large amounts of data to be handled with a fraction of the memory needed for a typical deployment.
OpenPOWER Acceleration of HPCC Systems (HPCC Systems)
JT Kellington (IBM) and Allan Cantle (Nallatech) present at the 2015 HPCC Systems Engineering Summit Community Day on porting HPCC Systems to the POWER8-based ppc64el architecture.
Even though there have been a large number of proposals to accelerate databases using specialized hardware, often the opinion of the community is pessimistic: the performance and energy efficiency benefits of specialization are seen to be outweighed by the limitations of the proposed solutions and the additional complexity of including specialized hardware, such as field programmable gate arrays (FPGAs), in servers. Recently, however, as an effect of stagnating CPU performance, server architectures started to incorporate various programmable hardware and the availability of such components brings opportunities to databases. In the light of a shifting hardware landscape and emerging analytics workloads, it is time to revisit our stance on hardware acceleration. In this talk we highlight several challenges that have traditionally hindered the deployment of hardware acceleration in databases and explain how they have been alleviated or removed altogether by recent research results and the changing hardware landscape. We also highlight a new set of questions that emerge around deep integration of heterogeneous programmable hardware in tomorrow’s databases.
sudoers: Benchmarking Hadoop with ALOJA (Nicolas Poggi)
Presentation for the sudoers Barcelona group, Oct 06 2015, on benchmarking Hadoop with the ALOJA open source benchmarking platform. The presentation was mostly a live demo; these slides are posted for the people who could not attend.
http://lanyrd.com/2015/sudoers-barcelona-october/
Webinar: High Performance MongoDB Applications with IBM POWER8 (MongoDB)
Innovative companies are building Internet of Things, mobile, content management, single view, and big data apps on top of MongoDB. In this session, we'll explore how the IBM POWER8 platform brings new levels of performance and ease of configuration to these solutions which already benefit from easier and faster design and development using MongoDB.
The slides for the first ever SnappyData webinar. Covers SnappyData core concepts, programming models, benchmarks and more.
SnappyData is open sourced here: https://github.com/SnappyDataInc/snappydata
We also have a deep technical paper here: http://www.snappydata.io/snappy-industrial
We can be easily contacted on Slack, Gitter and more: http://www.snappydata.io/about#contactus
Application Profiling at the HPCAC High Performance Center (inside-BigData.com)
Pak Lui from the HPC Advisory Council presented this deck at the 2017 Stanford HPC Conference.
"To achieve good scalability performance on the HPC scientific applications typically involves good understanding of the workload though performing profile analysis, and comparing behaviors of using different hardware which pinpoint bottlenecks in different areas of the HPC cluster. In this session, a selection of HPC applications will be shown to demonstrate various methods of profiling and analysis to determine the bottleneck, and the effectiveness of the tuning to improve on the application performance from tests conducted at the HPC Advisory Council High Performance Center."
Watch the video presentation: http://wp.me/p3RLHQ-gpY
Learn more: http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox... (huguk)
This talk describes his research into using Hadoop to query and manage big geographic datasets, specifically OpenStreetMap (OSM). OSM is an "open-source" map of the world, growing at a rapid rate and currently around 5 TB of data. The talk introduces OSM, details some aspects of the research, and also discusses his experiences using the SpatialHadoop stack on Azure and Google Cloud.
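A bounding-box predicate of the kind SpatialHadoop pushes into spatial queries can be sketched as follows (toy node records, not real OSM data):

```python
# Toy OSM-style node records (id, latitude, longitude).
nodes = [
    {"id": 1, "lat": 51.50, "lon": -0.12},   # London
    {"id": 2, "lat": 48.85, "lon": 2.35},    # Paris
    {"id": 3, "lat": 40.71, "lon": -74.00},  # New York
]

def in_bbox(node, min_lat, min_lon, max_lat, max_lon):
    """True if the node lies inside the bounding box."""
    return (min_lat <= node["lat"] <= max_lat
            and min_lon <= node["lon"] <= max_lon)

# Rough bounding box around western Europe.
europe = [n["id"] for n in nodes if in_bbox(n, 36.0, -11.0, 60.0, 20.0)]
print(europe)  # -> [1, 2]
```

At OSM scale, a spatial framework evaluates this same predicate against a spatial index so that whole partitions outside the box are skipped rather than scanned.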
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... (Globus)
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy-driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its approaches to managing, curating, sharing, delivering, and preserving large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership is the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of Globus platform offerings, including Globus Flows, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed, and report relevant project progress.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Why React Native as a Strategic Advantage for Startup Innovation.pdfayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Globus Compute wth IRI Workflows - GlobusWorld 2024
ACIC: Automatic Cloud I/O Configurator for HPC Applications
1. ACIC:
AUTOMATIC CLOUD I/O CONFIGURATOR
FOR HPC APPLICATIONS
Mingliang Liu*, Ye Jin^, Jidong Zhai*, Yan Zhai*,
Qianqian Shi*, Xiaosong Ma^, Wenguang Chen*
*Tsinghua University
^North Carolina State University
SuperComputing 2013
2. Background
• HPC in the cloud
• Cloud providers now offer instances dedicated to high-end scientific computing
• Growing trend of migrating HPC applications to the cloud
3. HPC in Cloud – Pros and Cons
• Local clusters
+ Dedicated InfiniBand network
+ Runs on physical machines
- Fixed node types and counts
- Shared OS / file system / libraries
- Gap between I/O and computation
- Fixed device types and counts
- One-size-fits-all configuration
- Per-platform configuration options
• HPC in cloud [Yan'11]
- Shared 10Gb Ethernet
- Virtualization overhead
+ On-demand instance acquisition
+ Fully controlled virtual machines
- I/O overhead from virtualization
+ Multiple device/QoS choices
+ Application-specific configuration
+ Configuration options shared by all users of the same cloud
Key idea: help users find the I/O system configurations they need
4. Does I/O Configuration Matter?
• Configurations differ in performance and cost [Mingliang'11]
• No single I/O system configuration beats all others
• The optimal configurations for performance and for cost can contradict each other
(Figure: BTIO application with 6 I/O configurations; lower is better)
7. What Can We Configure?
File System
• File system internal parameters (stripe size: 64KB / 4MB)
• File system type (NFS vs. PVFS2)
I/O Server
• I/O server count (1 / 2 / 4)
• I/O server placement (dedicated vs. part-time)
Storage Device
• Software RAID (RAID 0 vs. no RAID)
• Device count (1 / 2)
• Cloud storage device type (EBS vs. ephemeral vs. SSD)
8. What Do Configurations Depend On?
• Optimization target (performance or cost)
• Workload I/O characteristics:
Name Value
Number of all processes {32, 64, 128, 256}
Number of I/O processes {32, 64, 128, 256}
I/O interface {POSIX, MPIIO}
I/O iteration count {1, 10, 100}
Data size {1, 4, 16, 32, 128, 512} MB
Request size {256KB, 4MB, 16MB, 128MB}
Read and/or write {read, write}
Collective {yes, no}
File sharing {share, individual}
9. How to Configure Optimally?
• Configure the I/O system by hand [Heshan'11] (hard)
• Obvious gaps between manual configurations and optimal ones
• Try all configurations for each application (expensive)
• Places a configuration burden on scientific users
• Time- and money-consuming
10. Our Approach
• Automatically predict and select optimal I/O configurations
• Map workload I/O characteristics to configurations
I/O System Configuration Options
Name Value
Disk device {EBS, ephemeral}
File system {NFS, PVFS2}
Instance type {cc1.4xlarge, cc2.8xlarge}
I/O server number {1, 2, 4}
Placement {part-time, dedicated}
Stripe size {64KB, 4MB}
Workload I/O Characteristics
Name Value
Number of all processes {32, 64, 128, 256}
Number of I/O processes {32, 64, 128, 256}
I/O interface {POSIX, MPIIO}
I/O iteration count {1, 10, 100}
Data size {1, 4, 16, 32, 128, 512} MB
Request size {256KB, 4MB, 16MB, 128MB}
Read and/or write {read, write}
Collective {yes, no}
File sharing {share, individual}
15 dimensions in total: more than 1M combinations
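The two tables above combine into a 15-dimensional search space. A quick sketch of its size, where the dictionaries simply transcribe the sampled values listed on this slide:

```python
from math import prod

# Sampled values per dimension, transcribed from the two tables above.
config_options = {
    "disk_device": ["EBS", "ephemeral"],
    "file_system": ["NFS", "PVFS2"],
    "instance_type": ["cc1.4xlarge", "cc2.8xlarge"],
    "io_server_number": [1, 2, 4],
    "placement": ["part-time", "dedicated"],
    "stripe_size": ["64KB", "4MB"],
}
workload_characteristics = {
    "num_processes": [32, 64, 128, 256],
    "num_io_processes": [32, 64, 128, 256],
    "io_interface": ["POSIX", "MPIIO"],
    "io_iteration_count": [1, 10, 100],
    "data_size_mb": [1, 4, 16, 32, 128, 512],
    "request_size": ["256KB", "4MB", "16MB", "128MB"],
    "read_write": ["read", "write"],
    "collective": ["yes", "no"],
    "file_sharing": ["share", "individual"],
}

dims = {**config_options, **workload_characteristics}
space_size = prod(len(v) for v in dims.values())
print(len(dims), space_size)  # 15 dimensions, 1769472 sampled points
```

Even with these coarse samples, exhaustively measuring every point is out of reach, which is what motivates sampling a reduced space and building a prediction model.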
20. Evaluation – Applications
• Selected HPC workloads:
Name | Domain | CPU | Network | Read/Write | API
BTIO | Physics | High | High | Write | MPIIO
FLASHIO | Astrophysics | Low | Low | Write | MPIIO
mpiBLAST | Biology | Medium | Medium | Read | POSIX
MADbench2 | Cosmology | Low | Medium | Read & Write | MPIIO
21. Evaluation – No One Excels All
• Optimal performance configurations:
App. | Proc. | Device | P/D | FS | I/O Servers | Stripe Size
BTIO | 64 | EBS | P | NFS | 1 | N/A
BTIO | 256 | eph. | P | PVFS2 | 4 | 4MB
FLASHIO | 64 | eph. | D | NFS | 1 | N/A
FLASHIO | 256 | eph. | P | NFS | 1 | N/A
mpiBLAST | 32 | eph. | P | PVFS2 | 4 | 64KB
mpiBLAST | 64 | eph. | D | PVFS2 | 4 | 4MB
mpiBLAST | 128 | eph. | D | PVFS2 | 4 | 4MB
MADbench2 | 64 | eph. | D | PVFS2 | 4 | 4MB
MADbench2 | 256 | EBS | D | PVFS2 | 4 | 4MB
• 9 test cases yield 7 unique optimal configurations
• 7/9: it is difficult to guess the optimal one even within the 5-D space
22. Effectiveness of Exec. Time Optimization
(Figure: execution time under all configurations, with the median, ACIC, and baseline configurations marked)
• Large performance range under different configurations
• Near-optimal configurations predicted by ACIC
23. Effectiveness of Total Cost Saving
• ACIC achieves even better results in total cost saving
24. Training More Data
(Figure 7: accuracy enhancement from examining top-k. Figure 8: impact on prediction performance using different numbers of top-ranking model parameters, 7 to 15; the left axis shows cost saving under baseline (%), the right axis shows training cost (K$), with curves for BTIO-64, FLASHIO-256, mpiBLAST-128, and MADbench2-256.)
• More training data points, higher prediction accuracy
• The gain is heavily application-dependent
• Training cost increases exponentially (from $1,000 × c to $100,000 × c over the range shown)
26. Conclusion
• I/O configuration is crucial for HPC in the cloud
• Manual configuration is error-prone, even for experts
• An automatic I/O configurator is helpful
• Building a prediction model is challenging
• Reduce the high-dimensional space to sample training data
• Reuse training data in a crowd-sourcing way to amortize cost
27. http://hpc.cs.tsinghua.edu.cn/ACIC
• Thanks to Heshan Lin and Ruini Xue for joining the user study
• Thanks to the anonymous reviewers for their useful comments
• Supported in China by 863 Program No. 2012AA01A302 and NSFC grants 61133006 and 61103021
• Supported in the U.S. by NSF awards CNS-0546301, CNS-0915861, and CCF-0937908
28. References
• [Yan'11] Y. Zhai, M. Liu, J. Zhai, X. Ma, and W. Chen. Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications. In SC. ACM, 2011.
• [Plackett'46] R. Plackett and J. Burman. The Design of Optimum Multifactorial Experiments. Biometrika, 1946.
• [Olshen'84] L. Olshen and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
• [Mesnier'07] M. Mesnier, M. Wachs, R. Sambasivan, A. Zheng, and G. Ganger. Modeling the Relative Fitness of Storage. In SIGMETRICS. ACM, 2007.
• [Mingliang'11] M. Liu, J. Zhai, Y. Zhai, X. Ma, and W. Chen. One Optimized I/O Configuration per HPC Application: Leveraging the Configurability of Cloud. In APSys. ACM, 2011.
• [Heshan'11] H. Lin, X. Ma, W. Feng, and N. Samatova. Coordinating Computation and I/O in Massively Parallel Sequence Search. IEEE Transactions on Parallel and Distributed Systems, 2011.
• [Shan'08] H. Shan, K. Antypas, and J. Shalf. Characterizing and Predicting the I/O Performance of HPC Applications Using a Parameterized Synthetic Benchmark. In SC. IEEE, 2008.
Editor's Notes
As cloud computing becomes increasingly popular, cloud providers have begun to support dedicated instances for high-end scientific computing.
Thus there is a trend of HPC users migrating their applications from traditional HPC resources to the cloud.
But HPC in the cloud has not yet won everyone over.
We compared the cloud platform with the local clusters and list the pros and cons.
There are disadvantages of HPC cloud such as the shared 10 Gb Ethernet and virtualization overhead.
While, there are advantages as well.
For example, local clusters have fixed types and numbers of nodes, whereas in the cloud we can acquire more instances online and pay as we go.
However, the I/O gap seen in local clusters is enlarged in the cloud.
Fortunately, there are some further potentials which may make the cloud more competitive.
For example, cloud provides multiple device/instance/QoS choices.
We can configure the cloud according to our application’s needs.
As to the configuration options, they are shared by all users of the same cloud,
which makes it possible to reuse the configuration efforts and amortize the cost.
One question arises before we move on: does I/O configuration matter?
Here are our preliminary results.
We ran BT-IO from the NPB benchmark suite with 6 I/O configurations, varying:
file system type (PVFS2 vs. NFS),
number of I/O servers (1, 2, or 4),
and their placement strategy (dedicated vs. part-time).
Each line in the above figures indicates the result of one configuration.
The y axis is the total execution time or the cost of one run.
The x axis is the number of processes.
We can see from the figures that: 1, 2, 3.
Here is the outline of this talk.
After introducing the motivation,
we define the problem and then propose our tool to address it.
We will show some interesting results and conclude briefly.
This figure shows the configuration stack of Amazon EC2 platform.
There are three categories, the first one is the storage device configurations, the second is the file system and server configuration and the third are the internal parameters.
We also listed the sample values of the configuration options.
Well, among all these configurations, what’s the optimal one?
Obviously, it depends on our application’s I/O needs and our target.
The target can be minimizing overall execution time, or saving the total cost.
This table lists the important application I/O characteristics we should consider, in order to find the optimal configurations.
Confident users may try to do this by hand.
We invited an experienced user and a developer to configure the I/O system for the mpiBLAST application from 32 candidate configurations,
and compared the total run time and cost of their configurations with those of the optimal one.
The black bars show the performance improvement of the user-selected configurations, the dotted bars the developer-selected configurations, and the white bars the optimal ones among all candidates.
Conservative users would instead try all configurations for their applications and select the optimal one for future runs.
Even then, performance variance should be considered, so a single trial per configuration may not suffice.
Here is the outline of this talk.
After introducing the motivation,
we define the problem and then propose our tool to address it.
We will show some interesting results and conclude briefly.
To sample the training data from the exploration space, we need a smarter way than choosing randomly.
We realized that the parameters differ from each other in importance.
So it is natural to reduce the exploration space by choosing the top parameters and training on all their sampled combinations to bootstrap; we can then add more parameters incrementally.
We use a handy technique called the PB (Plackett-Burman) matrix to evaluate the importance of the parameters, so that we can select the most important ones from the huge exploration space.
The PB matrix was originally proposed for agricultural crop experiment design and manufacturing quality control; it can evaluate parameter importance with only a few experiment trials.
This matters in cloud computing, where each trial costs time and money.
There are five parameters in this example table, A through E. We use the standard PB matrix recipe, which gives 8 rows for this sample.
For each run, the value for each parameter is set according to one row of the PB Matrix, whose elements are assigned with binary values (either “+1” or “-1”) based on pre-specified PB design rules. For example, in the first row, we use high value for all parameters except D, which will use the low value.
The “high” and “low” values are selected to be at the two ends of the parameter value range.
After the runs are completed, the importance of each parameter is calculated as the dot product of the parameter and the result column.
In this example, parameter D is considered most important and parameter B is considered least important.
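The importance computation described in these notes can be sketched in a few lines of Python. The design matrix below follows the standard PB-8 recipe; the run results are synthetic numbers chosen only so that the example reproduces the outcome above (D most important, B least):

```python
# PB-8 design matrix: cyclic shifts of the standard generator row,
# plus a final run with every factor at its low level.
gen = [+1, +1, +1, -1, +1, -1, -1]
design = [gen[7 - i:] + gen[:7 - i] for i in range(7)] + [[-1] * 7]

# Five parameters A..E use the first five columns; +1/-1 stand for each
# parameter's high/low value (the two ends of its range).
params = "ABCDE"

# One measured result per run (synthetic values, for illustration only).
results = [10, 13, 15, 17, 4, 6, 15, 0]

# Importance of a parameter = |dot product of its column with the results|.
effects = {p: abs(sum(design[r][i] * results[r] for r in range(8)))
           for i, p in enumerate(params)}
ranking = sorted(params, key=lambda p: effects[p], reverse=True)
print(ranking)  # ['D', 'A', 'E', 'C', 'B']
```

Because the PB columns are orthogonal, eight runs are enough to separate the five effects; a full factorial over the same two levels would need 32.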
This table lists the rank of all the 15 parameters, as well as the sampled values.
We use the top 10 parameters to bootstrap ACIC.
More parameters can be added later using this rank as guidance.
Run the synthetic benchmark IOR.
Vary its parameters to mimic different workload behaviors.
Set up the I/O system with all configuration candidates.
Collect results for each target: performance or cost.
Through continuous, crowd-sourced training, ACIC can effortlessly cope with cloud hardware/software upgrades using common data-aging methods.
Why CART?
Obvious difference in importance of parameters
Simple, flexible, and interpretable
We can tolerate absolute prediction error as long as the rank of the configuration is close to the real one.
Here is a CART example.
There are two kinds of nodes in a decision tree: internal nodes and leaf nodes.
Each internal node has a predictor that splits the values into two sub-groups, the left child and the right child.
To build the tree, an internal node is split whenever the variance of its values is large enough.
Each leaf node holds the final sub-group value, indicated by the AVG field.
For each input, we obtain the prediction by traversing from the root down to a leaf.
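The traversal just described can be sketched directly. The tree, feature names, and values below are all made up for illustration; ACIC's actual trees are learned from the training data:

```python
# A leaf stores the average target value of its sub-group (the AVG field);
# an internal node stores a predicate that routes inputs left or right.
def leaf(avg):
    return {"avg": avg}

def node(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold,
            "left": left, "right": right}

# Tiny hand-built regression tree over two hypothetical parameters,
# predicting a run time in seconds.
tree = node("io_servers", 2,
            left=node("request_size_mb", 4,
                      left=leaf(300.0), right=leaf(210.0)),
            right=leaf(110.0))

def predict(t, x):
    """Traverse from the root to a leaf and return that leaf's AVG."""
    while "avg" not in t:
        t = t["left"] if x[t["feature"]] <= t["threshold"] else t["right"]
    return t["avg"]

print(predict(tree, {"io_servers": 1, "request_size_mb": 16}))  # 210.0
print(predict(tree, {"io_servers": 4, "request_size_mb": 16}))  # 110.0
```

Ranking candidate configurations by such predictions is all that is needed here: as noted above, absolute prediction error is tolerable as long as the predicted rank stays close to the real one.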
Now we introduced the three parts of ACIC.
Let’s see some interesting data.
We choose this baseline because it’s simple and popular.
The selected applications differ from each other in many characteristics, including scientific area, CPU/network usage, read/write pattern, and API.
There are 9 test cases, each combining an application with a scale.
We exhaustively tested all candidate configurations sampled before by running the 4 applications at different scales.
The total run time with each configuration is indicated by a gray dot.
The vertical span of gray dots depicts the range of measured total execution time for the entire configuration space.
The lowest dot in each figure is the measured optimal configuration.
The black points highlight the total run time under the ACIC recommended I/O configuration.
The solid red line marks the median performance among all configuration candidates,
while the dashed black line marks the performance of the baseline (B) I/O configuration.
Speedup ratios achieved by ACIC over the median and baseline are shown at the top of each figure.
First, these figures clearly demonstrate the potentially large difference in overall execution time caused by different I/O system configurations.
Second, ACIC is able to identify near-optimal I/O configurations in almost all situations, as the black points are located near the bottom of the gray “spectrum”.
The cost is calculated by the execution time, the number of instances and the price per instance per hour.
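That cost formula is easy to write down. A minimal sketch, assuming per-instance-hour billing rounded up to whole hours as EC2 did at the time; the $1.30/hour figure in the example is only an illustrative price:

```python
import math

def run_cost(exec_time_sec, num_instances, price_per_instance_hour,
             round_up_hours=True):
    """Cost of one run: execution time x instance count x hourly price."""
    hours = exec_time_sec / 3600.0
    if round_up_hours:
        # Per-instance-hour billing: partial hours are charged in full.
        hours = math.ceil(hours)
    return hours * num_instances * price_per_instance_hour

# A 30-minute run on 8 instances at an assumed $1.30/instance-hour:
print(run_cost(1800, 8, 1.30))  # 10.4 (billed as a full hour)
print(run_cost(1800, 8, 1.30, round_up_hours=False))  # 5.2
```

The rounding choice matters when comparing configurations: a configuration that shaves a run from 61 to 59 minutes halves the billed cost, while one that goes from 50 to 40 minutes saves nothing under hourly billing.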
This figure presents the results of parameter sensitivity using four sample runs, one per application.
The x axis indicates the number of top ranking parameters used in model training as ordered by PB matrix.
For each parameter count, the y axis on the left measures the performance of the ACIC top recommendation in terms of cost saving over the baseline,
while the y axis on the right measures the cost of training data collection.
When using 10 parameters, the total training data collection cost is around $1K
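The exponential growth of training cost with the number of model parameters follows directly from the multiplicative size of the sampled space. A rough sketch, where the per-parameter sample counts mirror the tables earlier in the deck but the importance ordering is an assumption:

```python
from math import prod

# Sampled value counts for the 15 parameters, assumed ordered by
# PB importance (counts taken from the configuration/workload tables).
values_per_param = [2, 2, 2, 3, 2, 2, 4, 4, 2, 3, 6, 4, 2, 2, 2]

def combos(k):
    """Number of training combinations when using the top-k parameters."""
    return prod(values_per_param[:k])

for k in (7, 10, 15):
    print(k, combos(k))
# Each added parameter multiplies the number of runs to collect,
# so training-data collection cost grows exponentially with k.
```

This is why bootstrapping with the top-ranked parameters and adding the rest incrementally (and amortizing collection across users) is the practical route.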
Here is the outline of this talk.
After introducing the motivation,
we define the problem and then propose our tool to address it.
We will show some interesting results and conclude briefly.
We published ACIC to the HPC community. Users can download the training database and build the CART model to predict the optimal I/O system configurations for their applications.
New contributions are heavily welcome.
Please scan this bar code and visit the homepage of ACIC.
That’s all thank you!