Performance Monitoring in Spark Applications
ROHITASH JAIN,
SOFTWARE ENGINEER, SIGMOID
Hardware Trends
Comparison
Data Locality Concept
Bottleneck factors
▶ CPU
▶ DISK
▶ Memory
▶ Network
Phases in Reduce Task
Event Timeline in Spark UI
Task Composition
PROFILING TOOLS
▶ SPARKLINT
▶ FLAME GRAPHS
▶ GC LOGS from JVM
SPARKLINT
∙ Live view of batch and streaming application stats, or event-by-event analysis of historical event logs
∙ Stats and graphs for:
∙ Idle time
∙ Core usage
∙ Task locality
Design
Screenshot
FLAME GRAPHS
▶ Flame Graph Visualization
Recipe:
▶ Gather multiple stack traces
▶ Aggregate them by sorting alphabetically by function/method name
▶ Visualize them as stacked colored boxes
▶ Width of each box is proportional to the time spent in that function
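The aggregation step of the recipe can be sketched in a few lines of Python — a minimal illustration (not part of any flame-graph tool) that folds raw stack traces into the "collapsed" counts format that flame-graph renderers consume; the sample stacks are invented for illustration:

```python
from collections import Counter

def collapse(stacks):
    """Fold raw stack traces (outermost frame first) into
    'frame;frame;frame count' lines, sorted alphabetically."""
    counts = Counter(";".join(frames) for frames in stacks)
    # Sorting groups identical prefixes together, which is what lets
    # the renderer merge sibling boxes into wider parent boxes.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical stacks sampled from a profiler
stacks = [
    ["main", "readBlock", "decompress"],
    ["main", "readBlock", "decompress"],
    ["main", "shuffleWrite"],
]
for line in collapse(stacks):
    print(line)
```

Here the `decompress` path appears twice, so its box would be drawn twice as wide as the `shuffleWrite` box.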
HOW TO - FLAME GRAPHS
▶ Enable Java Flight Recorder, e.g.: ./spark-submit --conf "spark.driver.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder" --conf "spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
▶ Collect a recording of the process with jcmd, e.g.: jcmd <pid> JFR.start duration=10s filename=$PWD/myoutput.jfr
▶ Refer to https://gist.github.com/kayousterhout/7008a8ebf2babeedc7ce6f8723fd1bf4 for converting your JFR file to an SVG flame graph.
Memory profiling
▶ --conf "spark.executor.extraJavaOptions=-XX:SurvivorRatio=16 -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC -XX:+PrintAdaptiveSizePolicy"
▶ GCeasy for log analysis.
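As a quick local alternative to uploading logs for analysis, the total GC pause time can be pulled out of a -XX:+PrintGCDetails log with a short Python sketch — the sample log lines below are invented for illustration, and real log formats vary by JVM version and collector:

```python
import re

# Pre-JDK9 PrintGCDetails lines end with timings like:
#   [Times: user=0.12 sys=0.01, real=0.05 secs]
REAL_SECS = re.compile(r"real=([0-9.]+) secs")

def total_gc_pause(log_text):
    """Sum the wall-clock ('real') seconds across all GC events."""
    return sum(float(m) for m in REAL_SECS.findall(log_text))

sample = """\
0.512: [GC pause (G1 Evacuation Pause) (young) [Times: user=0.03 sys=0.00, real=0.02 secs]
1.204: [GC pause (G1 Evacuation Pause) (mixed) [Times: user=0.10 sys=0.01, real=0.05 secs]
"""
print(round(total_gc_pause(sample), 3))  # 0.07
```

A rising pause total across executors is a hint to revisit the heap-sizing and collector flags above before digging deeper.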
What’s missing from Spark metrics?
1. Time blocked on reading input data and writing output
data (HADOOP-11873)
2. Time spent spilling intermediate data to disk (SPARK-3577)
Questions?
