Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
JavaOne 2010.
Abstract: It's Friday evening and you hear the first rumble . . . one Java node has become slightly unresponsive. You look up the process, get a thread dump, and for good measure restart it at 8 p.m. Saturday afternoon is when you realize that other nodes have caught the flu and you get the ugly call from the customer. In a matter of hours, you're on that conference bridge with support groups of different packages and Java vendors and one of your uberarchitects. Yes, production instances are up and down, and restarting like there's no tomorrow. Here's an accumulated compendium of the top 10 things that can cause Java production heartburn and what to do when your Java production is on fire. And yes, please have your tool belt on.
Speaker(s):
Cliff Click, Azul Systems, Distinguished Engineer
SriSatish Ambati, Azul Systems, Performance Engineer
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When Things Go Wrong
1. Top 10 Issues for
Java in Production
SriSatish Ambati
Cliff Click Jr.
Azul Systems, Inc
2. A Decade of
Java in Production
• A lot of hard-earned wisdom
• A lot of victories (quickly forgotten)
• A lot of endless pain points
• Usually the Pain Point is really a Systems Issue
• It's Not Just the JVM (nor network, nor ...)
3. Tools of the Trade
• What the JVM is doing:
– dtrace, hprof, introscope, jconsole, visualvm,
yourkit, azul zvision
• Invasive JVM observation tools:
– bci, jvmti, jvmdi/pi agents, logging
• What the OS is doing:
– dtrace, oprofile, vtune
• What the network/disk is doing:
– ganglia, iostat, lsof, nagios, netstat
5. 10 - Instrumentation is Not Cheap
• Symptom
– Production monitoring can be very expensive
– Staging environment does not repro issues
– Instrumented code changes cache profile
– MBeans are not cheap either!
• Solutions
– Pick the right axe for the problem!
– Avoid expensive heap walks
– Finish task then increment perf counters
– Asynchronous logging, jconsole, azul zvision
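A minimal sketch of "finish task then increment perf counters": the work runs uninstrumented, and a single cheap counter update happens afterwards. The class name TaskCounters and its methods are hypothetical, and LongAdder (Java 8+) postdates the talk; it is used here only as one low-contention counter choice.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical low-overhead counters: one cheap update after the work is done. */
final class TaskCounters {
    private final LongAdder completed = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the task first, then records the count and elapsed time in one place. */
    void timed(Runnable task) {
        long start = System.nanoTime();
        task.run();                                  // no instrumentation inside the task itself
        totalNanos.add(System.nanoTime() - start);   // counters touched only after the task finishes
        completed.increment();
    }

    long completedCount() { return completed.sum(); }

    long totalNanos() { return totalNanos.sum(); }
}
```

Reading the counters (sum()) is left to an infrequent reporter thread, so the request path never pays for aggregation.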
6. 9 - Leaks
• Symptom
– App consumes all the memory you got
– Live Heap trend is a ramping sawtooth
– Then slows, then throws OutOfMemory
• Tools
– yourkit, hprof, eclipse mat, jconsole, jhat, jps, visualvm, azul zvision
• Theory
– Allocated vs Live Objects, vm memory, Perm Gen
– Finalizers, ClassLoaders, ThreadLocal
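One way the ThreadLocal item above can turn into a leak is a pooled thread that keeps a per-request value alive long after the request is done. The sketch below is an illustration only; the class ThreadLocalCleanup and the 1 MB buffer are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalCleanup {
    // Hypothetical per-request buffer; a pooled thread keeps it reachable until remove().
    static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    /** Handles one request and always clears the thread-local afterwards. */
    static int handleRequest() {
        BUFFER.set(new byte[1024 * 1024]);
        try {
            return BUFFER.get().length;   // stand-in for real work
        } finally {
            BUFFER.remove();              // without this, the buffer lives as long as the worker thread
        }
    }

    /** Runs a request on a single pooled thread, then checks that the same thread is clean. */
    static boolean bufferClearedAfterRequest() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> request = ThreadLocalCleanup::handleRequest;
            Callable<Boolean> check = () -> BUFFER.get() == null;
            pool.submit(request).get();
            return pool.submit(check).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```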
13. 8 – I/O: Serialization
• Symptom
– Multi-node scale-out does not scale linearly
– Time in both CPU and I/O (serialization costs)
• Tools
– CPU profiling, I/O profiling
• Solution
– Not all serialization libraries are equal!
– Pick a high performance serialization library or roll-your-own
– Avro, kryo, protocol-buffers, thrift
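To make the cost difference concrete, the sketch below compares the encoded size of one small object under default Java serialization and under a hand-rolled DataOutputStream encoding. The Quote type and both helper methods are hypothetical; size is only one part of the cost, and a real comparison should also measure CPU time.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationCost {
    /** Hypothetical message type used only for this comparison. */
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final long price;

        Quote(String symbol, long price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    /** Bytes produced by default Java serialization (stream header plus class descriptor plus fields). */
    static int javaSerializedSize(Quote q) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(q);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes produced by a roll-your-own encoding: length-prefixed symbol plus an 8-byte price. */
    static int handRolledSize(Quote q) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(q.symbol);
                out.writeLong(q.price);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The hand-rolled form trades flexibility (no versioning, no object graphs) for size, which is the same trade the libraries listed above make in more careful ways.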
14. 8 – I/O: Limits, Tuning
• Symptom
– Application hangs or remote call fails after a while
– “Too many open File Descriptors”, “Cursors”
– Inconsistent response times
• Tools
– nagios, pkg, rpm info, ulimit, yum
• Solutions
– Check for “new” OS patches, user & process limits, network & semaphore configurations
– Close all I/O streams
– Maybe you are I/O bound!
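For "close all I/O streams", a minimal sketch using try-with-resources (Java 7+, so newer than the talk) so that the file descriptor is released on every path, including exceptions. The class CloseStreams and the temp-file round trip are hypothetical and exist only to make the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CloseStreams {
    /** Reads the first line; the reader and its file descriptor are closed on every path. */
    static String firstLine(Path file) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temp file, reads the first line back, and deletes the file. */
    static String firstLineRoundTrip(String text) {
        try {
            Path file = Files.createTempFile("first-line", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return firstLine(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```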
15. 8 – I/O: Sockets, Files, DB
• Symptoms
– Socket.create/close takes too long
– JRMP timeouts, long JDBC calls
– Running out of file descriptors, cursors, disk
• Tools
– dbms tools, du, iostat, gmon, lsof, netstat
• Workaround
– Check all O/S patches, sysctl flags, run ping/telnet test
– Check & set SO_LINGER, TCP_LINGER2
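From Java, SO_LINGER can be checked and set through java.net.Socket, as in the sketch below. The class name LingerCheck is hypothetical. TCP_LINGER2 is an OS-level TCP option and is not covered here; the assumption is that it is handled through OS configuration rather than application code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class LingerCheck {
    /** Sets SO_LINGER on a fresh, unconnected socket and reads the value back. */
    static int setAndReadLinger(int seconds) {
        try (Socket socket = new Socket()) {
            socket.setSoLinger(true, seconds);   // close() may block up to this many seconds
            return socket.getSoLinger();         // -1 would mean lingering is disabled
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```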
17. 7 – Locks & synchronized
• Symptoms
– Adding users / threads / CPUs causes app slow down (less throughput, worse response)
– High lock acquire times & contention
– Race conditions, deadlock, I/O under lock
• Tools
– dtrace, lockstat, azul zvision
• Solution
– Use non-blocking Collections
– Striping locks, reducing hold times, no I/O under lock
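Lock striping can be sketched as follows: instead of one hot lock, each key hashes to one of several independent locks, so unrelated updates rarely contend. StripedCounter and its hammer() helper are hypothetical names for this illustration; a non-blocking collection is the other route the slide suggests.

```java
final class StripedCounter {
    private final Object[] locks;
    private final long[] counts;

    StripedCounter(int stripes) {
        locks = new Object[stripes];
        counts = new long[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    /** Increments the stripe chosen by the key's hash; only that stripe's lock is held. */
    void increment(Object key) {
        int i = Math.floorMod(key.hashCode(), locks.length);
        synchronized (locks[i]) {
            counts[i]++;                 // short hold time, no I/O under the lock
        }
    }

    /** Sums all stripes, taking each stripe lock briefly in turn. */
    long total() {
        long sum = 0;
        for (int i = 0; i < locks.length; i++) {
            synchronized (locks[i]) {
                sum += counts[i];
            }
        }
        return sum;
    }

    /** Starts several threads that each perform perThread increments and returns the total. */
    static long hammer(int threads, int perThread) {
        StripedCounter counter = new StripedCounter(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment(id * 31 + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }
}
```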
18. Example: IBM Visual Analyzer (j.u.c view in eclipse)
Zillion threads acquiring same lock
A j.u.c lock (e.g. ReentrantLock) is still a lock!
Need a non-blocking collection (or stripe lock or lower hold times, etc)
19. Example: zvision
Hot lock usually has 10x to 100x more acquire time than the next lock.
Look for rapidly growing acquire times!
21. 6 – Endless Compilation
• Symptom
– Time “compiling”
– Time in the Interpreter
• Tools
– -XX:+PrintCompilation, cpu profiler
– Find endlessly-recompiling method
• Workaround
– Exclude using .hotspot_compiler file
• Root cause: It's a JVM Bug! File a bug report!
22. 5 – Endless Exceptions
• Symptom
– Application spends time in j.l.T.fillInStackTrace()
• Tools
– Cpu profiler, azul zvision
– Thread dumps (repeated kill -3, zvision)
– Track caller/callee to find the thrower
• Not all exceptions appear in log files
• Solution
– Don't Throw, alternate return value (e.g. null)
23. 5 – Endless Exceptions
• Related
– Exception paths are typically failure paths
– JVMs do not optimize them much
– Often found when a server collapses
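"Don't throw, alternate return value" can look like the sketch below: an expected failure (bad input) is reported with null, so no exception is created and no stack trace is filled in. The method parsePortOrNull is a hypothetical example, not code from the talk.

```java
final class ParseWithoutThrow {
    /** Returns the parsed port, or null when the text is not a valid port number. */
    static Integer parsePortOrNull(String text) {
        if (text == null || text.isEmpty() || text.length() > 5) {
            return null;
        }
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return null;             // expected failure: no throw, no fillInStackTrace()
            }
            value = value * 10 + (c - '0');
        }
        return value <= 65535 ? Integer.valueOf(value) : null;
    }
}
```

Callers must check for null, which is the price of keeping the failure path as cheap as the success path.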
24. 4 - Fragmentation
• Symptom
– Performance degrades over time
– Inducing a “Full GC” makes problem go away
– Lots of free memory but in tiny fragments
• Tools
– GC logging flags, e.g. for CMS
-XX:PrintFLSStatistics=2
-XX:+PrintCMSInitiationStatistics
25. 4 - Fragmentation
• Tools
– “Fragger” www.azulsystems.com/resources
• Tiny cpu cost, low memory cost
• Fragments the heap in 60 sec like an hour in production
• Get FullGC cycles at dev's desk
• Solution
– Upgrade to latest CMS (CR:6631166)
– Azul Zing & Gen Pauseless GC
– Pooling similar sized/aged objects
• (really hard to get right!)
26. 3 – GC Tuning
• Symptom
– Entropy(gc) == number_of_gc_flags
• Too many free parameters
• 64-bit/large heap size is not a solution
– Constant 40-60% CPU utilization by GC
– Scheduled reboot before full-GC
– Full time Engineer working GC flags
• Workarounds
– Ask JVM Vendor to give 1 flag solution
– G1 GC, Azul’s Zing GPGC
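Before touching flags, it helps to know how much time GC actually takes. The sketch below reads accumulated collection time from the standard GarbageCollectorMXBean API and relates it to JVM uptime. The class GcOverhead is hypothetical, the result is elapsed time rather than CPU utilization, and how concurrent collectors report their time varies by JVM, so treat it as a rough indicator. Given that MBeans are not cheap, poll it rarely.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Sum of accumulated collection time across all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Rough fraction of JVM uptime spent in GC, based on elapsed time. */
    static double gcFractionOfUptime() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }
}
```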
30. 1 – Versionitis
When EARs wage class wars with JARs
• Symptom
– Different nodes have different configurations, different stack components, versions
– classpath has dist/*, -verbose:class
– subtle hard to reproduce issues
• Solution
– Method. Version Control.
– Good ol’ fashioned rigor
“It can only be attributable to human error” - HAL
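Alongside -verbose:class, a quick check on a suspect node is to ask a class where it was loaded from. The sketch below uses the standard ProtectionDomain/CodeSource API; the class WhereLoaded and the placeholder string are hypothetical. Classes loaded by the bootstrap loader have no code source, which is why the null case is handled.

```java
import java.security.CodeSource;

final class WhereLoaded {
    /** Returns the jar or directory a class came from, or a placeholder when none is recorded. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }
}
```

Logging locationOf(...) for a few key library classes at startup makes it easy to compare nodes that are supposed to be identical.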
31. 0 – Collapse Under Load (pick any 3 above!)
• Runs fine as load Ramps Up
– At peak load, system is unstable
– Slightly above peak: Collapse!
• Heavy load triggers exception (e.g. timeout)
• Exception path is slow already (e.g. logging)
• Transaction retried (so more work yet again)
• So NEXT transaction times-out
• Soon all time spent throwing & logging exceptions
• No forward progress
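The retry step is what turns a slow patch into a collapse, since every timeout adds more work. One common mitigation, which the slides do not prescribe, is a retry budget: retries are allowed only up to a fixed fraction of the requests seen. The sketch below is a minimal, hypothetical version (the class RetryBudget and the ratio parameter are assumptions for illustration).

```java
import java.util.concurrent.atomic.AtomicLong;

final class RetryBudget {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong retries = new AtomicLong();
    private final double maxRetryRatio;

    /** maxRetryRatio is the fraction of requests that may be retried, e.g. 0.25. */
    RetryBudget(double maxRetryRatio) {
        this.maxRetryRatio = maxRetryRatio;
    }

    /** Call once for every first-attempt request. */
    void recordRequest() {
        requests.incrementAndGet();
    }

    /** Returns true and consumes budget if a retry is allowed; false means fail fast instead. */
    boolean tryRetry() {
        long allowed = (long) (requests.get() * maxRetryRatio);
        while (true) {
            long used = retries.get();
            if (used >= allowed) {
                return false;            // budget exhausted: do not add load to a struggling system
            }
            if (retries.compareAndSet(used, used + 1)) {
                return true;
            }
        }
    }
}
```

A production version would also age out old counts so the budget tracks recent traffic; that is omitted here to keep the sketch short.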
33. Q & A (& Refs 1 of 2)
References:
java.util.concurrent lock profiling
http://infoq.com/jucprofiler
Java serialization benchmarks
http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Memory profiling with yourkit
http://yourkit.com
Tuning gc
http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
http://blog.codecentric.de/en/2010/01/java-outofmemoryerror-a-tragedy-in-seven-acts/
Cliff Click's High Scale lib, Non-Blocking HashMap
http://sourceforge.net/projects/high-scale-lib/
34. Q & A (& Refs 2 of 2)
References:
Memory Leak
http://deusch.org/blog/?p=9
Handy list of jvm options
http://blogs.sun.com/watt/resource/jvm-options-list.html
Fragger (with source code)
http://www.azulsystems.com/resources
Garbage Collection: Algorithms for Automatic Dynamic Memory Management, Richard Jones, Rafael D Lins
35. Backup slide – Fragmentation
• Works well for hours at 300-400MB
– Same workload
• Suddenly haywire
– Promotion
• Too frequently
– Back to back FullGCs
– May not all be completing.