Hardware fails, applications fail, our code... well, it fails too (at least mine). To prevent software failure we test. Hardware failures are inevitable, so we write code that tolerates them, then we test. From tests we gather metrics and act upon them by improving the parts that perform inadequately. Measuring the right things at the right places in an application is as much about good engineering practices and maintaining SLAs as it is about end-user experience, and may differentiate a successful product from a failure.
In order to act on performance metrics such as max latency and consistent response times, we need to know their accurate values. The problem with such metrics is that popular tools give us results that are not only inaccurate but also too optimistic.
During my presentation I will simulate services that require monitoring and show how the gathered metrics differ from the real numbers. All this while using what currently seems to be the most popular metrics pipeline, Graphite together with the metrics.dropwizard.io library, and getting completely false results. We will learn to tune it and get much better accuracy. We will use JMeter to measure latency and observe how falsely reassuring the results are. Finally I will show how HdrHistogram helps in gathering reliable metrics. We will also run tests measuring the performance of different metric classes.
Autopiloting Realtime Processing in Heron (Streamlio)
Heron is a streaming data processing engine developed at Twitter. This presentation explains how resiliency and self-tuning have been built into Heron.
A photo contest was tied to the school fair of basisschool Park on 30 May 2015.
You can still vote for the best photo at http://www.bspark.be/index.php?id=949 (not mobile; Facebook login required)
or via the messages (see the button on the left) on the Facebook page https://www.facebook.com/basisschool.park
You vote by liking the photo!
4Developers 2015: Measure to fail - Tomasz Kowalczewski (PROIDEA)
YouTube: https://www.youtube.com/watch?v=H5F0D55nKX4&index=11&list=PLnKL6-WWWE_WNYmP_P5x2SfzJ7jeJNzfp
Tomasz Kowalczewski
Language: English
Hardware fails, applications fail, our code... well, it fails too (at least mine). To prevent software failure we test. Hardware failures are inevitable, so we write code that tolerates them, then we test. From tests we gather metrics and act upon them by improving the parts that perform inadequately. Measuring the right things at the right places in an application is as much about good engineering practices and maintaining SLAs as it is about end-user experience, and may differentiate a successful product from a failure.
In order to act on performance metrics such as max latency and consistent response times, we need to know their accurate values. The problem with such metrics is that popular tools give us results that are not only inaccurate but also too optimistic.
During my presentation I will simulate services that require monitoring and show how the gathered metrics differ from the real numbers. All this while using what currently seems to be the most popular metrics pipeline, Graphite together with the com.codahale metrics library, and getting completely false results. We will learn to tune it and get much better accuracy. We will use JMeter to measure latency and observe how falsely reassuring the results are. We will check how Graphite averages data, just to helplessly watch important latency spikes disappear. Finally I will show how HdrHistogram helps in gathering reliable metrics. We will also run tests measuring the performance of different metric classes.
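The averaging pitfall described above can be reproduced in a few lines. This is a minimal plain-Python sketch (not the Dropwizard/Graphite stack itself, and the numbers are invented): averaging a window of latencies makes a rare spike vanish, while the max, or a high percentile as kept by HdrHistogram, preserves it.

```python
# Sketch: why averaged latency series hide spikes.
# 1000 requests at ~1 ms each, plus one 2-second outlier (a GC pause, say).
latencies_ms = [1.0] * 999 + [2000.0]

mean = sum(latencies_ms) / len(latencies_ms)  # what an averaged graph shows
worst = max(latencies_ms)                     # what one unlucky user actually saw

print(f"mean = {mean:.3f} ms")  # ~3 ms: looks perfectly healthy
print(f"max  = {worst:.1f} ms")  # 2000 ms: the spike averaging erased
```

The same effect compounds at every aggregation step in the pipeline, which is why the talk argues for recording full histograms rather than pre-averaged gauges.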
This presentation gives a lot of insights into Jimdo's infrastructure that hosts 20 million websites. To enable our application developers to quickly launch and improve their services, we've created a platform called Wonderland that does all the infrastructure work for them.
In this talk, I present the parts of Wonderland related to monitoring and logging. You can learn about our Prometheus setup as well as how we stream log messages from Docker to Logstash.
In this deck, Torsten Hoefler from ETH Zurich presents: Scientific Benchmarking of Parallel Computing Systems.
"Measuring and reporting performance of parallel computers constitutes the basis for scientific advancement of high-performance computing. Most scientific reports show performance improvements of new techniques and are thus obliged to ensure reproducibility or at least interpretability. Our investigation of a stratified sample of 120 papers across three top conferences in the field shows that the state of the practice is not sufficient. For example, it is often unclear if reported improvements are in the noise or observed by chance. In addition to distilling best practices from existing work, we propose statistically sound analysis and reporting techniques and simple guidelines for experimental design in parallel computing. We aim to improve the standards of reporting research results and initiate a discussion in the HPC field. A wide adoption of this minimal set of rules will lead to better reproducibility and interpretability of performance results and improve the scientific culture around HPC."
Learn more: https://htor.inf.ethz.ch/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Apache Spark: the next big thing? - StampedeCon 2014 (StampedeCon)
Steven Borrelli
It’s been called the leading candidate to replace Hadoop MapReduce. Apache Spark uses fast in-memory processing and a simpler programming model to speed up analytics and has become one of the hottest technologies in Big Data.
In this talk we’ll discuss:
What is Apache Spark and what is it good for?
Spark’s Resilient Distributed Datasets
Spark integration with Hadoop, Hive and other tools
Real-time processing using Spark Streaming
The Spark shell and API
Machine Learning and Graph processing on Spark
Observability - The good, the bad and the ugly - XP Days 2019, Kiev, Ukraine (Aleksandr Tavgen)
A talk about approaches to observability. Do we need millions of metrics? Anomalies vs. regularities? Can machine learning help us? Plus some capabilities of the Flux language by InfluxData.
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Beam (Flink Forward)
Apache Beam is Flink’s sibling in the Apache family of stream processing frameworks. The Beam and Flink teams work closely together on advancing what is possible in stream processing, including streaming SQL extensions and code interoperability on both platforms.
Beam was originally developed at Google as the amalgamation of its internal batch and streaming frameworks to power exabyte-scale data processing for Gmail, YouTube and Ads. It now powers a fully managed, serverless service, Google Cloud Dataflow, and is also available to run in other public clouds and on-premises when deployed in portability mode on Apache Flink, Spark, Samza and other runners. Users regularly run distributed data processing jobs on Beam spanning tens of thousands of CPU cores and processing millions of events per second.
In this session, Sergei Sokolenko, Cloud Dataflow product manager, and Reuven Lax, the founding member of the Dataflow and Beam team, will share Google’s learnings from building and operating a global streaming processing infrastructure shared by thousands of customers, including:
safe deployment to dozens of geographic locations,
resource autoscaling to minimize processing costs,
separating compute and state storage for better scaling behavior,
dynamic work rebalancing of work items away from overutilized worker nodes,
offering a throughput-optimized batch processing capability with the same API as streaming,
grouping and joining of hundreds of terabytes in a hybrid in-memory/on-disk file system,
integrating with the Google Cloud security ecosystem, and other lessons.
Customers benefit from these advances through faster execution of jobs, resource savings, and a fully managed data processing environment that runs in the Cloud and removes the need to manage infrastructure.
A 2015 presentation to introduce users to Java profiling. The YourKit Profiler is used for concrete examples. The following topics are covered:
1) When to profile
2) Profiler sampling
3) Profiler instrumentation
4) Where to Start
5) Macro vs micro benchmarking
Using Time Series for Full Observability of a SaaS Platform (DevOps.com)
Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracing API to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data.
Performance doesn’t have the same definition for system administrators, developers and business teams. What is performance? High CPU usage, a web site that doesn’t scale, a low business transaction rate per second, slow response times, and so on. This presentation is about maths, code performance, load testing, web performance, best practices, and more. Working on performance optimization is a very broad topic. It’s important to really understand the main concepts and to have a clean and strong methodology, because it can be a very time-consuming activity. Happy reading!
Data modelling is an important tool in the toolbox of a developer. By building and communicating a shared understanding of the domain they're working with, their applications and APIs are more usable and maintainable. However, as you scale up your technical teams, how do you keep these benefits whilst avoiding time-consuming meetings every time something new comes along? This talk reminds us of key data modelling techniques and how our use of Kafka changes and informs them. It then examines how these patterns change as more teams join your organisation and how Kafka comes into its own in this world.
SignalFx Elasticsearch Metrics Monitoring and Alerting (SignalFx)
From our Feb 25, 2016 webcast on operating Elasticsearch at scale, the metrics to monitor, and how to create low-noise meaningful alerts on Elasticsearch performance.
It covers the general problem of creating monitoring and observability without killing your Ops team's motivation with false positives and unexplained alerts.
Problems in this area, pitfalls, anti-patterns, and how to make it right.
How to manage a monitoring zoo. Spaghettification of dashboards. Why Uber needs 9 billion metrics (¯\_(ツ)_/¯) and why this is an antipattern. Metrics as a stream of data. We talk about the new Flux language from InfluxData. A bit of time series analysis and defining pipelines in Flux for metrics data. A drunkard's walk on your metrics, or why to measure randomness.
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen (InfluxData)
Aleksandr Tavgen from Playtech, the world’s largest online gambling software supplier, will share how they are using InfluxDB 2.0, Flux, and the OpenTracing API to gain full observability of their platform. In addition, he will share how InfluxDB has served as the glue to cope with multiple sets of time series data, especially in the case of understanding online user activity — a use case that is normally difficult without the math functions now available with Flux.
Time Series Anomaly Detection with .NET and Azure (Marco Parenzan)
If you have any device or source that generates values over time (even a log from a service), you want to determine whether, in a given time frame, the time series is correct, or whether you can detect anomalies. What can you do as a developer (not a data scientist) with .NET or Azure? Let's see how in this session.
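As a rough illustration of the kind of detection this session covers (the talk uses .NET and Azure; this is a hedged plain-Python sketch of one classic technique, a trailing-window z-score, with invented data):

```python
import statistics

def zscore_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value deviates from the trailing window
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        prior = series[i - window:i]
        mu = statistics.mean(prior)
        sigma = statistics.pstdev(prior)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A mostly flat signal with one spike at index 8.
signal = [10, 11, 10, 12, 11, 10, 11, 10, 50, 11, 10]
print(zscore_anomalies(signal))  # → [8]: only the spike is flagged
```

Production systems add seasonality handling and robust estimators, but the core idea of comparing each point against its recent history is the same.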
Slides from the webinar "Five Ways to Leverage AI and Tableau". Full webinar recording: https://starschema.com/kb/five-ways-to-leverage-ai-and-tableau
Sources & Workbooks: https://github.com/starschema/tableau-ai-use-cases
Seeing RED: Monitoring and Observability in the Age of Microservices (Dave McAllister)
Applications are changing. Clouds, containers, and Kubernetes all conspire to make life tougher. Modern monitoring makes use of practices like USE and RED to tame those beasts. Find out what that means.
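The RED method tracks Rate, Errors, and Duration per service. A minimal sketch of computing all three from raw request records (the records and field layout here are invented for illustration):

```python
# RED: Rate, Errors, Duration, computed from a list of
# (timestamp_s, status_code, duration_ms) request records.
requests = [
    (0.1, 200, 12.0), (0.4, 200, 15.0), (0.9, 500, 250.0),
    (1.2, 200, 11.0), (1.8, 200, 14.0),
]

window_s = 2.0
rate = len(requests) / window_s                            # requests per second
errors = sum(1 for _, code, _ in requests if code >= 500)  # failed requests
durations = sorted(d for _, _, d in requests)
p50 = durations[len(durations) // 2]                       # crude median latency

print(f"rate={rate} req/s, errors={errors}, p50={p50} ms")
```

Real monitoring systems compute these continuously over sliding windows and use high percentiles (p95/p99) for duration, not just the median.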
How Machines Help Humans Root Cause Issues @ Netflix (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2wGIFlU.
Seth Katz discusses ways to build tools designed to enhance the cognitive ability of humans through automated analysis to speed root cause detection in distributed systems. He focuses on examples from large scale systems at Netflix, on the systems directly involved in browsing and playing Netflix movies, and how pairing automation with human feedback reduces time to detect and resolve issues. Filmed at qconnewyork.com.
Seth Katz is Senior Software Engineer, Operational Insights at Netflix. He has been responsible for building the insights tooling around Netflix servers for the past 5 years. He pioneered Netflix's streaming visualization data platforms, the contextual system tracing tools and analytics, and the anomaly systems for detecting and troubleshooting problems on Netflix’s most mission critical servers.
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse (Tomasz Kowalczewski)
Computation is increasingly constrained by power. With each advancement in the manufacturing process, a decreasing percentage of the CPU can operate at full capacity, leading to the emergence of the term 'dark silicon'. This trend necessitates techniques that utilize chip area to optimize power efficiency through specialized accelerators.
The presentation will outline key concepts that led to dark silicon, such as Moore’s law and the breakdown of Dennard scaling, followed by an overview of current and upcoming CPU accelerators. The focus will then shift to vector units and the specifics of vector programming. Attendees will be introduced to registers, a range of vector operations, and methods to develop branchless algorithms such as sorting networks. The session will conclude with an overview of the new Java Vector API and how it has already been picked up by projects doing AI inference (Llama 2) and vector search (AstraDB and Cassandra).
Companies want to validate products early, with little time for good engineering and performance work. Yet good code can provide a 10-100x speed-up, which brings tremendous value to clients. We get help from modern hardware and algorithms, but we need to know their strengths and limitations so we can consciously decide when to invest in engineering and what added value to expect.
While compute becomes faster and cheaper, we are tempted to abandon sanity and shield ourselves from reality and the laws of physics. The resulting mess of monstrous Slack instances rampaging across our RAM should make us stop (because our computers did it already) and wonder where we went wrong. Rising developer salaries and time-to-market pressure are tempting us to abandon all hope of optimising our code and understanding our systems.
Contrary to what a casual reader could think, this is a deeply technical presentation. We will gaze into hardware counters, NUMA nodes, and vector registers, and that darkness will stare back at us.
All this to get a taste of what is possible on current hardware, to learn the COST of scalability, and to forever change how you feel when accessing the invoice list in your local utility provider's UI, where after 20s of waiting all 12 elements will be displayed (surely Cthulhu must be eating their compute, because it is NOT possible that Tauron hosts its billing services on a FIRST GEN IPHONE).
You probably know the mantra that allocation is cheap. It usually is true, but the devil is in the details. In your use case, object allocation may impact processor caches by evicting important data; burn CPU on executing constructor code; impact rates of object promotion to the old generation; and, most importantly, increase the frequency of stop-the-world young-gen pauses.
This presentation is for you if you are working on Java-based services that need to handle more and more traffic. As the number of transactions per second rises, you might hit a performance wall: young-generation GCs stopping the whole application for precious milliseconds.
This presentation focuses on optimising the object creation rate when dealing with seemingly mundane tasks. I will show a few examples of surprising places in the JDK and other libraries where garbage is created. I will explain how young-gen GC collection works and what costs are related to it. We will see escape analysis in action. Finally we will conclude that controlling allocation is the concern of library writers, so that we can easily implement performant code without doing premature optimisations.
Presentation from JDD 2014 conference about reactive extensions for Java. Github repo with examples: https://github.com/tkowalcz/presentations/tree/master/JDD2014
An Enterprise Resource Planning system includes various modules that reduce any business's workload. Additionally, it organizes workflows, which drives enhanced productivity. Here is a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite (Google)
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usage
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Uptime Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
Enhancing Project Management Efficiency: Leveraging AI Tools like ChatGPT (Jay Das)
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
A Comprehensive Look at Generative AI in Retail App Testing (kalichargn70th171)
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
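The atomicity concern mentioned above, updating state and producing a domain event together, is commonly addressed with a transactional outbox. Below is a hedged sketch of that general pattern using SQLite; it is an illustration of the idea, not Wix's actual implementation, and all table and function names are invented:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

def update_order(order_id, status):
    # One transaction covers both the state change and the event row,
    # so a crash can never persist one without the other. A separate
    # relay process later drains the outbox and publishes to the bus.
    with db:
        db.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)", (order_id, status))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"order": order_id, "status": status}),))

update_order(42, "PAID")
pending = db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(f"pending events: {pending}")
```

The trade-off is an extra table and relay component in exchange for never losing or duplicating the pairing between a write and its event.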
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution (informapgpstrackings)
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us: https://informapuae.com/field-staff-tracking/
Enhancing Research Orchestration Capabilities at ORNL (Globus)
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
SOCRadar Research Team: Latest Activities of IntelBroker (SOCRadar)
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled what happened over the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Prosigns: Transforming Business with Tailored Technology Solutions (Prosigns)
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
1. EVERYBODY LIES
TOMASZ KOWALCZEWSKI
2. CARGO CULT
During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas - which was to try one to see if it worked, and if it didn't work, to eliminate it. This method became organized, of course, into science. And it developed very well, so that we are now in the scientific age. It is such a scientific age, in fact, that we have difficulty in understanding how witch doctors could ever have existed, when nothing that they proposed ever really worked - or very little of it did.
Richard Feynman
From a Caltech commencement address given in 1974
3. WHY BOTHER?
• You get what you measure
- Ineffective optimisations that complicate code
+ Numbers to convince management to do refactoring or migration to Java 8!
4. WHY BOTHER?
• Predictable is better than fast
• One page display requires multiple calls (static and dynamic resources)
• Multiple microservices are called to generate a response
• During a session a user may do hundreds of displays of your webpages
5. WHY DO THIS?
• Every 100 ms increase in load time of Amazon.com decreased sales by 1%¹
• Increasing web search latency 100 to 400 ms reduces the daily searches per user by 0.2% to 0.6%. Furthermore, users do fewer searches the longer they are exposed. For longer delays, the loss of searches persists for a time even after latency returns to previous levels.²
¹ Kohavi and Longbotham 2007
² Brutlag 2009
8-11. SURVEY
• Use graphite?
• Feed it with Coda Hale/Dropwizard metrics?
• Modify their source? Use nonstandard options?
• Graph average? Median?
• Percentiles?
13. WHAT METRICS CAN WE USE?
graphite.send(prefix(name, "max"), ...);
graphite.send(prefix(name, "mean"), ...);
graphite.send(prefix(name, "min"), ...);
graphite.send(prefix(name, "stddev"), ...);
graphite.send(prefix(name, "p50"), ...);
graphite.send(prefix(name, "p75"), ...);
graphite.send(prefix(name, "p95"), ...);
graphite.send(prefix(name, "p98"), ...);
graphite.send(prefix(name, "p99"), ...);
graphite.send(prefix(name, "p999"), ...);
14. DON'T LOOK AT MEAN
• 1000 queries with 0 ms latency, 100 queries with 5 s latency
• Average is ≈454.5 ms
• 1000 queries with 1 ms latency, 100 queries with 5 s latency
• Average is ≈455.5 ms
• The mean is dominated by the outliers and does not help to quantify the lags users will experience
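A minimal sketch in plain Java (class and variable names are mine, not from the deck) reproducing the second scenario above: the mean suggests "about half a second" while a high percentile exposes the 5-second tail that roughly 9% of queries actually hit.

```java
import java.util.Arrays;

public class MeanVsPercentile {
    public static void main(String[] args) {
        // 1000 fast queries at 1 ms, 100 slow queries at 5000 ms
        double[] latencies = new double[1100];
        Arrays.fill(latencies, 0, 1000, 1.0);
        Arrays.fill(latencies, 1000, 1100, 5000.0);

        double mean = Arrays.stream(latencies).average().orElse(0);

        double[] sorted = latencies.clone();
        Arrays.sort(sorted);
        // nearest-rank 99th percentile
        double p99 = sorted[(int) Math.ceil(0.99 * sorted.length) - 1];

        // mean ≈ 455.5 ms hides the shape; p99 = 5000 ms shows what
        // the slow 100-out-of-1100 queries actually cost the user
        System.out.printf("mean=%.1f ms, p99=%.1f ms%n", mean, p99);
    }
}
```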
15. – ANSCOMBE'S QUARTET BY FRANCIS ANSCOMBE
These four data sets all have the same mean, variance, correlation, and linear regression line
16. PLOTTING MEAN IS FOR SHOWING OFF TO MANAGEMENT
17. MAYBE MEDIAN THEN?
• What is the probability of an end user encountering latency worse than the median?
• Remember: usually multiple requests are needed to respond to an API call (e.g. N microservices, N resource requests per page)
P(all N requests better than median) = (1/2)^N · 100%
18. PROBABILITY OF EXPERIENCING LATENCY BETTER THAN MEDIAN AS A FUNCTION OF MICROSERVICES INVOLVED
[Chart: (1/2)^N · 100% falling from 50% at N=1 towards ~0.1% at N=10; x-axis: number of services, 0-10; y-axis: probability, 10-100%]
19. WHICH PERCENTILE IS RELEVANT TO YOU?
• Is the 99th percentile a demanding constraint?
• In an application serving 1000 qps, latency worse than that happens ten times per second.
• A user that needs to navigate through several web pages will most probably experience it
• What is the probability of encountering latency better than the 99th?
P(all N requests better than 99th percentile) = (99/100)^N · 100%
20. PROBABILITY OF EXPERIENCING LATENCY BETTER THAN 99TH PERCENTILE AS A FUNCTION OF MICROSERVICES INVOLVED
[Chart: (99/100)^N · 100% falling from ~99% at N=1 to ~37% at N=100; x-axis: number of services, 0-100; y-axis: probability, 0-100%]
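The two curves above come straight from the formulas (1/2)^N and (99/100)^N. A short sketch in plain Java (class name is mine) that reproduces the plotted values:

```java
public class PercentileProbabilities {
    public static void main(String[] args) {
        // Probability that ALL of N independent calls beat the given percentile
        for (int n = 1; n <= 10; n++) {
            double betterThanMedian = Math.pow(0.5, n) * 100;  // (1/2)^N
            double betterThanP99 = Math.pow(0.99, n) * 100;    // (99/100)^N
            System.out.printf("N=%2d: beat median everywhere %6.2f%%, beat p99 everywhere %6.2f%%%n",
                    n, betterThanMedian, betterThanP99);
        }
        // With 10 services only ~0.1% of users beat the median on every call,
        // while ~90% still beat the 99th percentile everywhere;
        // at N=100 even the p99 figure drops to ~37%.
    }
}
```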
21. DO NOT AVERAGE PERCENTILES
Example scenario:
1. Load balancer splits traffic unevenly (ELB, anyone?)
2. Server S1 handles 1 qps over the measured time with 95%'ile == 1 ms
3. Server S2 handles 100 qps over the measured time with 95%'ile == 10 s
4. The average is ~5 s.
5. What does that tell us?
6. Did we satisfy the SLA if it says "95%'ile must be below 8 s"?
7. The actual 95%'ile is ~10 s
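The scenario above can be checked numerically. A plain-Java sketch (names and the 60-second window are my assumptions): S1 contributes 60 samples at 1 ms, S2 contributes 6000 samples at 10 s; averaging the two per-server p95s gives ~5 s, while the p95 of the merged samples is the real ~10 s.

```java
import java.util.Arrays;

public class AveragedPercentiles {
    // nearest-rank percentile of an already-sorted array
    static double percentile(double[] sorted, double p) {
        return sorted[(int) Math.ceil(p * sorted.length) - 1];
    }

    public static void main(String[] args) {
        // 60 s window: S1 at 1 qps / 1 ms, S2 at 100 qps / 10 s
        double[] s1 = new double[60];
        Arrays.fill(s1, 1.0);
        double[] s2 = new double[6000];
        Arrays.fill(s2, 10_000.0);

        double avgOfP95s = (percentile(s1, 0.95) + percentile(s2, 0.95)) / 2;

        double[] merged = new double[s1.length + s2.length];
        System.arraycopy(s1, 0, merged, 0, s1.length);
        System.arraycopy(s2, 0, merged, s1.length, s2.length);
        Arrays.sort(merged);
        double trueP95 = percentile(merged, 0.95);

        // averaged ≈ 5 s, true ≈ 10 s: the average is a number with no meaning
        System.out.printf("average of p95s = %.1f ms, true p95 = %.1f ms%n",
                avgOfP95s, trueP95);
    }
}
```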
22. – ALICE'S ADVENTURES IN WONDERLAND
"If there's no meaning in it," said the King, "that saves a world of trouble, you know, as we needn't try to find any"
23. Every time you average max values, someone in the world starts a new JavaScript framework
25-26. metricRegistry.timer("2015.standardTimer");
The standard timer will over- or under-report actual percentiles at will.
The green line represents actual MAX values.
27. TIMER'S HISTOGRAM RESERVOIR
• Backing storage for Timer's data
• Contains a "statistically representative reservoir of a data stream"
• Default is ExponentiallyDecayingReservoir, which has many drawbacks and is the source of most inaccuracies observed throughout this presentation
• Others include:
• UniformReservoir, SlidingTimeWindowReservoir, SlidingWindowReservoir
28. EXPONENTIALLY DECAYING RESERVOIR
• Stores 1028 random samples by default
• Assumes normal distribution of recorded values
• Many statistical tools applied in computer systems monitoring will assume a normal distribution
• Be suspicious of such tools
• Why is that a bad idea?
29. NORMAL DISTRIBUTION - WHY SO USEFUL?
• Central limit theorem
• Chebyshev's inequality
f(x, µ, σ) = 1/(σ√(2π)) · e^(-(x-µ)²/(2σ²))
[Chart: standard normal density curve]
30. CALCULATE 95%'ILE BASED ON MEAN AND STD. DEV.
• IFF latency values were distributed normally, then we could calculate any percentile based on mean and standard deviation
• µ = 10 ms, σ = 1 ms
• Look up in the standard normal (Z) table
• The 95%'ile is located 1.65 std. dev. from the mean
• Result is 11.65 ms
[Chart: normal density curve centred at 10 ms]
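The slide's lookup can be written out directly (plain Java, class name mine; 1.645 is the usual Z-table value, which the slide rounds to 1.65):

```java
public class NormalAssumptionPercentile {
    public static void main(String[] args) {
        double mu = 10.0;    // mean latency, ms
        double sigma = 1.0;  // standard deviation, ms
        double z95 = 1.645;  // 95th percentile of the standard normal, from a Z table

        double p95 = mu + z95 * sigma;
        // ≈ 11.65 ms - but ONLY if latency really were normally distributed;
        // the next slides show why that assumption fails in practice
        System.out.printf("p95 under normality assumption = %.2f ms%n", p95);
    }
}
```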
35. Add spikes due to: lost TCP packet retransmission, disk swapping, kernel bookkeeping, etc.
36. NORMAL DISTRIBUTION - WHY NOT APPLICABLE?
• The value of the normal distribution is practically zero when the value x lies more than a few standard deviations away from the mean.
• It may not be an appropriate model when one expects a significant fraction of outliers
• [...] other statistical inference methods that are optimal for normally distributed variables often become highly unreliable when applied to such data.¹
f(x, µ, σ) = 1/(σ√(2π)) · e^(-(x-µ)²/(2σ²))
¹ All quotes on this slide from Wikipedia
[Chart: standard normal density curve]
37. The blue line represents the metric reported from the Timer class.
The green line represents the request rate.
38. TIMER, TIMER NEVER CHANGES...
• Timer values decay exponentially
• giving artificial smoothing of values for server behaviour that may be long gone
• A Timer that is not updated does not decay
• If a Timer is not updated (e.g. a subprocess failed and we stopped sending requests to it), its values will remain constant
• Check this post for potential solutions: taint.org/2014/01/16/145944a.html
39. HDRHISTOGRAM
• Supports recording and analysis of sampled data across a configurable range with configurable accuracy
• Provides a compact representation of data while retaining high resolution
• Allows configurable tradeoffs between space and accuracy
• Very fast, allocation free, not thread safe for maximum speed (thread safe versions available)
• Created by Gil Tene of Azul Systems
40. RECORDER
• Uses HdrHistogram to store values
• Supports concurrent recording of values
• Recording is lock free, and also wait free on most architectures (those that support lock xadd)
• Reading is not lock free but does not stall writers (writer-reader phaser)
• Check out Marshall Pierce's library for using it as a Reservoir implementation
41. SOLUTIONS
• Always instantiate Timer with a custom reservoir:
• new ExponentiallyDecayingReservoir(LARGE_NUMBER)
• new SlidingTimeWindowReservoir(1, MINUTES)
• new HdrHistogramResetOnSnapshotReservoir()
• Only the last one is both safe and accurate, and will not report stale values if no updates were made
43. SMOKING BENCHMARKING IS THE LEADING CAUSE OF STATISTICS IN THE WORLD
44. COORDINATED OMISSION
• As formulated by Gil Tene of Azul Systems
• When the load driver is plotting with the system under test to deceive you
• Most tools do this
• Most benchmarks do this
• Yahoo Cloud Serving Benchmark had that problem¹
¹ Recently fixed by Nitsan Wakart, see psy-lob-saw.blogspot.com/2015/03/fixing-ycsb-coordinated-omission.html
45. [Diagram: latency vs. request arrival time around an application pause. Of the requests scheduled by the test plan, only the red one will be sent; the others will be missing from the test.]
46. – CREATED WITH GIL TENE'S HDRHISTOGRAM PLOTTING SCRIPT
Effects on benchmarks at high percentiles are spectacular
47. COORDINATED OMISSION SOLUTIONS
1. Ignore the problem!
Perfectly fine for a non-interactive system where only throughput matters
48. COORDINATED OMISSION SOLUTIONS
2. Correct it mathematically in the sampling mechanism. HdrHistogram can correct CO with these methods (choose one!):
histogram.recordValueWithExpectedInterval(value, expectedIntervalBetweenSamples);
histogram.copyCorrectedForCoordinatedOmission(expectedIntervalBetweenSamples);
49. COORDINATED OMISSION SOLUTIONS
3. Correct it on the load driver side by noticing pauses between sent requests: a newly issued request gets a timer that starts counting from the time it should have been sent but wasn't
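The difference between measuring from the actual send time and from the intended send time can be shown with a toy simulation (plain Java with a virtual clock; this is my sketch, not any real load tool). A blocking driver plans one request every 100 ms; the server stalls for 2 s; the naive measurement records a single long sample, while driver-side correction recovers all the requests that should have been sent during the stall.

```java
import java.util.ArrayList;
import java.util.List;

public class CoordinatedOmissionDemo {
    public static void main(String[] args) {
        final long intervalMs = 100;  // test plan: one request every 100 ms
        final long serviceMs = 1;     // normal service time
        final long pauseStart = 1000; // server stalls from t=1 s ...
        final long pauseEnd = 3000;   // ... to t=3 s

        List<Long> naive = new ArrayList<>();     // timed from actual send
        List<Long> corrected = new ArrayList<>(); // timed from intended send

        long now = 0; // the blocking driver's clock
        for (long intended = 0; intended < 5000; intended += intervalMs) {
            long send = Math.max(intended, now); // blocked driver sends late
            long done = (send >= pauseStart && send < pauseEnd)
                    ? pauseEnd + serviceMs       // request caught in the stall
                    : send + serviceMs;
            naive.add(done - send);
            corrected.add(done - intended);
            now = done;
        }
        long naiveBad = naive.stream().filter(l -> l > 100).count();
        long correctedBad = corrected.stream().filter(l -> l > 100).count();
        // naive sees 1 bad sample; corrected sees the ~20 the users would see
        System.out.println("samples over 100 ms - naive: " + naiveBad
                + ", corrected: " + correctedBad);
    }
}
```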
50. COORDINATED OMISSION SOLUTIONS
4. Fail the test
For hard real time systems where a pause causes human casualties (brakes, pacemakers, the Phalanx system)
51. COORDINATED OMISSION
• Mathematical solutions can overcorrect when the load driver itself has pauses (e.g. GC).
• They do not account for the fact that the server after a pause has no work to do, instead of N more requests waiting to be executed
• In the real world it might have never recovered
• Most tools ignore the problem
• Notable exception: Twitter Iago
52. – LOAD DRIVER MOTTO
"Do not bend to the tyranny of reality"
53. SUMMARY
• Measure what is meaningful, not just what is measurable
• Set SLAs before testing and creating dashboards
• Do not trust the Timer class: use custom reservoirs, HdrHistogram, Recorder; never trust EWMA for request rate
• Do not average percentiles unless you need a random number generator
• Do not plot averages unless you just want to look good on dashboards
• When load testing, be aware of coordinated omission
54. SOURCES, THANK YOUS AND RECOMMENDED FOLLOW UPS
• Coda Hale for the great metrics library
• Gil Tene
• latencytipoftheday.blogspot.de
• www.infoq.com/presentations/latency-pitfalls
• github.com/HdrHistogram/HdrHistogram
• Nitsan Wakart
• psy-lob-saw.blogspot.de/2015/03/fixing-ycsb-coordinated-omission.html
• and his whole blog
• Martin Thompson et al.
• groups.google.com/forum/#!forum/mechanical-sympathy
55. RECOMMENDED
A great introduction to statistics and queueing theory:
Performance Modeling and Design of Computer Systems: Queueing Theory in Action
Prof. Mor Harchol-Balter
56. FEEDBACK KINDLY REQUESTED
https://www.surveymonkey.com/s/B5KGWWN