Parallel processing involves executing multiple tasks simultaneously using multiple cores or processors. It can provide performance benefits over serial processing by reducing execution time. When developing parallel applications, developers must identify independent tasks that can be executed concurrently and avoid issues like race conditions and deadlocks. Effective parallelization requires analyzing serial code to find optimization opportunities, designing and implementing concurrent tasks, and testing and tuning to maximize performance gains.
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
Tuning the Memory and Optimizer Parameters - This topic has come quite often in requests and suggestions from our regular viewers and followers of our Postgres Hangouts.
So we picked it up as the topic for this month's Postgres Hangout. This time we will try to explore and share on Memory and Memory .
Join us as we discuss what impacts your decision before you put a number against each of these important parameters!
Vladimir Rodionov (Hortonworks)
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
Tuning the Memory and Optimizer Parameters - This topic has come quite often in requests and suggestions from our regular viewers and followers of our Postgres Hangouts.
So we picked it up as the topic for this month's Postgres Hangout. This time we will try to explore and share on Memory and Memory .
Join us as we discuss what impacts your decision before you put a number against each of these important parameters!
The Google Chubby lock service for loosely-coupled distributed systemsRomain Jacotin
The Google Chubby lock service presented in 2006 is the inspiration for Apache ZooKeeper: let's take a deep dive into Chubby to better understand ZooKeeper and distributed consensus.
Speakers: Kevin O'Dell, Aleksandr Shulman & Kathleen Ting (Cloudera)
From supporting the 0.90.x, 0.92, 0.94, and 0.96 HBase installations on clusters ranging from tens to hundreds of nodes, Cloudera has seen it all. Having automated the upgrade paths from the different Apache releases, we have developed a smooth path that can help the community with upcoming upgrades. In addition to automation best practices, in this talk you'll also learn proactive configuration tweaks and operational best practices to keep your HBase cluster always up and running. We'll also walk through how to contain an application bug let loose in production, to minimize the impact on HBase posed by faulty hardware, and the direct correlation between inefficient schema design and HBase performance.
Apache HBase in the Enterprise Data Hub at CernerHBaseCon
Swarnim Kulkarni (Cerner)
Cerner has been an active consumer of HBase for a very long time, storing petabytes of healthcare data in its multiple isolated HBase clusters. This talk will walk through the design of Cerner's enterprise data hub with a focus on the multi-tenant HBase as a service offering within the hub.
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.
This presentation surveys different ways one can geographically distribute PostgreSQL, including master-slave and multi-master solutions. It discusses pitfalls and emphasizes understanding requirements. The presentation covers some of the existing tools that are available in the community. It also touches upon upcoming PostgreSQL solutions.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems and has extensive experience managing and resolving major system incidents.
This technical presentation by EDB Dave Thomas, Systems Engineer provides an overview of:
1) BGWriter/Writer Process
2) Wall Writer Process
3) Stats Collector Process
4) Autovacuum Launch Process
5) Syslogger Process/Logger process
6) Archiver Process
7) WAL Send/Receive Processes
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaCloudera, Inc.
Apache HDFS, the file system on which HBase is most commonly deployed, was originally designed for high-latency high-throughput batch analytic systems like MapReduce. Over the past two to three years, the rising popularity of HBase has driven many enhancements in HDFS to improve its suitability for real-time systems, including durability support for write-ahead logs, high availability, and improved low-latency performance. This talk will give a brief history of some of the enhancements from Hadoop 0.20.2 through 0.23.0, discuss some of the most exciting work currently under way, and explore some of the future enhancements we expect to develop in the coming years. We will include both high-level overviews of the new features as well as practical tips and benchmark results from real deployments.
Speaker: Varun Sharma (Pinterest)
Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.
The Google Chubby lock service for loosely-coupled distributed systemsRomain Jacotin
The Google Chubby lock service presented in 2006 is the inspiration for Apache ZooKeeper: let's take a deep dive into Chubby to better understand ZooKeeper and distributed consensus.
Speakers: Kevin O'Dell, Aleksandr Shulman & Kathleen Ting (Cloudera)
From supporting the 0.90.x, 0.92, 0.94, and 0.96 HBase installations on clusters ranging from tens to hundreds of nodes, Cloudera has seen it all. Having automated the upgrade paths from the different Apache releases, we have developed a smooth path that can help the community with upcoming upgrades. In addition to automation best practices, in this talk you'll also learn proactive configuration tweaks and operational best practices to keep your HBase cluster always up and running. We'll also walk through how to contain an application bug let loose in production, to minimize the impact on HBase posed by faulty hardware, and the direct correlation between inefficient schema design and HBase performance.
Apache HBase in the Enterprise Data Hub at CernerHBaseCon
Swarnim Kulkarni (Cerner)
Cerner has been an active consumer of HBase for a very long time, storing petabytes of healthcare data in its multiple isolated HBase clusters. This talk will walk through the design of Cerner's enterprise data hub with a focus on the multi-tenant HBase as a service offering within the hub.
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.
HBaseCon 2015: HBase at Scale in an Online and High-Demand EnvironmentHBaseCon
Pinterest runs 38 different HBase clusters in production, doing a lot of different types of work—with some doing up to 5 million operations per second. In this talk, you'll get details about how we do capacity planning, maintenance tasks such as online automated rolling compaction, configuration management, and monitoring.
This presentation surveys different ways one can geographically distribute PostgreSQL, including master-slave and multi-master solutions. It discusses pitfalls and emphasizes understanding requirements. The presentation covers some of the existing tools that are available in the community. It also touches upon upcoming PostgreSQL solutions.
Instaclustr has a diverse customer base including Ad Tech, IoT and messaging applications ranging from small start ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of Engineers that maintain the operational performance of our diverse fleet clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a Software Engineer, specializing in performance optimization of large systems and has extensive experience managing and resolving major system incidents.
This technical presentation by EDB Dave Thomas, Systems Engineer provides an overview of:
1) BGWriter/Writer Process
2) Wall Writer Process
3) Stats Collector Process
4) Autovacuum Launch Process
5) Syslogger Process/Logger process
6) Archiver Process
7) WAL Send/Receive Processes
HBaseCon 2012 | HBase and HDFS: Past, Present, Future - Todd Lipcon, ClouderaCloudera, Inc.
Apache HDFS, the file system on which HBase is most commonly deployed, was originally designed for high-latency high-throughput batch analytic systems like MapReduce. Over the past two to three years, the rising popularity of HBase has driven many enhancements in HDFS to improve its suitability for real-time systems, including durability support for write-ahead logs, high availability, and improved low-latency performance. This talk will give a brief history of some of the enhancements from Hadoop 0.20.2 through 0.23.0, discuss some of the most exciting work currently under way, and explore some of the future enhancements we expect to develop in the coming years. We will include both high-level overviews of the new features as well as practical tips and benchmark results from real deployments.
Speaker: Varun Sharma (Pinterest)
Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
Apache Apex is a next gen big data analytics platform. Originally developed at DataTorrent it comes with a powerful stream processing engine, rich set of functional building blocks and an easy to use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn about the Apex architecture, including its unique features for scalability, fault tolerance and processing guarantees, programming model and use cases.
http://apachebigdata2016.sched.org/event/6M0L/next-gen-big-data-analytics-with-apache-apex-thomas-weise-datatorrent
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
Introduction to Apache Apex - The next generation native Hadoop platform. This talk will cover details about how Apache Apex can be used as a powerful and versatile platform for big data processing. Common usage of Apache Apex includes big data ingestion, streaming analytics, ETL, fast batch alerts, real-time actions, threat detection, etc.
Bio:
Pramod Immaneni is Apache Apex PMC member and senior architect at DataTorrent, where he works on Apache Apex and specializes in big data platform and applications. Prior to DataTorrent, he was a co-founder and CTO of Leaf Networks LLC, eventually acquired by Netgear Inc, where he built products in core networking space and was granted patents in peer-to-peer VPNs.
Xin Wang(Apache Storm Committer/PMC member)'s topic covered the relations between streaming and messaging platform, and the challenges and tips in Storm usage.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the SparklyR package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
Predicting Optimal Parallelism for Data AnalyticsDatabricks
A key benefit of serverless computing is that resources can be allocated on demand, but the quantity of resources to request, and allocate, for a job can profoundly impact its running time and cost. For a job that has not yet run, how can we provide users with an estimate of how the job’s performance changes with provisioned resources, so that users can make an informed choice upfront about cost-performance tradeoffs?
This talk will describe several related research efforts at Microsoft to address this question. We focus on optimizing the amount of computational resources that control a data analytics query’s achieved intra-parallelism. These use machine learning models on query characteristics to predict the run time or Performance Characteristic Curve (PCC) as a function of the maximum parallelism that the query will be allowed to exploit.
The AutoToken project uses models to predict the peak number of tokens (resource units) that is determined by the maximum parallelism that the recurring SCOPE job can ever exploit while running in Cosmos, an Exascale Big Data analytics platform at Microsoft. AutoToken_vNext, or TASQ, predicts the PCC as a function of the number of allocated tokens (limited parallelism). The AutoExecutor project uses models to predict the PCC for Apache Spark SQL queries as a function of the number of executors. The AutoDOP project uses models to predict the run time for SQL Server analytics queries, running on a single machine, as a function of their maximum allowed Degree Of Parallelism (DOP).
We will present our approaches and prediction results for these scenarios, discuss some common challenges that we handled, and outline some open research questions in this space.
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
This is an overview of architecture with use cases for Apache Apex, a big data analytics platform. It comes with a powerful stream processing engine, rich set of functional building blocks and an easy to use API for the developer to build real-time and batch applications. Apex runs natively on YARN and HDFS and is used in production in various industries. You will learn more about two use cases: A leading Ad Tech company serves billions of advertising impressions and collects terabytes of data from several data centers across the world every day. Apex was used to implement rapid actionable insights, for real-time reporting and allocation, utilizing Kafka and files as source, dimensional computation and low latency visualization. A customer in the IoT space uses Apex for Time Series service, including efficient storage of time series data, data indexing for quick retrieval and queries at high scale and precision. The platform leverages the high availability, horizontal scalability and operability of Apex.
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
This presentation will introduce usage of Apache Apex for Time Series & Data Ingestion Service by General Electric Internet of things Predix platform. Apache Apex is a native Hadoop data in motion platform that is being used by customers for both streaming as well as batch processing. Common use cases include ingestion into Hadoop, streaming analytics, ETL, database off-loads, alerts and monitoring, machine model scoring, etc.
Abstract: Predix is an General Electric platform for Internet of Things. It helps users develop applications that connect industrial machines with people through data and analytics for better business outcomes. Predix offers a catalog of services that provide core capabilities required by industrial internet applications. We will deep dive into Predix Time Series and Data Ingestion services leveraging fast, scalable, highly performant, and fault tolerant capabilities of Apache Apex.
Speakers:
- Venkatesh Sivasubramanian, Sr Staff Software Engineer, GE Predix & Committer of Apache Apex
- Pramod Immaneni, PPMC member of Apache Apex, and DataTorrent Architect
Using Groovy? Got lots of stuff to do at the same time? Then you need to take a look at GPars (“Jeepers!”), a library providing support for concurrency and parallelism in Groovy. GPars brings powerful concurrency models from other languages to Groovy and makes them easy to use with custom DSLs:
- Actors (Erlang and Scala)
- Dataflow (Io)
- Fork/join (Java)
- Agent (Clojure agents)
In addition to this support, GPars integrates with standard Groovy frameworks like Grails and Griffon.
Background, comparisons to other languages, and motivating examples will be given for the major GPars features.
The JVM memory model describes how threads in the Java eco-system interact through memory. While the memory model impact on developing for the JVM may not be obvious, it is the cause for certain number of "anomalies" that are, well, by design.
In this presentation we will explore the aspects of the memory model, including things like reordering of instructions, volatile members, monitors, atomics and JIT.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. Parallel Processing - What is it?
• Concurrent System
– Two or more actions progressing in parallel
– This can be on a single core
• Parallel System
– Two or more actions executing in parallel
– This requires multiple cores
– Parallel systems are a subset of concurrent systems
• Distributed System
– Two or more actions executing in parallel
– This requires multiple connected machines
– Communication is primarily via messaging
3. Why application developers need to
know?
• Multi-core processors
• Large volume of data
• Analytics and other new fields
• Cheaper hardware
4. OS Resource Challenges
Memory
(ROM/
RAM)
DB NW NW
Web
CPU
Registers
Cache
Service
Queue
Distribu
ted
Mem
5. Representation of an Application
Source Data Transform Data Sink
- UI
- DB
- Queue
- Network
- Service
- UI
- DB
- Queue
- Network
- Service
6. Abstraction of Concurrancy
• Programs are execution of atomic
statements
• Concurrent programs are the interleavings
of atomic statements
• All possible interleavings should produce
the same results
• No process should be excluded from any
arbitrary interleaving
8. To go Parallel?
• SpeedUp
– Amdahl’s Law
• Speedup <= 1/((1 – PCTpar) + (PCTpar/P))
– PCTpar -> percentage of time in parallel
– P -> Number of cores
• 75% PCTpar provides 3x speedup on a 8 core
– Gustafson-Barsis’s Law
• Speedup <= P + (1 – P) S
– P -> Number of cores
– S -> Percentage of time spend in serial code
• Efficiency
– Speedup/Cores -> resource utilization as %age
9. Things to Consider
• Identify independent computations
• Implement concurrancy at the highest level
• Make no assumptions on the cores
• Use the best processing model
• Never assume a particular execution order
• User thread local storage or associate locks
to specific data
• Make no assumptions on the order of
execution
10. Methedology
• Start with a tuned and functional serial code
• Analysis: identify possible concurrancy
– Identify hotspots probably using profilers
• Calculate speedup and efficiency
• Design and implementation
• Test for correctness
– Loop executions, rounding errors
• Tune for performance
– Enough work load to compensate for overheads
• Measure Speedup as a multiplier (e.g. 2x faster)
– Serial code elapsed time/parallel code elapsed time
12. Highlights of Process Development
• Single threaded functional code (Java)
• Threads spawned off for each message
– Data decomposition
– Task decomposition
– Concurrency at the highest level
• Monitor (synchronize) & conditional variable
• Code to ignore deadlocks
• Tested and tuned to remove bottlenecks
• Scalable for any number of cores
13. Monitor
public class Monitor {
private static int count = 0;
private static int max = 0;
public boolean isSemaphoreUsed()
{
return (count < max);
}
public Semaphore(int num) {
count = num;
max = num;
}
private synchronized void
incCount() {
count++;
}
private synchronized void
decCount() {
count--;
}
public synchronized void acquire()
{
try {
while (count == 0) {
}
decCount();
}
public synchronized void release()
{
incCount();
}
}
Never comes
out
14. Monitor with Conditional Variables
public class Monitor {
private static int count = 0;
private static int max = 0;
public boolean isSemaphoreUsed()
{
return (count < max);
}
public Semaphore(int num) {
count = num;
max = num;
}
private synchronized void
incCount() {
count++;
}
private synchronized void
decCount() {
count--;
}
public synchronized void acquire()
{
try {
while (count == 0) {
this.wait();
}
decCount();
} catch (InterruptedException e)
{}
}
public synchronized void release()
{
incCount();
this.notify();
}
}
15. General Design Guidelines
• Keep it simple stupid (KISS)
– Break the problem into simple components
• Keep the number of layers to minimum
– Don’t add layers since they are hot in industry
• Do the right thing
– Skeptics will provide scenarios to take care