This document describes HaLoop, a system that extends MapReduce to efficiently support iterative data processing on large clusters. HaLoop introduces caching mechanisms that allow loop-invariant data to be accessed without reloading or reshuffling between iterations. This improves performance for iterative algorithms like PageRank, transitive closure, and k-means clustering. The largest gains come from caching invariant data in the reducer input cache to avoid unnecessary loading and shuffling. HaLoop also eliminates extra MapReduce jobs for termination checking in some cases. Overall, HaLoop shows that minimal extensions to MapReduce can efficiently support a wide range of recursive programs and languages on large-scale clusters.
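The core idea, loading loop-invariant data once and reusing it across iterations instead of reloading and reshuffling it in every MapReduce job, can be illustrated with a toy PageRank driver. This is a sketch of the concept only, not HaLoop's actual API; the link graph and function names are invented for illustration.

```python
# Count how often loop-invariant data (the link graph) is loaded when it
# is cached across iterations versus reloaded by every job.
load_count = 0

def load_invariant_data():
    """Stand-in for an expensive load/shuffle of loop-invariant data."""
    global load_count
    load_count += 1
    return {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # toy link graph

def pagerank(iterations=5, cache=True):
    links = load_invariant_data() if cache else None
    ranks = {n: 1.0 for n in ("a", "b", "c")}
    for _ in range(iterations):
        # Without a cache, the invariant graph is reloaded each iteration.
        graph = links if cache else load_invariant_data()
        new_ranks = {n: 0.15 for n in ranks}
        for node, outs in graph.items():
            share = 0.85 * ranks[node] / len(outs)
            for dest in outs:
                new_ranks[dest] += share
        ranks = new_ranks
    return ranks

pagerank(cache=False)
loads_without_cache = load_count
load_count = 0
pagerank(cache=True)
loads_with_cache = load_count
print(loads_without_cache, loads_with_cache)  # 5 loads vs. 1 load
```

The cached variant touches the invariant data once for five iterations, which is the shape of the saving HaLoop's reducer input cache provides at cluster scale.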

Presentation WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by Leo Meyerovich and Matthew Torok at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
K-Means clustering is a popular clustering algorithm in data mining. Clustering large data sets can be time consuming, and in an attempt to minimize this time, our project is a parallel implementation of K-Means clustering algorithm on CUDA using C. We present the performance analysis and implementation of our approach to parallelizing K-Means clustering.
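The step such a project parallelizes is the assignment step: each point's nearest-centroid computation is independent of every other point's, which is what lets one CUDA thread handle one point. The sketch below is a plain-Python illustration of that same logic, not the CUDA/C implementation; the toy data is invented.

```python
def assign(points, centroids):
    """For each point, find the index of the nearest centroid.
    Each loop iteration is independent, so this is the part that
    maps naturally onto one GPU thread per point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, k):
    """Recompute each centroid as the mean of its assigned points."""
    dims = len(points[0])
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centroids.append(tuple(sum(p[d] for p in members) / len(members)
                               for d in range(dims)))
    return centroids

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
for _ in range(3):
    labels = assign(points, centroids)
    centroids = update(points, labels, 2)
print(labels)  # [0, 0, 1, 1]
```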
A general introduction to GPGPU and an application involving solving large preconditioning problems with Domain Decomposition. Code is available at http://sourceforge.net/projects/cudasolver/ .
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs, by Jeff Larkin
This talk was presented at the DOE Centers of Excellence Performance Portability Workshop in August 2017. It explores the current status of four OpenMP 4.5 compilers for NVIDIA GPUs and CPUs from the perspective of performance portability, both between compilers and between the GPU and the CPU.
Fast and Scalable NUMA-based Thread Parallel Breadth-first Search, by Yuichiro Yasui
The 2015 International Conference on High Performance Computing & Simulation (HPCS2015)
Session 9A: July 22, 14:45 − 16:00
July 20 – 24, 2015, Amsterdam, the Netherlands
This is a survey of the HPCS languages Chapel, X10, and Fortress, comparing the idioms they provide for parallel programming. The paper is available at http://grids.ucs.indiana.edu/ptliupages/publications/Survey_on_HPCS_Languages_formatted_v2.pdf
This is a sample of a manual I developed while at Wideband for a math and science digital signal processing software library for the Analog Devices ADSP-21K. It contains detailed descriptions of the routines and shows that programmers had a complete and useful solution.
Foundations of streaming SQL: stream & table theory, by DataWorks Summit
What does it mean to execute streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing? And how can all of this work in a programmatic framework like Apache Beam? The presentation answers these questions and more as it walks you through key concepts underpinning data processing in general.
The presentation explores the relationship between the Beam model (as described in the paper "The Dataflow Model" and the "Streaming 101" and "Streaming 102" blog posts) and stream and table theory (as popularized by Martin Kleppmann and Jay Kreps, among others).
It turns out that stream and table theory does an illuminating job of describing the low-level concepts that underlie the Beam model.
The presentation explains what is required to provide robust stream processing support in SQL and discusses the concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, as well as new ideas yet to come. You’ll leave with a much better understanding of the key concepts underpinning data processing—regardless of whether that data processing is batch or streaming or SQL or programmatic—as well as a concrete notion of what robust stream processing in SQL looks like.
Speaker
Anton Kedin, Google, Software Engineer
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
Streaming SQL Foundations: Why I ❤ Streams+Tables, by C4Media
Video and slides synchronized; mp3 and slide download available at https://bit.ly/2rtxaMm.
Tyler Akidau explores the relationship between the Beam Model and stream & table theory. He explains what is required to provide robust stream processing support in SQL, discusses concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, and compares them to other offerings such as Apache Kafka's KSQL and Apache Spark's Structured Streaming. Filmed at qconlondon.com.
Tyler Akidau is a senior staff software engineer at Google, where he is the technical lead for the Data Processing Languages & Systems group, responsible for Google's Apache Beam efforts, Google Cloud Dataflow, and internal data processing tools like Google Flume, MapReduce, and MillWheel. He is also a founding member of the Apache Beam PMC.
Build Your Own 3D Scanner: 3D Scanning with Structured Lighting, by Douglas Lanman
http://mesh.brown.edu/byo3d/
SIGGRAPH 2009 Courses
Douglas Lanman and Gabriel Taubin
This course provides a beginner with the necessary mathematics, software, and practical details to leverage projector-camera systems in their own 3D scanning projects. An example-driven approach is used throughout; each new concept is illustrated using a practical scanner implemented with off-the-shelf parts. The course concludes by detailing how these new approaches are used in rapid prototyping, entertainment, cultural heritage, and web-based applications.
Overview of why (and how) we have adopted a functional reactive approach to solve problems of scale @ AOL, in particular within AOL Advertising's forecasting platform.
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB, by Journal For Research
Image compression is used in many applications, for example satellite imaging, medical imaging, and video, where images require significant storage space; in such applications compression can be used effectively. There are two types of image compression techniques, lossy and lossless. Both are widely used, but neither is fast: they take considerable time for compression and decompression. For fast and efficient image compression, a parallel computing technique is used in MATLAB. In this paper we discuss the regular image compression technique, three alternatives for parallel computing using MATLAB, and a comparison of image compression with and without parallel computing.
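The speedup described here is essentially embarrassingly parallel: each image can be compressed independently of the others. A minimal stand-in sketch, using zlib on toy byte arrays and a thread pool in place of MATLAB's parallel constructs; the data and sizes are illustrative only.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Toy "images": each is an independent block of bytes.
images = [bytes([i % 251] * 10_000) for i in range(8)]

# Compress all images concurrently; each task touches only its own image,
# so no coordination between workers is needed.
with ThreadPoolExecutor() as pool:
    compressed = list(pool.map(zlib.compress, images))

# Round-trip check: decompression recovers every original image.
assert all(zlib.decompress(c) == img for c, img in zip(compressed, images))
print([len(c) < len(img) for c, img in zip(compressed, images)])
```

In real workloads a process pool (or MATLAB's parfor, as in the paper) avoids interpreter-level contention for CPU-bound codecs; the independence of the per-image tasks is what carries over.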
Big Data Day LA 2016 / Big Data Track - Portable Stream and Batch Processing w..., by Data Con LA
This talk explores deploying a series of small and large batch and streaming pipelines locally, to Spark and Flink clusters, and to Google Cloud Dataflow services, to give the audience a feel for the portability of Beam, a new portable Big Data processing framework recently submitted by Google to the Apache Software Foundation. The talk looks at how the programming model handles late-arriving data in a stream with event time, windows, and triggers.
Three types of service model: SaaS, PaaS, IaaS.
Four types of deployment model: public, private, hybrid, and community cloud.
During the load balancing process, a few issues are yet to be fully addressed, among them:
- some nodes are overutilized while others are underutilized
- improper workload distribution in a cloud environment results in resource-utilization overhead and, in turn, inefficient use of energy
- response time of jobs
- communication cost of servers
- maintenance cost of VMs
- throughput and overload of any single node
By addressing the concern of load balancing, we aim to address multiple facets of the cloud, viz. (a) resource utilization, (b) CPU time, and (c) migration time.
Problem statement
Problems raised while dealing with load balancing:
- how to minimize CPU time
- how to increase resource utilization
- how to decrease energy consumption and migration time
A Multicore Parallelization of Continuous Skyline Queries on Data Streams, by Tiziano De Matteis
Skyline queries are a relevant example of preference queries frequently used in multi-criteria decision making to retrieve interesting points from large datasets. They return the points whose attribute vector is not dominated by any other point. Due to their importance in real-time scenarios, skyline queries have been studied both in terms of sequential algorithms and parallel implementations for multiprocessors and clusters. Recently, with the advent of the Data Stream Processing paradigm, skyline queries have been computed over continuous data streams according to the sliding window model. Although sequential algorithms have been proposed for continuous skyline queries, few works targeting modern parallel architectures exist. This paper contributes to the current literature by proposing a parallel implementation on multicores. We provide a description of our parallelization by focusing on the cooperation pattern between parallel functionalities, optimizations related to the reduce phase, and load-balancing strategies. Finally, we show experiments using different point distributions, arrival rates and window lengths.
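The dominance test at the core of a skyline query can be sketched as follows. This is a minimal sequential block-nested-loop version for illustration, not the paper's parallel sliding-window algorithm; the hotel data is invented, with "better" meaning smaller on every attribute.

```python
def dominates(p, q):
    """p dominates q if p is at least as good on every attribute
    and strictly better on at least one (smaller is better here)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Return the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Toy example: hotels as (price, distance-to-beach) pairs.
hotels = [(50, 3), (60, 1), (40, 5), (70, 4)]
print(skyline(hotels))  # (70, 4) is dominated by (50, 3)
```

The continuous, windowed variant studied in the paper must additionally retire expired points and re-admit points that were only dominated by now-expired ones, which is where the parallelization effort goes.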
NOTE: transitions are missing in this version. Full version available at: https://docs.google.com/presentation/d/1JQVn9QnLC15e_MhmNOttP3mohray_sulAO532PJqOy4/edit?usp=sharing
These are slides from the Dec 17 SF Bay Area Julia Users meeting [1]. Ehsan Totoni presented the ParallelAccelerator Julia package, a compiler that performs aggressive analysis and optimization on top of the Julia compiler. Ehsan is a Research Scientist at Intel Labs working on the High Performance Scripting project.
[1] http://www.meetup.com/Bay-Area-Julia-Users/events/226531171/
We present a system to support generalized SQL workload analysis and management for multi-tenant and multi-database platforms. Workload analysis applications are becoming more sophisticated to support database administration, model user behavior, audit security, and route queries, but the methods rely on specialized feature engineering, and therefore must be carefully implemented and reimplemented for each SQL dialect, database system, and application. Meanwhile, the size and complexity of workloads are increasing as systems centralize in the cloud. We model workload analysis and management tasks as variations on query labeling, and propose a system design that can support general query labeling routines across multiple applications and database backends. The design relies on the use of learned vector embeddings for SQL queries as a replacement for application-specific syntactic features, reducing custom code and allowing the use of off-the-shelf machine learning algorithms for labeling. The key hypothesis, for which we provide evidence in this paper, is that these learned features can outperform conventional feature engineering on representative machine learning tasks. We present the design of a database-agnostic workload management and analytics service, describe potential applications, and show that separating workload representation from labeling tasks affords new capabilities and can outperform existing solutions for representative tasks, including workload sampling for index recommendation and user labeling for security audits.
Brief remarks on big data trends and responsible data science at the Workshop on Science and Technology for Washington State: Advising the Legislature, October 4th 2017 in Seattle.
Talk at ISIM 2017 in Durham, UK on applying database techniques to querying model results in the geosciences, with a broader position about the interaction between data science and simulation as modes of scientific inquiry.
Data science remains a high-touch activity, especially in life, physical, and social sciences. Data management and manipulation tasks consume too much bandwidth: Specialized tools and technologies are difficult to use together, issues of scale persist despite the Cambrian explosion of big data systems, and public data sources (including the scientific literature itself) suffer curation and quality problems.
Together, these problems motivate a research agenda around “human-data interaction:” understanding and optimizing how people use and share quantitative information.
I’ll describe some of our ongoing work in this area at the University of Washington eScience Institute.
In the context of the Myria project, we're building a big data "polystore" system that can hide the idiosyncrasies of specialized systems behind a common interface without sacrificing performance. In scientific data curation, we are automatically correcting metadata errors in public data repositories with cooperative machine learning approaches. In the Viziometrics project, we are mining patterns of visual information in the scientific literature using machine vision, machine learning, and graph analytics. In the VizDeck and Voyager projects, we are developing automatic visualization recommendation techniques. In graph analytics, we are working on parallelizing best-of-breed graph clustering algorithms to handle multi-billion-edge graphs.
The common thread in these projects is the goal of democratizing data science techniques, especially in the sciences.
Talk given at Los Alamos National Labs in Fall 2015.
As research becomes more data-intensive and platforms become more heterogeneous, we need to shift focus from performance to productivity.
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
A talk at the Urban Science workshop at the Puget Sound Regional Council July 20 2014 organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Labs and the University of Washington.
A talk I gave at the MMDS workshop June 2014 on the Myria system as well as some of Seung-Hee Bae's work on scalable graph clustering.
https://mmds-data.org/
Talk delivered at High Performance Transaction Processing 2013
Myria is a new Big Data service being developed at the University of Washington. We feature high level language interfaces, a hybrid graph-relational data model, database-style algebraic optimization, a comprehensive REST API, an iterative programming model suitable for machine learning and graph analytics applications, and a tight connection to new theories of parallel computation.
In this talk, we describe the motivation for another big data platform emphasizing requirements emerging from the physical, life, and social sciences.
A 25 minute talk from a panel on big data curricula at JSM 2013
http://www.amstat.org/meetings/jsm/2013/onlineprogram/ActivityDetails.cfm?SessionID=208664
DevOps and Testing slides at DASA Connect, by Kari Kakkonen
My and Rik Marselis' slides from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object Calisthenics, by Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
State of ICS and IoT Cyber Threat Landscape Report 2024 preview, by Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and Sales, by Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 4, by DianaGray10
Welcome to part 4 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover an overview of Test Manager along with the SAP heatmap.
This webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the use of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
A tale of scale & speed: How the US Navy is enabling software delivery from l..., by sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
The Metaverse and AI: how can decision-makers harness the Metaverse for their..., by Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ..., by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 To discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
HaLoop: Efficient Iterative Processing on Large-Scale Clusters
HaLoop: Efficient Iterative Data Processing On Large Scale Clusters
Yingyi Bu, UC Irvine
Bill Howe, UW
Magda Balazinska, UW
Michael Ernst, UW
http://clue.cs.washington.edu/
Award IIS 0844572
Cluster Exploratory (CluE)
http://escience.washington.edu/
VLDB 2010, Singapore
Horizon
01/30/15, Bill Howe, UW
Thesis in one slide
Observation: MapReduce has proven successful as a common runtime for non-recursive declarative languages
HIVE (SQL)
Pig (RA with nested types)
Observation: Many people roll their own loops
Graphs, clustering, mining, recursive queries
iteration managed by an external script
Thesis: With minimal extensions, we can provide an efficient common runtime for recursive languages
Map, Reduce, Fixpoint
Related Work: Twister [Ekanayake HPDC 2010]
Redesigned evaluation engine using pub/sub
Termination condition evaluated by main()
while (!complete) {
    monitor = driver.runMapReduceBCast(cData);
    monitor.monitorTillCompletion();
    DoubleVectorData newCData =
        ((KMeansCombiner) driver.getCurrentCombiner()).getResults();
    totalError = getError(cData, newCData);
    cData = newCData;
    if (totalError < THRESHOLD) {
        complete = true;
        break;
    }
}
Termination test: O(k)
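The O(k) annotation is the point here: the k-means convergence test touches only the k centroids, never the input points. A minimal sketch of such an error function (hypothetical names, not Twister's actual API):

```java
// Hypothetical sketch of the O(k) termination test in the Twister-style
// k-means driver above: the error depends only on the k centroids, not
// on the size of the input data.
public class CentroidError {
    // Sum of Euclidean distances between corresponding old/new centroids.
    public static double getError(double[][] oldC, double[][] newC) {
        double total = 0.0;
        for (int i = 0; i < oldC.length; i++) {        // k centroids
            double d2 = 0.0;
            for (int j = 0; j < oldC[i].length; j++) { // each dimension
                double diff = oldC[i][j] - newC[i][j];
                d2 += diff * diff;
            }
            total += Math.sqrt(d2);
        }
        return total;                                  // O(k) work overall
    }
}
```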
In Detail: PageRank (Twister)
while (!complete) {
    // start the pagerank map reduce process  (run MR)
    monitor = driver.runMapReduceBCast(
        new BytesValue(tmpCompressedDvd.getBytes()));
    monitor.monitorTillCompletion();
    // get the result of the process
    newCompressedDvd = ((PageRankCombiner)
        driver.getCurrentCombiner()).getResults();
    // decompress the compressed pagerank values
    newDvd = decompress(newCompressedDvd);
    tmpDvd = decompress(tmpCompressedDvd);
    // get the difference between new and old pagerank values
    totalError = getError(tmpDvd, newDvd);
    if (totalError < tolerance) {  // termination condition
        complete = true;
    }
    tmpCompressedDvd = newCompressedDvd;
}
Termination test: O(N) in the size of the graph
Related Work: Spark [Zaharia HotCloud 2010]
Reduction output collected at driver program
“…does not currently support a grouped reduce
operation as in MapReduce”
val spark = new SparkContext(<Mesos master>)
var count = spark.accumulator(0)
for (i <- spark.parallelize(1 to 10000, 10)) {
  val x = Math.random * 2 - 1
  val y = Math.random * 2 - 1
  if (x*x + y*y < 1) count += 1
}
println("Pi is roughly " + 4 * count.value / 10000.0)
all output sent to the driver
Related Work: Pregel [Malewicz PODC 2009]
Graphs only (not, e.g., clustering: k-means, canopy, DBScan)
Assumes each vertex has access to its outgoing edges
So an edge representation Edge(from, to) requires offline preprocessing, perhaps using MapReduce
Related Work: Piccolo [Power OSDI 2010]
Partitioned table data model, with user-defined partitioning
Programming model: message-passing with global synchronization barriers
User can give locality hints
Worth exploring a direct comparison
GroupTables(curr, next, graph)
Related Work: BOOM [c.f. Alvaro EuroSys 10]
Distributed computing based on Overlog (Datalog + temporal logic + more)
Recursion supported naturally
app: API-compliant implementation of MR
Worth exploring a direct comparison
Details
Architecture
Programming Model
Caching (and Indexing)
Scheduling
Example 1: PageRank
Rank Table R0:
url        rank
www.a.com  1.0
www.b.com  1.0
www.c.com  1.0
www.d.com  1.0
www.e.com  1.0

Linkage Table L:
url_src    url_dest
www.a.com  www.b.com
www.a.com  www.c.com
www.c.com  www.a.com
www.e.com  www.c.com
www.d.com  www.b.com
www.c.com  www.e.com
www.e.com  www.c.com
www.a.com  www.d.com

Rank Table R3:
url        rank
www.a.com  2.13
www.b.com  3.89
www.c.com  2.60
www.d.com  2.60
www.e.com  2.13

Loop body: Ri+1 = π(url_dest, γ_url_dest SUM(rank)) over the join of Ri and L on Ri.url = L.url_src, where each tuple's rank is first scaled to Ri.rank / (γ_url COUNT(url_dest)), i.e. divided by the source's out-degree.
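The loop body on this slide can be sketched directly in Java: join Ri with L on url = url_src, scale each rank by the source's out-degree, then group by url_dest and sum. A single-machine sketch, not HaLoop code:

```java
import java.util.*;

public class PageRankStep {
    // One loop-body step: join rank with links on url = url_src,
    // divide each rank by the source's out-degree, then group by
    // url_dest and sum (the projection/aggregation above).
    public static Map<String, Double> step(Map<String, Double> rank,
                                           List<String[]> links) {
        // out-degree per source url: COUNT(url_dest) grouped by url_src
        Map<String, Integer> outDeg = new HashMap<>();
        for (String[] e : links) outDeg.merge(e[0], 1, Integer::sum);
        // join + aggregate: SUM(rank / out-degree) grouped by url_dest
        Map<String, Double> next = new HashMap<>();
        for (String[] e : links) {
            double contrib = rank.getOrDefault(e[0], 0.0) / outDeg.get(e[0]);
            next.merge(e[1], contrib, Double::sum);
        }
        return next;
    }
}
```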
A MapReduce Implementation
[Diagram: mappers read Ri and L-split0/L-split1; reducers join & compute rank; a second map/reduce pair performs the aggregate fixpoint evaluation; the client checks "Converged?" and either sets i = i + 1 or reports done.]
What’s the problem?
1. L is loaded on each iteration
2. L is shuffled on each iteration
3. Fixpoint evaluated as a separate MapReduce job per iteration
L is loop invariant, but it is reloaded and reshuffled anyway.
Example 2: Transitive Closure
Friend relation; find all transitive friends of Eric (semi-naïve evaluation):
R0 = {Eric, Eric}
R1 = {Eric, Elisa}
R2 = {Eric, Tom}, {Eric, Harry}
R3 = {}
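The semi-naïve evaluation above reads directly as code: each round joins only the newly derived tuples (the delta) with Friend, the dupe-elim step keeps only unseen people, and the loop stops when the delta is empty. A plain-Java sketch of the algorithm, for any friend relation passed in:

```java
import java.util.*;

// Semi-naïve transitive closure: join the delta with Friend, eliminate
// duplicates against everything seen so far, repeat until the delta is
// empty (the fixpoint).
public class TransitiveFriends {
    public static Set<String> transitiveFriends(String start,
                                                Map<String, List<String>> friend) {
        Set<String> seen = new HashSet<>();
        Set<String> delta = new HashSet<>(List.of(start)); // R0 = {start, start}
        seen.add(start);
        while (!delta.isEmpty()) {                          // fixpoint test
            Set<String> next = new HashSet<>();
            for (String p : delta)                          // join delta with Friend
                for (String q : friend.getOrDefault(p, List.of()))
                    if (seen.add(q)) next.add(q);           // dupe-elim
            delta = next;                                   // next generation
        }
        seen.remove(start);
        return seen;
    }
}
```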
Example 2 in MapReduce
[Diagram: mappers read Si and Friend0/Friend1; reducers perform the join (compute the next generation of friends); a second map/reduce pair performs dupe-elim (remove the ones we've already seen); the client checks "Anything new?" and either sets i = i + 1 or reports done.]
What's the problem?
Friend is loop invariant, but:
1. Friend is loaded on each iteration
2. Friend is shuffled on each iteration
Example 3: k-means
[Diagram: mappers read point partitions P0, P1, P2 together with ki, the k centroids at iteration i; reducers produce ki+1; the client checks ki - ki+1 < threshold? and either sets i = i + 1 or reports done.]
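One iteration of this picture can be sketched for 1-D points: the map side assigns each point to its nearest centroid, and the reduce side averages each cluster into ki+1. A toy sketch, not HaLoop job code:

```java
// One k-means iteration over 1-D points: "map" assigns each point to
// its nearest centroid, "reduce" averages each cluster to produce the
// next centroid set k_{i+1}.
public class KMeansStep {
    public static double[] step(double[] points, double[] centroids) {
        double[] sum = new double[centroids.length];
        int[] count = new int[centroids.length];
        for (double p : points) {                 // map: nearest centroid
            int best = 0;
            for (int c = 1; c < centroids.length; c++)
                if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best]))
                    best = c;
            sum[best] += p;
            count[best]++;
        }
        double[] next = centroids.clone();        // reduce: new means
        for (int c = 0; c < centroids.length; c++)
            if (count[c] > 0) next[c] = sum[c] / count[c];
        return next;
    }
}
```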
What's the problem?
P is loop invariant, but:
1. P is loaded on each iteration
Approach: Inter-iteration caching
Mapper input cache (MI)
Mapper output cache (MO)
Reducer input cache (RI)
Reducer output cache (RO)
[Diagram: a multi-step loop body of map (M) and reduce (r) stages, with the four caches attached at the corresponding inputs and outputs.]
RI: Reducer Input Cache
Provides: access to loop-invariant data without map/shuffle
Used by: reducer function
Assumes:
1. Mapper output for a given table is constant across iterations
2. Static partitioning (implies: no new nodes)
PageRank
Avoid shuffling the network at every step
Transitive Closure
Avoid shuffling the graph at every step
K-means
No help
…
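The cache's effect can be modeled in a few lines: hash-partition the invariant table once by its join key (the static-partitioning assumption above), keep each partition at its reducer, and on later iterations ship only the small varying table. A toy single-process model with illustrative names; HaLoop itself manages such cached partitions through the TaskTracker:

```java
import java.util.*;

// Toy model of the reducer input cache: the loop-invariant table is
// partitioned once by hash of its join key, and every later iteration
// re-partitions only the small rank table.
public class ReducerInputCache {
    private final List<List<String[]>> cached; // one partition per reducer
    private final int numReducers;

    public ReducerInputCache(List<String[]> invariant, int numReducers) {
        this.numReducers = numReducers;
        this.cached = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) cached.add(new ArrayList<>());
        // pay the load + shuffle cost once, at iteration 0
        for (String[] row : invariant)
            cached.get(partition(row[0])).add(row);
    }

    private int partition(String key) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    // Per-iteration join: only `rank` is re-partitioned and shipped;
    // the invariant table comes from the cache.
    public Map<String, Double> joinContrib(Map<String, Double> rank) {
        Map<String, Double> out = new HashMap<>();
        for (int r = 0; r < numReducers; r++)        // each reducer...
            for (String[] edge : cached.get(r))      // ...scans its cached partition
                if (rank.containsKey(edge[0]))
                    out.merge(edge[1], rank.get(edge[0]), Double::sum);
        return out;
    }
}
```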
Reducer Input Cache Benefit
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Overall run time
Reducer Input Cache Benefit
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Join step only
Livejournal, 12GB
Reducer Input Cache Benefit
Transitive Closure
Billion Triples Dataset (120GB)
90 small instances on EC2
Reduce and Shuffle of Join Step
Livejournal, 12GB
[Diagram: the PageRank dataflow again (mappers and reducers for "Join & compute rank", then a second map/reduce pair for "Aggregate fixpoint evaluation"), with the total cost annotated.]
RO: Reducer Output Cache
Provides: distributed access to the output of previous iterations
Used by: fixpoint evaluation
Assumes:
1. Partitioning constant across iterations
2. Reducer output key functionally determines reducer input key
PageRank
Allows distributed fixpoint evaluation
Obviates extra MapReduce job
Transitive Closure
No help
K-means
No help
…
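With each reducer holding its own previous output, the fixpoint test reduces to a local comparison plus combining one small number per partition, which is what lets HaLoop drop the extra MapReduce job. A sketch with illustrative names, not HaLoop's API:

```java
import java.util.*;

// Sketch of a distributed fixpoint test backed by a reducer output
// cache: each reducer compares its new output against its own cached
// output from iteration i-1 and reports one small local distance; the
// client only combines those numbers.
public class FixpointCheck {
    // Local distance between one reducer's old and new output.
    public static double localDistance(Map<String, Double> prev,
                                       Map<String, Double> curr) {
        double d = 0.0;
        for (Map.Entry<String, Double> e : curr.entrySet())
            d += Math.abs(e.getValue() - prev.getOrDefault(e.getKey(), 0.0));
        return d;
    }

    // Client-side combine: converged iff the summed distance is small.
    public static boolean converged(double[] localDistances, double threshold) {
        double total = 0.0;
        for (double d : localDistances) total += d;
        return total < threshold;
    }
}
```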
Reducer Output Cache Benefit
Fixpoint evaluation time (s) vs. iteration #
Livejournal dataset, 50 EC2 small instances
Freebase dataset, 90 EC2 small instances
MI: Mapper Input Cache
Provides: access to non-local mapper input on later iterations
Used: during scheduling of map tasks
Assumes:
1. Mapper input does not change
PageRank
Subsumed by use of Reducer Input Cache
Transitive Closure
Subsumed by use of Reducer Input Cache
K-means
Avoids non-local data reads on iterations > 0
…
Mapper Input Cache Benefit
5% non-local data reads; ~5% improvement
Conclusions (last slide)
Relatively simple changes to MapReduce/Hadoop can support arbitrary recursive programs
TaskTracker (Cache management)
Scheduler (Cache awareness)
Programming model (multi-step loop bodies, cache control)
Optimizations
Caching loop invariant data realizes largest gain
Good to eliminate extra MapReduce step for termination checks
Mapper input cache benefit inconclusive; need a busier cluster
Future Work
Analyze expressiveness of Map-Reduce-Fixpoint
Consider a model of Map-(Reduce+)-Fixpoint
Data-Intensive Scalable Science
http://clue.cs.washington.edu
http://escience.washington.edu
Award IIS 0844572
Cluster Exploratory (CluE)
Motivation in One Slide
MapReduce can’t express recursion/iteration
Lots of interesting programs need loops
graph algorithms
clustering
machine learning
recursive queries (CTEs, datalog, WITH clause)
Dominant solution: use a driver program outside of MapReduce
Hypothesis: making MapReduce loop-aware affords optimization, and lays a foundation for scalable implementations of recursive languages
Experiments
Amazon EC2
20, 50, 90 default small instances
Datasets
Billion Triples (120GB) [1.5B nodes, 1.6B edges]
Freebase (12GB) [7M nodes, 154M edges]
Livejournal social network (18GB) [4.8M nodes, 67M edges]
Queries
Transitive Closure
PageRank
k-means
[VLDB 2010]
HaLoop Architecture
Scheduling Algorithm
Input: Node node
Global variables: HashMap<Node, List<Partition>> last, HashMap<Node, List<Partition>> current

if (iteration == 0) {
    // the same as MapReduce
    Partition part = StandardMapReduceSchedule(node);
    current.add(node, part);
} else {
    if (node.hasFullLoad()) {
        // find a substitution
        Node substitution = findNearbyNode(node);
        last.get(substitution).addAll(last.remove(node));
        return;
    }
    if (last.get(node).size() > 0) {
        // iteration-local schedule
        Partition part = last.get(node).get(0);
        schedule(part, node);
        current.get(node).add(part);
        last.get(node).remove(part);
    }
}
Programming Interface
Job job = new Job();
job.AddMap(Map Rank, 1);
job.AddReduce(Reduce Rank, 1);
job.AddMap(Map Aggregate, 2);
job.AddReduce(Reduce Aggregate, 2);
job.AddInvariantTable(#1);
job.SetInput(IterationInput);
job.SetFixedPointThreshold(0.1);
job.SetDistanceMeasure(ResultDistance);
job.SetMaxNumOfIterations(10);
job.SetReducerInputCache(true);
job.SetReducerOutputCache(true);
job.Submit();
define loop body (the AddMap/AddReduce pairs)
declare an input as invariant (AddInvariantTable)
specify loop body input, parameterized by iteration # (SetInput)
termination condition (SetFixedPointThreshold, SetDistanceMeasure, SetMaxNumOfIterations)
turn on caches (SetReducerInputCache, SetReducerOutputCache)
Cache Infrastructure Details
Programmer control
Architecture for cache management
Scheduling for inter-iteration locality
Indexing the values in the cache
Other Extensions and Experiments
Distributed databases and Pig/Hadoop for Astronomy [IASDS 09]
Efficient “Friends of Friends” in Dryad [SSDBM 2010]
SkewReduce: Automated skew handling [SOCC 2010]
Image Stacking and Mosaicing with Hadoop [Hadoop Summit 2010]
HaLoop: Efficient iterative processing with Hadoop [VLDB2010]
MapReduce Broadly Applicable
Biology
[Schatz 08, 09]
Astronomy
[IASDS 09, SSDBM 10, SOCC 10, PASP 10]
Oceanography
[UltraVis 09]
Visualization
[UltraVis 09, EuroVis 10]
Key idea
When the loop output is large…
transitive closure
connected components
PageRank (with a convergence test as the termination condition)
…you need a distributed fixpoint operator, typically implemented as yet another MapReduce job, on every iteration
Background
Why is MapReduce popular?
Because it’s fast?
Because it scales to 1000s of commodity nodes?
Because it’s fault tolerant?
Witness
MapReduce on GPUs
MapReduce on MPI
MapReduce in main memory
MapReduce on <10 nodes
So why is MapReduce popular?
The programming model
Two serial functions, parallelism for free
Easy and expressive
Compare this with MPI
70+ operations
But it can’t express recursion
graph algorithms
clustering
machine learning
recursive queries (CTEs, datalog, WITH clause)
Fixpoint
A fixpoint of a function f is a value x such that f(x) = x.
The fixpoint queries FIX can be expressed with the relational algebra plus a fixpoint operator.
Map-Reduce-Fixpoint hypothesis: a sufficient model for all recursive queries.
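The definition above reads directly as code: apply f repeatedly until the output equals the input. A generic sketch; for set-valued f (e.g. the transitive-closure step) this is exactly the termination test used throughout this talk:

```java
import java.util.function.UnaryOperator;

// Iterate f until f(x) = x, i.e. until a fixpoint is reached.
// Assumes f eventually converges under equals().
public class Fixpoint {
    public static <T> T fixpoint(UnaryOperator<T> f, T x) {
        T next = f.apply(x);
        while (!next.equals(x)) {   // stop when f(x) = x
            x = next;
            next = f.apply(x);
        }
        return x;
    }
}
```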
Editor's Notes
Programming Model
User writes two serial functions, and MapReduce creates a parallel program with fairly clear semantics and good scalability.