Big data matrix factorizations and Overlapping community detection in graphs - David Gleich
In a talk at the Chinese Academy of Sciences Institute for Automation, I discuss some of the MapReduce and community detection methods I've worked on.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods, including new optimization routines and results.
We examine the effectiveness of randomized quasi-Monte Carlo (RQMC) to improve the convergence rate of the mean integrated square error, compared with crude Monte Carlo (MC), when estimating the density of a random variable X defined as a function over the s-dimensional unit cube (0,1)^s. We consider histograms and kernel density estimators. We show both theoretically and empirically that RQMC estimators can achieve faster convergence rates in some situations.
This is joint work with Amal Ben Abdellah, Art B. Owen, and Florian Puchhammer.
A fundamental numerical problem in many sciences is to compute integrals. These integrals can often be expressed as expectations and then approximated by sampling methods. Monte Carlo sampling is very competitive in high dimensions, but has a slow rate of convergence. One reason for this slowness is that the MC points form clusters and gaps. Quasi-Monte Carlo methods greatly reduce such clusters and gaps, and under modest smoothness demands on the integrand they can greatly improve accuracy. This can even take place in problems of surprisingly high dimension. This talk will introduce the basics of QMC and randomized QMC. It will include discrepancy and the Koksma-Hlawka inequality, some digital constructions and some randomized QMC methods that allow error estimation and sometimes bring improved accuracy.
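To make the MC-versus-QMC comparison concrete, here is a minimal sketch (assuming SciPy's `scipy.stats.qmc` module; the integrand and sample size are illustrative choices) that estimates a smooth integral over the unit square both ways:

```python
import math
import numpy as np
from scipy.stats import qmc

def f(x):
    # smooth integrand on the unit square; exact integral is (sqrt(pi)/2 * erf(1))^2
    return np.exp(-(x[:, 0] ** 2 + x[:, 1] ** 2))

truth = (math.sqrt(math.pi) / 2 * math.erf(1.0)) ** 2

n = 1024
rng = np.random.default_rng(0)
mc_est = f(rng.random((n, 2))).mean()          # crude Monte Carlo

sobol = qmc.Sobol(d=2, scramble=True, seed=0)  # randomized QMC: scrambled Sobol' points
rqmc_est = f(sobol.random(n)).mean()
```

With a smooth integrand like this, the scrambled-Sobol' estimate is typically much closer to the truth than crude MC at the same sample size, and the scrambling still permits error estimation from independent replicates.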
We present recent results on the numerical analysis of quasi-Monte Carlo quadrature methods, applied to forward and inverse uncertainty quantification for elliptic and parabolic PDEs. Particular attention will be placed on higher-order QMC, the stable and efficient generation of interlaced polynomial lattice rules, and the numerical analysis of multilevel QMC finite element discretizations with applications to computational uncertainty quantification.
Multi-scalar multiplication: state of the art and new ideas - Gus Gutoski
A 90-minute online presentation for zkStudyClub, delivered 2020-06-01. I present a new idea with a demonstrated 5% speed-up for multi-scalar multiplication. When combined with precomputation, this method could yield upwards of 20% speed-up.
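The new 5% idea itself isn't reproduced in these notes, but the baseline it improves on, Pippenger's bucket method, can be sketched over a toy additive group (integers mod a prime stand in for elliptic-curve points; the window size c is an illustrative choice):

```python
import random

Q = 2_147_483_647  # toy additive group: integers mod a prime stand in for curve points

def naive_msm(scalars, points):
    """Reference multi-scalar multiplication: sum_i k_i * P_i."""
    return sum(k * p for k, p in zip(scalars, points)) % Q

def bucket_msm(scalars, points, c=4):
    """Pippenger-style bucket method, processing scalars window by window."""
    nbits = max(k.bit_length() for k in scalars)
    windows = (nbits + c - 1) // c
    total = 0
    for w in reversed(range(windows)):
        total = (total << c) % Q                 # "double" c times between windows
        buckets = [0] * (1 << c)
        for k, p in zip(scalars, points):
            digit = (k >> (w * c)) & ((1 << c) - 1)
            if digit:
                buckets[digit] = (buckets[digit] + p) % Q
        running = acc = 0
        for b in reversed(range(1, 1 << c)):     # running-sum trick: sum_j j * buckets[j]
            running = (running + buckets[b]) % Q
            acc = (acc + running) % Q
        total = (total + acc) % Q
    return total

rng = random.Random(42)
scalars = [rng.randrange(1, 1 << 32) for _ in range(50)]
points = [rng.randrange(1, Q) for _ in range(50)]
```

The running-sum trick computes the weighted bucket sum with only additions, which is what makes the method attractive when group additions are expensive, and precomputation variants build on the same window structure.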
Presentation at OM-2017, the Twelfth International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference ISWC-2017, October 21st, 2017, Vienna, Austria
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
"Sparse Binary Zero-Sum Games". David Auger, Jialin Liu, Sylvie Ruette, David L. St-Pierre and Olivier Teytaud. The 6th Asian Conference on Machine Learning (ACML), 2014.
Bayesian modelling and computation for Raman spectroscopy - Matt Moores
Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, composed of the locations and amplitudes of the peaks. The Raman spectrum is discretised into a multivariate observation that is highly collinear, hence it lends itself to a reduced-rank representation. We introduce a sequential Monte Carlo (SMC) algorithm to separate this signal into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. By incorporating this representation into a Bayesian functional regression, we can quantify the relationship between dye concentration and peak intensity. We also estimate the model evidence using SMC to investigate long-range dependence between peaks. These methods have been implemented as an R package, using RcppEigen and OpenMP.
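As a toy stand-in for the model (not the paper's SMC algorithm), the peaks-plus-baseline structure can be illustrated with NumPy and `scipy.signal.find_peaks`; the peak locations, widths, and noise level below are invented for illustration:

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(1)
x = np.arange(1000.0)
centers, amps, width = np.array([200.0, 500.0, 800.0]), np.array([1.0, 0.8, 1.2]), 10.0

# synthetic spectrum: Lorentzian peaks + smooth linear baseline + additive white noise
peaks_part = sum(a / (1.0 + ((x - c) / width) ** 2) for c, a in zip(centers, amps))
spectrum = peaks_part + 1e-4 * x + rng.normal(0.0, 0.01, x.size)

idx, props = find_peaks(spectrum, prominence=0.3)  # recover the three peak locations
```

A prominence threshold separates genuine peaks from the slowly varying baseline; the Bayesian model in the talk does this jointly, with uncertainty, rather than as a two-step heuristic.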
This is a presentation I gave in partial fulfillment of the course Optimization Methods for Machine Learning at IIT Gandhinagar. These slides introduce the Alternating Direction Method of Multipliers (ADMM) and how the method is used to create distributed optimization algorithms.
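To make the slides' topic concrete, here is a minimal NumPy sketch of the classic ADMM splitting for the lasso, 0.5*||Ax - b||^2 + lam*||x||_1 (the problem size and the parameters lam and rho are arbitrary illustrative choices):

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA_rhoI = A.T @ A + rho * np.eye(n)   # cached matrix for every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))               # quadratic step
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft threshold
        u = u + x - z                                                    # dual update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.normal(size=50)
z = admm_lasso(A, b)
```

The x-update is a small linear solve while the z-update is an elementwise threshold; it is exactly this splitting that lets the quadratic step be distributed over data blocks, which is the point of the distributed-ADMM material in the slides.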
Importance sampling has been widely used to improve the efficiency of deterministic computer simulations where the simulation output is uniquely determined, given a fixed input. To represent complex system behavior more realistically, however, stochastic computer models are gaining popularity. Unlike deterministic computer simulations, stochastic simulations produce different outputs even at the same input. This extra degree of stochasticity presents a challenge for reliability assessment in engineering system designs. Our study tackles this challenge by providing a computationally efficient method to estimate a system's reliability. Specifically, we derive the optimal importance sampling density and allocation procedure that minimize the variance of a reliability estimator. The application of our method to a computationally intensive, aeroelastic wind turbine simulator demonstrates the benefits of the proposed approaches.
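For the simpler deterministic setting, the core idea can be sketched in a few lines (the failure threshold and shifted proposal below are illustrative; the paper's optimal density and allocation for stochastic simulators are not reproduced here):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
t = 4.0                                   # "failure" when a standard normal output exceeds t
p_true = 0.5 * math.erfc(t / math.sqrt(2))

n = 100_000
y = rng.normal(loc=t, scale=1.0, size=n)  # proposal centered at the failure threshold
# likelihood ratio phi(y) / phi(y - t) between target and proposal densities
weights = np.exp(-0.5 * y**2) / np.exp(-0.5 * (y - t) ** 2)
p_is = np.mean((y > t) * weights)
```

Crude Monte Carlo with the same n would see only a handful of failures for a probability this small; shifting the sampling density into the failure region and reweighting is what makes the estimate usable.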
ExcelR is considered to be the best Data Science training institute in Noida, offering a gamut of services from training to placement as part of the program. Faculty is our forte. All our trainers are working Data Scientists with over 15 years of professional experience. They are qualified, certified, experienced, and have a passion for training. The majority of the trainers are alumni of premier institutes such as IIT, IIM, and the Indian School of Business (ISB), and a few are Ph.D.-qualified professionals. Participants who register for classroom training can attend instructor-led online training and get access to self-paced e-learning videos. This blended model of training ensures continuous learning so that participants can absorb and assimilate the concepts thoroughly. ExcelR is the official training delivery partner for over 30 universities and colleges across the globe, which endorses the quality of our course and faculty. ExcelR holds one of the highest placement records in the space of Data Science owing to its tie-ups with various organizations that recruit the participants trained through us.
Anti-differentiating Approximation Algorithms: PageRank and MinCut - David Gleich
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
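A minimal sketch of that push procedure, in the style of Andersen, Chung, and Lang, on an unweighted adjacency-list graph (alpha and the tolerance eps are illustrative choices):

```python
from collections import defaultdict, deque

def ppr_push(graph, seed, alpha=0.15, eps=1e-6):
    """Approximate personalized PageRank by pushing residual mass locally.

    graph: dict node -> list of neighbors (undirected, unweighted).
    Maintains the invariant sum(p) + sum(r) == 1 throughout.
    """
    p, r = defaultdict(float), defaultdict(float)
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du = len(graph[u])
        if r[u] <= eps * du:
            continue                      # stale queue entry, nothing left to push
        mass = r[u]
        p[u] += alpha * mass              # settle a fraction of the mass at u
        r[u] = (1 - alpha) * mass / 2     # lazy walk: keep half of the rest at u
        share = (1 - alpha) * mass / (2 * du)
        for v in graph[u]:
            r[v] += share                 # spread the other half to the neighbors
            if r[v] > eps * len(graph[v]):
                queue.append(v)
        if r[u] > eps * du:
            queue.append(u)
    return p, r

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
p, r = ppr_push(graph, seed=0)
```

The work is proportional to the mass pushed rather than the graph size, which is why the push method only ever touches a small neighborhood of the seed.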
This talk is a new update based on some of our recent results on doing Tall and Skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
Anti-differentiating approximation algorithms: A case study with min-cuts, sp... - David Gleich
This talk covers the idea of anti-differentiating approximation algorithms, which is an idea to explain the success of widely used heuristic procedures. Formally, this involves finding an optimization problem solved exactly by an approximation algorithm or heuristic.
MapReduce Tall-and-skinny QR and applications - David Gleich
A talk at the Simons Institute workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations on MapReduce using a communication-avoiding algorithm.
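The serial core of the communication-avoiding scheme fits in a few lines of NumPy (the block count here stands in for the number of map tasks):

```python
import numpy as np

def tsqr_r(A, nblocks=4):
    """R factor of a tall-and-skinny A via two rounds of local QR factorizations."""
    blocks = np.array_split(A, nblocks, axis=0)       # one block per "map task"
    Rs = [np.linalg.qr(b, mode="r") for b in blocks]  # local QR of each block
    return np.linalg.qr(np.vstack(Rs), mode="r")      # reduce: QR of the stacked R factors

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))
R = tsqr_r(A)
```

Each map task computes a small local R; a single reduce stacks those small factors and factors them again, so no task ever needs the full matrix, and only tiny R factors cross the network.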
A history of PageRank from the numerical computing perspective - David Gleich
We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.
There are some slight liberties I've taken to make it more accessible.
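For readers who want the algorithm behind the story, the classical power-iteration formulation fits in a few lines of NumPy (damping factor 0.85 as in the original papers; the toy link matrix is invented for illustration):

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12):
    """Power iteration on the Google matrix; adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)  # dangling pages jump anywhere
    x = np.full(n, 1.0 / n)
    while True:
        x_new = damping * (P.T @ x) + (1 - damping) / n       # one step of the random surfer
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new

adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 0]], float)
pr = pagerank(adj)
```

The iteration converges at a rate governed by the damping factor, which is one of the numerical-computing angles the survey covers.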
Gaps between the theory and practice of large-scale matrix-based network comp... - David Gleich
I discuss some runtimes for the personalized PageRank vector and how it relates to open questions in how we should tackle these network based measures via matrix computations.
How does Google Google: A journey into the wondrous mathematics behind your f... - David Gleich
A talk I gave at the annual meeting for the MetroNY section of the MAA about how Google works from a link-ranking perspective. (http://sections.maa.org/metrony/)
Based on a talk by Margot Gerritsen (which used elements from another talk I gave years ago, yay co-author improvements!)
Spacey random walks and higher-order data analysis - David Gleich
My talk at TMA 2016 (The workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors
Relaxation methods for the matrix exponential on large networks - David Gleich
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
Fast relaxation methods for the matrix exponential - David Gleich
The matrix exponential is a matrix computing primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
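The relaxation method itself isn't reproduced here, but SciPy ships a comparable primitive; a sketch of extracting one column of the matrix exponential of a small graph's adjacency matrix without ever forming the full exponential:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import expm_multiply
from scipy.linalg import expm

# adjacency matrix of a 4-node path graph
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)

e0 = np.zeros(4)
e0[0] = 1.0
col = expm_multiply(csr_matrix(A), e0)  # column 0 of exp(A), matrix-free
dense = expm(A)[:, 0]                   # dense reference for comparison
```

Entry i of this column scores node i by walks of all lengths from node 0, weighted down factorially by length, which is what makes it useful for link prediction.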
Vertex neighborhoods, low conductance cuts, and good seeds for local communit... - David Gleich
My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
Higher-order organization of complex networks - David Gleich
A talk I gave at the Park City Mathematics Institute about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
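The central object in this line of work, the motif (here: triangle) adjacency matrix, is easy to compute directly; a sketch on an invented toy graph of two 4-cliques joined by a single bridge edge:

```python
import numpy as np

def triangle_motif_adjacency(A):
    """W[i, j] = number of triangles containing edge (i, j)."""
    return (A @ A) * A  # length-2 paths that close into a triangle, masked to actual edges

# two 4-cliques {0..3} and {4..7} joined by the bridge edge (3, 4)
A = np.zeros((8, 8))
for clique in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in clique:
        for j in clique:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

W = triangle_motif_adjacency(A)
```

Every clique edge participates in two triangles while the bridge edge participates in none, so any motif-weighted cut falls exactly at the bridge; this is the effect the higher-order Cheeger machinery makes precise.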
Line detection is computationally more intense than humans often expect. A graphics processing unit (GPU) can meet this need with substantial computational power, but the classic algorithmic approaches to line detection are often of a serial nature and/or utilize statistical sampling that cannot provide deterministic detection guarantees.
Our talk presents a line detection algorithm that is able to detect lines of any angle throughout the image. It is as parallel as the number of given image pixels multiplied by the number of potential line angle bins. In contrast to the Hough transform, it is able to locate the start and end of found line segments as well. Its redundant image accesses and the bilinear interpolations needed for the multi-angle edge detection are managed by the texture cache, conserving DRAM memory bandwidth and computational complexity. It is based on local edge detection filtering to fill small line angle candidates, followed by the inference of line primitives by a segmented scan, all happening in a data-parallel fashion.
The output is a 2D array of line segments, providing the length of all line segments that originate from a given 2D position and a given line angle bin. This line segment map can then be used to infer higher-level vector symbols built from line primitives, again in a data-parallel fashion, using either GPU atomics or a data compaction algorithm in stream fashion such as HistoPyramids. We exemplify this with the detection of parallel lines and quadrilaterals. While the algorithm's implementation benefits from atomics and shared memory, the basic algorithmic implementation is so simple that it can even be implemented on OpenGL ES 2.0 hardware such as mobile phones.
Through a WebGL implementation, the line detection can even be applied to HTML5-based camera input, providing a platform-portable approach to low-level computer vision and, in continuation, augmented reality and symbol detection on mobile phones.
https://www.geofront.eu/demos/lines
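For contrast, the classic Hough accumulator that the talk compares against can be sketched in NumPy (a small synthetic image and coarse one-degree angle bins, for illustration only):

```python
import numpy as np

def hough_lines(img, n_theta=180):
    """Accumulate votes for (rho, theta) line parameters over all edge pixels."""
    ys, xs = np.nonzero(img)
    diag = int(np.ceil(np.hypot(*img.shape)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for theta_idx, t in enumerate(thetas):
        # each edge pixel votes for every line (rho, theta) passing through it
        rhos = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag
        np.add.at(acc, (rhos, theta_idx), 1)
    return acc, diag

img = np.zeros((50, 50))
img[:, 20] = 1.0                    # a vertical line at x = 20
acc, diag = hough_lines(img)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
```

Note that the accumulator peak recovers only (rho, theta), not the segment endpoints, which is exactly the limitation the talk's segment map addresses.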
https://telecombcn-dl.github.io/2018-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
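As a taste of the underlying operations, the convolution, nonlinearity, and pooling layers at the heart of such networks can be written directly in NumPy (a teaching sketch, far from a practical implementation):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the core op of a convolutional layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2d(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(28, 28))
kernel = rng.normal(size=(3, 3))
feature_map = maxpool2d(relu(conv2d(image, kernel)))
```

A real network stacks many such layers with learned kernels and trains them by backpropagation on GPUs, which is what the affordable-hardware point in the course description is about.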
SIGGRAPH 2018 - Full Rays Ahead! From Raster to Real-Time Raytracing - Electronic Arts / DICE
In this presentation, part of the "Introduction to DirectX Raytracing" course, Colin Barré-Brisebois of SEED discusses some of the challenges the team had to work through when going from raster to real-time raytracing for Project PICA PICA.
In this presentation we consider several main methods for constructing regular QC-LDPC codes using an algebraic approach. We consider the existence of cycles not broken by the circulant permutation matrices (short balanced cycles). Using Vontobel's approach, we illustrate a way to estimate the girth bound and its influence on the error-floor properties of QC-LDPC codes.
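A minimal sketch of the algebraic construction (the exponent matrix below is invented for illustration; the 4-cycle test is the standard condition that no alternating sum of shifts around a 2x2 submatrix vanishes mod L):

```python
import numpy as np

def circulant(shift, L):
    """L x L circulant permutation matrix: identity with columns rotated by `shift`."""
    return np.roll(np.eye(L, dtype=int), shift, axis=1)

def qc_ldpc_parity(P, L):
    """Expand an exponent matrix P of shifts into the full QC-LDPC parity-check matrix H."""
    return np.block([[circulant(s, L) for s in row] for row in P])

def has_4_cycles(P, L):
    """True if any 2x2 submatrix of shifts closes a length-4 cycle in the Tanner graph."""
    P = np.asarray(P)
    m, n = P.shape
    for i1 in range(m):
        for i2 in range(i1 + 1, m):
            for j1 in range(n):
                for j2 in range(j1 + 1, n):
                    if (P[i1, j1] - P[i1, j2] + P[i2, j2] - P[i2, j1]) % L == 0:
                        return True
    return False

L = 7
P_good = [[0, 1, 2], [0, 2, 4]]
H = qc_ldpc_parity(P_good, L)
```

Choosing shifts so that no such alternating sum vanishes raises the girth above 4, which is one of the levers on the error floor discussed in the talk.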
https://telecombcn-dl.github.io/dlmm-2017-dcu/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14 - Sri Ambati
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.
http://docs.0xdata.com/datascience/deeplearning.html
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019 - Codemotion
Generating representations is the ultimate act of creativity. Recent advancements in neural networks (and in processing power) brought us the capability to perform regression against complex samples like images and audio. In this presentation we show the underlying mechanics of media generation from latent space representation of abstract visual ideas, real embodiment of “Platonic” concepts, with Variational Autoencoders, Generative Adversarial Networks, neural style transfer and PixelRNN/CNN along with current practical applications like DeepFake.
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis - Jason Riedy
Applications in many areas analyze an ever-changing environment. On graphs with billions of vertices, providing snapshots imposes a large performance cost. We propose the first formal model for graph analysis running concurrently with streaming data updates. We consider an algorithm valid if its output is correct for the initial graph plus some implicit subset of concurrent changes. We show theoretical properties of the model, demonstrate the model on various algorithms, and extend it to updating results incrementally.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and computational perspectives.
Slides from Nathan Piasco's ICRA 2019 oral presentation of the paper "Learning Scene Geometry for Visual Localization in Challenging Conditions", a Best Paper in Robot Vision finalist.
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni... - Alex Conway
Slides for my talk on:
"Convolutional Neural Networks for Image Classification"
...at the Cape Town Deep Learning Meet-up 20170620
https://www.meetup.com/Cape-Town-deep-learning/events/240485642/
Correlation clustering and community detection in graphs and networks - David Gleich
We show a new relationship between various community detection objectives and a correlation clustering framework. These enable us to detect communities with good bounds on the solution.
Spectral clustering with motifs and higher-order structures - David Gleich
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Using Local Spectral Methods to Robustify Graph-Based Learning - David Gleich
This is my KDD2015 talk on robustness in semi-supervised learning. The paper is already on Michael Mahoney's website: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf See the KDD paper for all the details, which this talk is a bit light on.
Spacey random walks and higher order Markov chains - David Gleich
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
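The stationary condition behind those models, a tensor Z-eigenvector equation x = P x x, can be sketched with a damped fixed-point iteration (the random transition tensor and the damping are illustrative, not the derivation from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# column-stochastic transition tensor: P[i, j, k] = Pr(next = i | current = j, memory = k)
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)

def spacey_fixed_point(P, iters=2000, shift=0.5):
    """Damped iteration for the stationary equation x_i = sum_{j,k} P[i, j, k] x_j x_k."""
    x = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        x = shift * x + (1 - shift) * np.einsum("ijk,j,k->i", P, x, x)
        x /= x.sum()
    return x

x = spacey_fixed_point(P)
```

The spacey random walker samples its "memory" state k from its own occupation history, and under suitable conditions its limiting occupation vector satisfies exactly this tensor eigenvector equation.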
Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These are widely successful at all sorts of problems, including community detection, label propagation, and a few others.
PageRank Centrality of dynamic graph structures - David Gleich
A talk I gave at the SIAM Annual Meeting Mini-symposium on the mathematics of the power grid organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Localized methods for diffusions in large graphs - David Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can create efficient matrix computations to evaluate them.
Fast matrix primitives for ranking, link-prediction and more - David Gleich
I gave this talk at Netflix about some of the recent work I've been doing on fast matrix primitives for link prediction and also some non-standard uses of the nuclear norm for ranking.
Recommendation and graph algorithms in Hadoop and SQL - David Gleich
A talk I gave at ancestry.com on Hadoop, SQL, recommendation, and graph algorithms. It's a tutorial overview; there are better algorithms than those I describe, but these are a simple starting point.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Rik Marselis's and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
What you can do with a tall-and-skinny QR factorization in Hadoop: Principal components and large regressions
1. What you can do with a Tall-and-Skinny QR Factorization on Hadoop: Large regressions, Principal Components
Slides: bit.ly/16LS8Vk
Code: github.com/dgleich/mrtsqr
@dgleich · dgleich@purdue.edu
David F. Gleich, Assistant Professor, Computer Science, Purdue University
2. Why you should stay …
- you like advanced machine learning techniques
- you want to understand how to compute the singular values and vectors of a huge matrix (that's tall and skinny)
- you want to learn about large-scale regression and principal components from a matrix perspective
3. What I'm going to assume you know
- MapReduce
- Python
- Some simple matrix manipulation
4. Tall-and-Skinny matrices (m ≫ n)
Many rows (like a billion), a few columns (under 10,000).
Example: regression and general linear models with many samples (images from the tinyimages collection).
Used in:
- block iterative methods
- panel factorizations
- approximate kernel k-means
- big-data SVD/PCA
5. If you have tons of small records, then there is probably a tall-and-skinny matrix somewhere.
6. Tall-and-skinny matrices are common in BigData
A : m x n, m ≫ n
The key is an arbitrary row-id; the value is the 1 x n array for a row.
Each submatrix Ai is the input to a map task.
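In other words, the matrix is just a bag of (row-id, row) records, and each map task reassembles its chunk of records into a local submatrix. A minimal sketch with toy data (not code from the talk):

```python
import numpy as np

# A tall-and-skinny matrix stored as (key, value) records:
# the key is an arbitrary row id, the value is the 1-by-n row.
A = np.arange(12.0).reshape(6, 2)   # toy matrix; m >> n in practice

records = [(i, row.tolist()) for i, row in enumerate(A)]

# A map task sees a contiguous chunk of records, i.e. one submatrix A_i.
A_1 = np.array([value for _, value in records[:3]])
print(A_1.shape)  # (3, 2)
```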
7. PCA of 80,000,000 images
A: 80,000,000 images by 1000 pixels.
[Figures: fraction of variance captured by the principal components; the first 16 columns of V displayed as images. Caption: "Figure 5: The 16 most important component basis functions (by row …)".]
Constantine & Gleich, MapReduce 2011.
8. Regression with 80,000,000 images
A: 80,000,000 images by 1000 pixels.
The goal was to approximate how much red there was in a picture from the grayscale pixels only. We get a measure of how much "redness" each pixel contributes to the whole. The coefficients are displayed as an image at the right.
From the paper: "… via the sum of red-pixel values in each image as a linear combination of the gray values in each image. Formally, if r_i is the sum of the red components in all pixels of image i, and G_{i,j} is the gray value of the jth pixel in image i, then we wanted to find min Σ_i ( r_i − Σ_j G_{i,j} s_j )². There is no particular importance to this regression problem; we use it merely as a demonstration.
The coefficients s_j are displayed as an image at the right. They reveal regions of the image that are not as important in determining the overall red component of an image. The color scale varies from light blue (strongly negative) through 0 to red (strongly positive). The computation took 30 minutes using the Dumbo framework and a two-iteration job with 250 intermediate reducers.
We also solved a principal component problem to find a principal component basis for each image. Let G be the matrix of G_{i,j}'s from the regression and let u_i be the mean of the ith …"
10. QR Factorization and the Gram–Schmidt process
Consider a set of vectors v1 to vn. Set u1 to be v1.
Create a new vector u2 by removing any "component" of u1 from v2.
Create a new vector u3 by removing any "component" of u1 and u2 from v3.
…
["Gram–Schmidt process" figure from Wikipedia]
11. QR Factorization and the Gram–Schmidt process
v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3
In matrix form:
[ v1 v2 v3 … ] = [ u1 u2 u3 … ] · [ a1 b1 c1 … ]
                                  [ 0  b2 c2 … ]
                                  [ 0  0  c3 … ]
                                  [ ⋮  ⋮  ⋮  ⋱ ]
12. QR Factorization and the Gram–Schmidt process
v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3
For this problem: V = UR. All vectors in U are at right angles, i.e. they are decoupled.
What it's usually written as by others: A = QR.
13. QR Factorization and the Gram–Schmidt process
v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3
A = QR, where all vectors in Q are at right angles, i.e. they are decoupled, and R is upper triangular.
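A quick sanity check of the A = QR structure with numpy (a sketch, not from the slides; numpy uses Householder reflections rather than Gram–Schmidt, but the factorization has the same shape):

```python
import numpy as np

# A small "tall" matrix
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

Q, R = np.linalg.qr(A)

assert np.allclose(Q @ R, A)                        # A = QR
assert np.allclose(Q.T @ Q, np.eye(2), atol=1e-12)  # columns of Q are orthonormal
assert np.allclose(R, np.triu(R))                   # R is upper triangular
```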
14. PCA of 80,000,000 images
[Pipeline diagram: A (80,000,000 images by 1000 pixels, rows shifted to zero mean) → TSQR in MapReduce → R → SVD of R in post-processing → V (the principal components; first 16 columns of V shown as images) and the top 100 singular values.]
Constantine & Gleich, MapReduce 2010.
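The key trick in this pipeline: once the small R factor is in hand, the SVD of R yields the same singular values and right singular vectors (principal components) as the SVD of A itself. A local sketch (stand-in random data, not the image matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))   # stand-in for a tall-and-skinny matrix

# TSQR would compute R in MapReduce; locally a plain QR suffices.
R = np.linalg.qr(A, mode='r')

# SVD of the tiny n-by-n R gives A's singular values and right vectors.
_, S_r, Vt_r = np.linalg.svd(R)
_, S_a, Vt_a = np.linalg.svd(A, full_matrices=False)

assert np.allclose(S_r, S_a)                  # same singular values
assert np.allclose(np.abs(Vt_r), np.abs(Vt_a))  # same V, up to sign
```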
15. Input: 500,000,000-by-100 matrix
Each record: 1-by-100 row
HDFS size: 423.3 GB
Time to compute colsum(A): 161 sec.
Time to compute R in qr(A): 387 sec.
16. The rest of the talk! Full TSQR code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self,blocksize,isreducer):
        self.bsize=blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(
            numpy.array(self.data),'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self,key,value):
        self.data.append(value)
        if len(self.data)>self.bsize*len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0,2000000000)
            yield key, row

    def mapper(self,key,value):
        self.collect(key,value)

    def reducer(self,key,values):
        for value in values: self.mapper(key,value)

if __name__=='__main__':
    mapper = SerialTSQR(blocksize=3,isreducer=False)
    reducer = SerialTSQR(blocksize=3,isreducer=True)
    hadoopy.run(mapper, reducer)
17. Communication avoiding QR (Demmel et al. 2008) on MapReduce (Constantine and Gleich, 2010)
Algorithm
Data: rows of a matrix.
Map: QR factorization of rows.
Reduce: QR factorization of rows.
[Diagram: Mapper 1 runs serial TSQR on blocks A1–A4, repeatedly factoring each block stacked with the previous R factor, and emits R4; Mapper 2 does the same for A5–A8 and emits R8; Reducer 1 runs serial TSQR on R4 and R8 and emits the final Q and R.]
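The algorithm in the diagram can be sketched locally in a few lines (a toy single-process version, not the hadoopy code):

```python
import numpy as np

def tsqr_r(A, nblocks):
    """R of A = QR via TSQR: factor row blocks independently (map),
    stack the local R factors, and factor once more (reduce)."""
    blocks = np.array_split(A, nblocks, axis=0)
    local_Rs = [np.linalg.qr(B, mode='r') for B in blocks]  # map step
    return np.linalg.qr(np.vstack(local_Rs), mode='r')      # reduce step

rng = np.random.default_rng(1)
A = rng.standard_normal((400, 5))
R_tsqr = tsqr_r(A, nblocks=8)
R_full = np.linalg.qr(A, mode='r')

# R is unique up to the sign of each row
assert np.allclose(np.abs(R_tsqr), np.abs(R_full))
```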
18. The rest of the talk! Full TSQR code in hadoopy
(The same full TSQR code as on slide 16.)
19. Too many maps cause too much data to one reducer!
Each image is 5k. Each HDFS block has 12,800 images; 6,250 total blocks.
Each map outputs a 1000-by-1000 matrix.
One reducer gets a 6.25M-by-1000 matrix (50GB).
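The arithmetic behind those numbers (assuming 64 MB HDFS blocks and 8-byte doubles, which are not stated on the slide):

```python
# 64 MB block / 5 kB per image
images_per_block = 64_000_000 // 5_000     # = 12,800
blocks = 6_250
rows_per_map_output = 1_000                # each map emits a 1000-by-1000 R
cols = 1_000

reducer_rows = blocks * rows_per_map_output
assert reducer_rows == 6_250_000           # a 6.25M-by-1000 matrix

gigabytes = reducer_rows * cols * 8 / 1e9
assert round(gigabytes) == 50              # ~50 GB into a single reducer
```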
20. [Diagram: a two-iteration TSQR job. Iteration 1: Mappers 1-1 through 1-4 each run serial TSQR on their input blocks of A and emit local factors R1–R4; after a shuffle, Reducers 1-1 through 1-3 run serial TSQR on groups of these factors and emit intermediate factors R2,1–R2,3. Iteration 2: an identity map, then a single Reducer 2-1 runs serial TSQR on the intermediate factors and emits the final R.]
21. Input: 500,000,000-by-100 matrix
Each record: 1-by-100 row
HDFS size: 423.3 GB
Time to compute colsum(A): 161 sec.
Time to compute R in qr(A): 387 sec.
22. Hadoop streaming isn't always slow!
Synthetic data test on a 100,000,000-by-500 matrix (~500GB).
Codes implemented in MapReduce streaming; matrix stored as TypedBytes lists of doubles. Python frameworks use a Numpy+ATLAS matrix; custom C++ TypedBytes reader/writer with an ATLAS matrix.

          Iter 1 (secs.)  Iter 2 (secs.)  Overall (secs.)
Dumbo     960             217             1177
Hadoopy   612             118             730
C++       350             37              387
Java      436             66              502
23. Use multiple iterations for problems with many columns

Cols.  Iters.  Split (MB)  Maps  Secs.
50     1       64          8000  388
–      –       256         2000  184
–      –       512         1000  149
–      2       64          8000  425
–      –       256         2000  220
–      –       512         1000  191
1000   1       512         1000  666
–      2       64          6000  590
–      –       256         2000  432
–      –       512         1000  337

Increasing split size improves performance (accounts for Hadoop data movement). Increasing iterations helps for problems with many columns. (1000 columns with a 64-MB split size overloaded the single reducer.)
24. More about how to compute a regression
min ‖Ax − b‖² = min Σ_i ( Σ_j A_ij x_j − b_i )²
[Diagram: serial TSQR carries the right-hand side along with A — each local QR step, factoring a block stacked with the previous R, also applies Qᵀ to the corresponding piece of b, e.g. b2 = Q2ᵀ b1.]
25. TSQR code in hadoopy for regressions

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self,blocksize,isreducer):
        […]

    def compress(self):
        Q,R = numpy.linalg.qr(
            numpy.array(self.data), 'full')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])
        self.rhs = list( numpy.dot(Q.T,
            numpy.array(self.rhs) ) )

    def collect(self,key,valuerhs):
        self.data.append(valuerhs[0])
        self.rhs.append(valuerhs[1])
        if len(self.data)>self.bsize*len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for i,row in enumerate(self.data):
            key = random.randint(0,2000000000)
            yield key, (row, self.rhs[i])

    def mapper(self,key,value):
        self.collect(key,unpack(value))

    def reducer(self,key,values):
        for value in values: self.mapper(key, unpack(value))

if __name__=='__main__':
    mapper = SerialTSQR(blocksize=3,isreducer=False)
    reducer = SerialTSQR(blocksize=3,isreducer=True)
    hadoopy.run(mapper, reducer)
26. More about how to compute a regression
min ‖Ax − b‖²
= min ‖QRx − b‖²
= min ‖QᵀQRx − Qᵀb‖²   (orthogonal or "right angle" matrices don't change vector magnitude)
= min ‖Rx − Qᵀb‖²
This is a tiny linear system!

def compute_x(output):
    R,y = load_from_hdfs(output)
    x = numpy.linalg.solve(R,y)
    write_output(x,output+'-x')
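A local end-to-end check of this reduction (a sketch; the slide's load_from_hdfs/write_output helpers are not needed here):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10_000, 4))   # tall-and-skinny design matrix
b = rng.standard_normal(10_000)

# Reduce min ||Ax - b|| to the tiny n-by-n system R x = Q^T b
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Agrees with the reference least-squares solver
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_qr, x_ls)
```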
27. We do a similar step for the PCA and compute the 1000-by-1000 SVD on one machine.
29. What about the matrix Q?
We want Q to be numerically orthogonal: norm(QᵀQ − I) should stay small. A condition number measures problem sensitivity.
[Plot: norm(QᵀQ − I) vs. condition number (10^5 to 10^20) for prior work (AR⁻¹; Constantine & Gleich, MapReduce 2011), AR⁻¹ + iterative refinement (Benson, Gleich, Demmel, submitted), and Direct TSQR (Benson, Gleich, Demmel, submitted).]
Prior methods all failed without any warning.
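The failure mode is easy to reproduce locally: forming Q as AR⁻¹ (with R from the normal equations, i.e. Cholesky-style QR) loses orthogonality as the condition number grows, while Householder QR does not. A small sketch, not the slide's experiment:

```python
import numpy as np

rng = np.random.default_rng(3)
# Build a tall matrix with condition number ~1e6
U, _ = np.linalg.qr(rng.standard_normal((500, 4)))
W, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U @ np.diag([1.0, 1e-2, 1e-4, 1e-6]) @ W.T

def orth_error(Q):
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))

# "AR^-1" construction: R from the normal equations, then Q = A R^-1
R_chol = np.linalg.cholesky(A.T @ A).T
Q_ar = A @ np.linalg.inv(R_chol)

# Householder QR, which Direct TSQR mimics numerically
Q_house, _ = np.linalg.qr(A)

assert orth_error(Q_house) < 1e-12
assert orth_error(Q_ar) > 100 * orth_error(Q_house)
```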
30. Taking care of business by keeping track of Q
1. Output the local Q and R in separate files: each mapper factors its block, A_i = Q_i R_i, emitting a Q output and an R output.
2. Collect R on one node and compute the Qs for each piece: stacking R1–R4 and factoring gives the final R along with small factors Q11–Q41.
3. Distribute the pieces of Q*1 and form the true Q: a second map multiplies each local Q_i by its Q_i1.
[Diagram: Mapper 1 emits Q1–Q4 and R1–R4; Task 2 factors the stacked R1–R4 into R and Q11–Q41; Mapper 3 forms each Q_i · Q_i1 to produce the true Q.]
32. Future work … more columns!
With ~3000 columns, one 64MB chunk is a local QR computation.
Could "iterate in blocks of 3000" columns to continue … maybe "efficient" for 10,000 columns.
Need different ideas for 100,000 columns (randomized methods?)
I think this took 30 minutes using our slowest codes. Our fastest codes should take it down to about 3-4 minutes. You’ll probably wait longer to get your job scheduled.