In a talk at the Chinese Academy of Sciences Institute of Automation, I discuss some of the MapReduce and community detection methods I've worked on.
Anti-differentiating approximation algorithms: A case study with min-cuts, sp...
David Gleich
This talk covers the idea of anti-differentiating approximation algorithms, an approach to explaining the success of widely used heuristic procedures. Formally, this involves finding an optimization problem that an approximation algorithm or heuristic solves exactly.
A copy of my slides from the SILO Seminar at UW Madison on our recent developments for the NEO-K-Means methods including new optimization routines and results.
Localized methods for diffusions in large graphs
David Gleich
I describe a few ongoing research projects on diffusions in large graphs and how we can design matrix computations to evaluate these diffusions efficiently.
PageRank Centrality of dynamic graph structures
David Gleich
A talk I gave at the SIAM Annual Meeting mini-symposium on the mathematics of the power grid, organized by Mahantesh Halappanavar. I discuss a few ideas on how our dynamic centrality could help analyze such situations.
Localized methods in graph mining exploit the local structures in a graph instead of attempting to find global structures. These methods are widely successful on all sorts of problems, including community detection, label propagation, and a few others.
Spacey random walks and higher-order Markov chains
David Gleich
My talk at the SIAM NetSci workshop (2015) on our new spacey random walk and spacey random surfer models and how we derived them. There are many potential extensions and opportunities to use this for analyzing big data as tensors.
Higher-order organization of complex networks
David Gleich
A talk I gave at the Park City Mathematics Institute about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
Anti-differentiating Approximation Algorithms: PageRank and MinCut
David Gleich
We study how Google's PageRank method relates to mincut and a particular type of electrical flow in a network. We also explain the details of how the "push method" for computing PageRank helps to accelerate it. This has implications for semi-supervised learning and machine learning, as well as social network analysis.
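For readers unfamiliar with it, the push method can be sketched as coordinate relaxation on the personalized PageRank linear system. This is a minimal illustrative version; the dictionary representation, queue discipline, and stopping threshold are my own choices, not the paper's exact formulation:

```python
from collections import deque

def ppr_push(adj, seed, alpha=0.85, eps=1e-8):
    """Coordinate-relaxation ("push") sketch for personalized PageRank,
    solving x = alpha * P x + (1 - alpha) * e_seed with P = A D^{-1}.
    Only nodes near the seed are ever touched; adj maps node -> neighbor list."""
    x, r = {}, {seed: 1.0 - alpha}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        rv = r.get(v, 0.0)
        if rv < eps:
            continue
        del r[v]
        x[v] = x.get(v, 0.0) + rv          # relax coordinate v
        push = alpha * rv / len(adj[v])    # spread residual mass to neighbors
        for u in adj[v]:
            old = r.get(u, 0.0)
            r[u] = old + push
            if old < eps <= r[u]:          # enqueue only when crossing threshold
                queue.append(u)
    return x

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
x = ppr_push(triangle, seed=0)
assert abs(sum(x.values()) - 1.0) < 1e-5   # PageRank mass sums to ~1
```

The key point, and the reason this connects to localized methods, is that the solution vector stays sparse: only nodes with non-negligible PageRank mass are ever stored.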
Spacey random walks and higher-order data analysis
David Gleich
My talk at TMA 2016 (the workshop on Tensors, Matrices, and their Applications) on the relationship between a spacey random walk process and tensor eigenvectors.
Spectral clustering with motifs and higher-order structures
David Gleich
I presented these slides at the #strathna meeting in Glasgow in June 2017. They are an updated and enhanced version of the earlier talks on the subject.
Relaxation methods for the matrix exponential on large networks
David Gleich
My talk from the Stanford ICME seminar series on doing network analysis and link prediction using a fast algorithm for the matrix exponential on graph problems.
Using Local Spectral Methods to Robustify Graph-Based Learning
David Gleich
This is my KDD 2015 talk on robustness in semi-supervised learning. The talk is a bit light on details; see the paper, available on Michael Mahoney's website, for the full story: http://www.stat.berkeley.edu/~mmahoney/pubs/robustifying-kdd15.pdf
Correlation clustering and community detection in graphs and networks
David Gleich
We show a new relationship between various community detection objectives and a correlation clustering framework. These relationships enable us to detect communities with good bounds on the solution.
Gaps between the theory and practice of large-scale matrix-based network comp...
David Gleich
I discuss some runtimes for the personalized PageRank vector and how they relate to open questions in how we should tackle these network-based measures via matrix computations.
Presentation at OM-2017, the Twelfth International Workshop on Ontology Matching collocated with the 16th International Semantic Web Conference ISWC-2017, October 21st, 2017, Vienna, Austria
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Tensor Train (TT) decomposition [3] is a generalization of the SVD from matrices to tensors (multidimensional arrays).
It represents a tensor compactly in terms of factors and allows one to work with the tensor via its factors without materializing the tensor itself.
For example, we can find the elementwise product of two TT-tensors of size 2^100 and get the result in the TT-format as well.
In the talk, we will show how Tensor Train decomposition can be used to represent parameters of neural networks [1] and polynomial models [2].
This parametrization allows exponentially many 'virtual' parameters while working only with small factors of the TT-format.
To train the model, i.e. optimize the objective subject to the constraint that the parameters are in the TT-format, [2] uses stochastic Riemannian optimization.
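The elementwise-product example above is easy to sketch in code: in the TT format, each core of the product is a slice-wise Kronecker product of the input cores, so the full tensor is never formed. A minimal numpy sketch; the core shapes and helper names are my own, not from the references:

```python
import numpy as np

def tt_full(cores):
    """Materialize a TT tensor, given as a list of 3-D cores of shape
    (r_{k-1}, n_k, r_k), into a dense array (for checking only)."""
    res = cores[0]  # shape (1, n_1, r_1)
    for core in cores[1:]:
        # contract the trailing rank index with the next core's leading rank index
        res = np.tensordot(res, core, axes=([-1], [0]))
    return res.squeeze(axis=(0, -1))

def tt_hadamard(a_cores, b_cores):
    """Elementwise (Hadamard) product of two TT tensors: each new core is the
    slice-wise Kronecker product of the inputs' cores, so TT ranks multiply
    but no dense tensor is ever built."""
    out = []
    for A, B in zip(a_cores, b_cores):
        ra1, n, ra2 = A.shape
        rb1, _, rb2 = B.shape
        C = np.einsum('anb,cnd->acnbd', A, B).reshape(ra1 * rb1, n, ra2 * rb2)
        out.append(C)
    return out

# sanity check on a small 4-way tensor
rng = np.random.default_rng(0)
dims, rank = [3, 4, 3, 2], 2

def random_tt(dims, r):
    ranks = [1] + [r] * (len(dims) - 1) + [1]
    return [rng.standard_normal((ranks[i], dims[i], ranks[i + 1]))
            for i in range(len(dims))]

x, y = random_tt(dims, rank), random_tt(dims, rank)
assert np.allclose(tt_full(tt_hadamard(x, y)), tt_full(x) * tt_full(y))
```

For a tensor of size 2^100, only `tt_hadamard` would be applicable, of course; `tt_full` exists purely to verify the identity on a tensor small enough to materialize.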
[1] Novikov, A., Podoprikhin, D., Osokin, A., & Vetrov, D. P. (2015). Tensorizing neural networks. In Advances in Neural Information Processing Systems.
[2] Novikov, A., Trofimov, M., & Oseledets, I. (2016). Tensor Train polynomial models via Riemannian optimization. arXiv:1605.03795.
[3] Oseledets, I. (2011). Tensor-train decomposition. SIAM Journal on Scientific Computing.
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...
MLconf
Anima Anandkumar has been a faculty member in the EECS Dept. at UC Irvine since August 2010. Her research interests are in the areas of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a visiting faculty member at Microsoft Research New England in 2012 and a postdoctoral researcher in the Stochastic Systems Group at MIT from 2009 to 2010. She is the recipient of the Microsoft Faculty Fellowship, the ARO Young Investigator Award, the NSF CAREER Award, and the IBM Fran Allen PhD Fellowship.
Applications of community detection in bibliometric network analysis
Nees Jan van Eck
In this talk, we focus on the analysis of bibliometric networks, and in particular on the detection of communities in these networks. We start by demonstrating VOSviewer, a popular software tool for visualizing bibliometric networks. We discuss the techniques used by VOSviewer for visualizing bibliometric networks and for detecting communities in these networks. We pay special attention to the close relationship between visualization and community detection, and we discuss the unified approach to visualization and community detection that is implemented in VOSviewer. We then shift our attention to community detection in very large citation networks, including millions of publications and hundreds of millions of citation relations. We show how community detection techniques can be used to construct highly detailed classification systems of science. We also discuss applications of such classification systems to science policy questions. Finally, we demonstrate CitNetExplorer, a new software tool in which community detection techniques are used to support the large-scale analysis of citation networks. We use CitNetExplorer to analyze the citation network of publications on network science and in particular on community detection.
This talk is a new update based on some of our recent results on doing Tall and Skinny QRs in MapReduce. In particular, the "fast" iterative refinement approximation based on a sample is new.
Community detection from research papers (the AAN dataset) using the following algorithms:
- K-Means
- Louvain
- Newman-Girvan
GitHub link to code: https://goo.gl/CXej44
GitHub link to project web page: http://goo.gl/7OOkhI
YouTube link to video: https://goo.gl/SCpamf
Dropbox link to PPT report video: https://goo.gl/cgACzU
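As a rough illustration of how two of the listed algorithms can be invoked with networkx (the built-in karate-club graph stands in for the AAN citation graph here; the project's actual pipeline is in the linked code):

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, girvan_newman

G = nx.karate_club_graph()  # toy stand-in for the AAN paper graph

# Louvain: greedy modularity optimization over node moves and aggregation
louvain = louvain_communities(G, seed=7)

# Newman-Girvan: repeatedly remove the highest-betweenness edge;
# taking the first item of the iterator gives the first split into two parts
gn = next(girvan_newman(G))

assert set().union(*louvain) == set(G.nodes)  # every node assigned
assert len(gn) == 2
```

K-Means would operate on a vector embedding of each paper rather than on the graph directly, which is why it is not shown here.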
Finding Emerging Topics Using Chaos and Community Detection in Social Media G...
Paragon_Science_Inc
In this talk, we describe our recent work in the analysis of Twitter-based network graphs, including the Ebola crisis in 2014 and the stock market in 2015.
A history of PageRank from the numerical computing perspective
David Gleich
We'll survey some of the underlying ideas from Google's PageRank algorithm along the lines of Massimo Franceschet's CACM history.
There are some slight liberties I've taken to make it more accessible.
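At its numerical core, PageRank is the power method applied to the Google matrix. A minimal dense sketch for exposition (dangling pages handled by uniform teleportation; this is not a scalable implementation):

```python
import numpy as np

def pagerank(A, alpha=0.85, tol=1e-12):
    """Power iteration for PageRank. A[i, j] = 1 if page j links to page i.
    Columns are normalized to be stochastic; dangling pages teleport uniformly."""
    n = A.shape[0]
    out = A.sum(axis=0)
    safe = np.where(out > 0, out, 1.0)
    P = np.where(out > 0, A / safe, 1.0 / n)   # column-stochastic matrix
    x = np.full(n, 1.0 / n)
    while True:
        x_next = alpha * (P @ x) + (1 - alpha) / n
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next

# 3-page chain: 0 -> 1 -> 2, and page 2 is dangling
A = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
x = pagerank(A)
assert abs(x.sum() - 1.0) < 1e-9
assert x[2] > x[1] > x[0]   # mass accumulates down the chain
```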
MapReduce Tall-and-skinny QR and applications
David Gleich
A talk at the Simons workshop on Parallel and Distributed Algorithms for Inference and Optimization on how to do tall-and-skinny QR factorizations in MapReduce using a communication-avoiding algorithm.
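In its simplest two-level form, the communication-avoiding idea reduces to factoring each row block independently and then factoring the stacked R factors. A serial numpy sketch of what the MapReduce jobs compute:

```python
import numpy as np

def tsqr_r(A, blocks=4):
    """Two-level TSQR sketch: QR each row block (the "map" step), stack the
    small R factors, and QR the stack (the "reduce" step). Returns the final
    n-by-n R factor without ever forming a single tall QR."""
    Rs = [np.linalg.qr(blk, mode='r') for blk in np.array_split(A, blocks)]
    return np.linalg.qr(np.vstack(Rs), mode='r')

A = np.random.default_rng(1).standard_normal((1000, 8))
R = tsqr_r(A)
# R'R must equal A'A -- the defining property of the R factor in a QR
assert np.allclose(R.T @ R, A.T @ A)
```

The real algorithm communicates only the tiny R factors between tasks, which is exactly why it suits MapReduce's shuffle-heavy cost model.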
How does Google Google: A journey into the wondrous mathematics behind your f...
David Gleich
A talk I gave at the annual meeting for the MetroNY section of the MAA about how Google works from a link-ranking perspective. (http://sections.maa.org/metrony/)
Based on a talk by Margot Gerritsen (which used elements from another talk I gave years ago, yay co-author improvements!)
Fast relaxation methods for the matrix exponential
David Gleich
The matrix exponential is a matrix computation primitive used in link prediction and community detection. We describe a fast method to compute it using relaxation on a large linear system of equations. This enables us to compute a column of the matrix exponential in sublinear time, or under a second on a standard desktop computer.
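For reference, the quantity being computed is a single column of exp(A). A plain truncated-Taylor iteration, shown here only as the slow baseline that relaxation methods improve on, not as the talk's algorithm:

```python
import numpy as np

def expm_column(A, i, terms=20):
    """Baseline Taylor approximation of the i-th column of exp(A):
    exp(A) e_i ~ sum_{k=0}^{terms} A^k e_i / k!, built with repeated
    matrix-vector products so only one column is ever touched."""
    x = np.zeros(A.shape[0])
    x[i] = 1.0
    col, term = x.copy(), x.copy()
    for k in range(1, terms + 1):
        term = A @ term / k    # next Taylor term: A^k e_i / k!
        col += term
    return col

A = np.array([[0., 1.],
              [0., 0.]])
# exp of this nilpotent matrix is [[1, 1], [0, 1]], so column 1 is [1, 1]
assert np.allclose(expm_column(A, 1), [1.0, 1.0])
```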
Slides from our PacificVis 2015 presentation.
The paper tackles the problem of "giant hairballs", the dense and tangled structures often resulting from visualization of large social graphs. Proposed is a high-dimensional rotation technique called AGI3D, combined with an ability to filter elements based on social centrality values. AGI3D is targeted for a high-dimensional embedding of a social graph and its projection onto 3D space. It allows the user to rotate the social graph layout in the high-dimensional space by mouse dragging of a vertex. Its high-dimensional rotation effects give the user an illusion that he/she is destructively reshaping the social graph layout, but in reality it assists the user in finding a preferred positioning and direction in the high-dimensional space from which to view the internal structure of the social graph layout, keeping it unmodified. A prototype implementation of the proposal called Social Viewpoint Finder is tested with about 70 social graphs, and this paper reports four of the analysis results.
Parallel Evaluation of Multi-Semi-Joins
Jonny Daenen
Presentation given at VLDB 2016, the 42nd International Conference on Very Large Data Bases.
Paper: http://dx.doi.org/10.14778/2977797.2977800
ArXiv: https://arxiv.org/abs/1605.05219
Poster: https://zenodo.org/record/61653 (doi 10.5281/zenodo.61653)
Gumbo Software: https://github.com/JonnyDaenen/Gumbo
Abstract
While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.
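For readers unfamiliar with the building block: a semi-join keeps the tuples of one relation that have a matching key in another, without importing the other relation's attributes. A toy sketch with illustrative relations:

```python
# R semi-join S on the first attribute: keep rows of R whose key appears in S.
# Note the result carries only R's columns -- S contributes membership only.
R = [(1, 'a'), (2, 'b'), (3, 'c')]
S = [(2, 'x'), (3, 'y'), (4, 'z')]

s_keys = {k for k, _ in S}                 # the only data that must be shipped
semi = [row for row in R if row[0] in s_keys]
assert semi == [(2, 'b'), (3, 'c')]
```

Because only the key set of S needs to travel between machines, semi-joins are much cheaper to parallelize than full joins, which is what the MSJ operator exploits.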
Manifold Blurring Mean Shift algorithms for manifold denoising, presentation,...
Florent Renucci
(General) Retrieving a clean dataset by deleting outliers.
(Computer Vision) Recovering a digital image that has been contaminated by additive white Gaussian noise.
We compute polynomial based surrogates for all components of the solution of the Navier-Stokes equation. We compress this surrogate on the fly to reduce cubic computational complexity to almost linear. All these surrogates are used to quantify uncertainties in numerical aerodynamics.
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Christopher Morris
Most state-of-the-art graph kernels only take local graph properties into account, i.e., the kernel is computed with regard to properties of the neighborhood of vertices or other small substructures. On the other hand, kernels that do take global graph properties into account may not scale well to large graph databases. Here we propose to start exploring the space between local and global graph kernels, striking the balance between both worlds. Specifically, we introduce a novel graph kernel based on the k-dimensional Weisfeiler-Lehman algorithm. Unfortunately, the k-dimensional Weisfeiler-Lehman algorithm scales exponentially in k. Consequently, we devise a stochastic version of the kernel with provable approximation guarantees using conditional Rademacher averages. On bounded-degree graphs, it can even be computed in constant time. We support our theoretical results with experiments on several graph classification benchmarks, showing that our kernels often outperform the state-of-the-art in terms of classification accuracies.
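For intuition, the k = 1 base case of the Weisfeiler-Lehman algorithm is ordinary color refinement; a compact sketch is below. A WL kernel compares such color histograms across graphs; the paper's contribution is the k-dimensional, stochastically approximated version, which this sketch does not attempt:

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Lehman color refinement.
    adj: dict node -> list of neighbors. Each round, a node's new color is
    determined by its old color plus the multiset of its neighbors' colors."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        relabel = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colors = {v: relabel[sig[v]] for v in adj}
    return Counter(colors.values())        # histogram of stable colors

# a 3-node path and a triangle get different color histograms
path = {0: [1], 1: [0, 2], 2: [1]}
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
assert wl_colors(path) != wl_colors(tri)
```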
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Welcome to ViralQR, your best QR code generator.
ViralQR
Welcome to ViralQR, the best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Whether you run a small business or a large enterprise, our easy-to-use platform provides multiple options that can be tailored to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by providing QR codes for their marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and powerful features, which help businesses make the most of QR codes.
Our Services
At ViralQR, we offer a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR offers a 14-day free trial, an exceptional opportunity for new users to get a feel for the platform. From there, one can easily subscribe and experience the full power of dynamic QR codes. The subscription plans are priced flexibly so that virtually every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and more. QR codes can be placed on flyers, packaging, merchandise, and banners, or substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers to ViralQR receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet the diverse needs of your business!
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
- See how to accelerate model training and optimize model performance with active learning
- Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
- Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on counties – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Big data matrix factorizations and Overlapping community detection in graphs
1. Big data matrix factorizations and overlapping community detection in graphs
David F. Gleich
Purdue University
Joint work with Paul Constantine, Austin Benson, Jason Lee, Jeremy Templeton, Yangyang Hou, C. Seshadhri, Joyce Jiyoung Whang, and Inderjit S. Dhillon; supported by NSF CAREER 1149756-CCF and a DOE ASCR award.
Code: bit.ly/dgleich-codes
2. Tall-and-skinny matrices (m ≫ n)
Many rows (like a billion), a few columns (under 10,000).
Example: the matrix A from the tinyimages collection.
Used in: regression and general linear models with many samples, block iterative methods, panel factorizations, approximate kernel k-means, big-data SVD/PCA.
David Gleich · Purdue
3. A graphical view of the MapReduce programming model
[Figure: map tasks transform input data into (key, value) pairs; a shuffle groups values by key; reduce tasks process each key's list of values.]
Map tasks read batches of data in parallel and do some initial filtering.
Reduce is often where the computation happens.
Shuffle is a global communication, like a group-by or an MPI_Alltoall.
David Gleich · Purdue
4. PCA of 80,000,000 images
A is 80,000,000 images by 1,000 pixels.
[Figure: the first 16 columns of V shown as images (the most important basis functions, by row), and the fraction of variance captured by the first 100 principal components.]
Constantine & Gleich, MapReduce 2010.
David Gleich · Purdue
5. Regression with 80,000,000 images
The goal was to approximate how much red there was in a picture from the values of the grayscale pixels only. We get a measure of how much "redness" each pixel contributes to the whole.
We model the sum of red-pixel values in each image as a linear combination of the gray values in each image. Formally, if r_i is the sum of the red components in all pixels of image i, and G_ij is the gray value of the jth pixel in image i, then we wanted to find

min_s Σ_i ( r_i − Σ_j G_ij s_j )²

There is no particular importance to this regression problem; we use it merely as a demonstration.
The coefficients s_j are displayed as an image at the right. They reveal regions of the image that are not as important in determining the overall red component of an image. The color scale varies from light-blue (strongly negative) to blue (0) and red (strongly positive). The computation took 30 minutes using the Dumbo framework and a two-iteration job with 250 intermediate reducers.
We also solved a principal component problem to find a principal component basis for each image. Let G be the matrix of G_ij's from the regression and let u_i be the mean of the ith …
A is 80,000,000 images by 1,000 pixels.
David Gleich · Purdue
6. Models and algorithms for high performance matrix and network computations
[Figure: reduced-order model output compared to the prediction standard deviation of bubble locations at the final time for two bubble radii, s = 0.39 cm and s = 1.95 cm; the model took approximately twenty minutes to construct. The data involved a few pre- and post-processing steps: pull from Aria, globally transpose the data, compute the model and errors.]
Tensor eigenvalues and a power method. Previous work from the PI tackled network alignment with matrix methods for edge overlap; this proposal is for matching triangles using tensor methods:

maximize Σ_ijk T_ijk x_i x_j x_k  subject to ‖x‖₂ = 1

[x^(next)]_i = ρ · ( Σ_jk T_ijk x_j x_k + x_i ),

where ρ ensures the 2-norm. This is the SSHOPM method due to Kolda and Mayo.
Big data methods: SIMAX '09, SISC '11, MapReduce '11, ICASSP '12.
Network alignment: ICDM '09, SC '11, TKDE '13.
Fast & scalable network centrality: SC '05, WAW '07, SISC '10, WWW '10, …
Data clustering: WSDM '12, KDD '12, CIKM '13, …
Massive matrix computations (Ax = b, min ‖Ax − b‖, Ax = λx) on multi-threaded and distributed architectures.
David Gleich · Purdue
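The shifted iteration above can be sketched in a few lines of numpy. This is an illustrative toy, not the talk's code: the tensor T below is a small made-up symmetric tensor, and the shift is the "+ x" term from the slide (ρ is just the normalization).

```python
import numpy as np

# Minimal sketch of the SSHOPM-style iteration from the slide.
# T is a small random symmetric tensor made up for illustration.
rng = np.random.default_rng(0)
n = 4
T = rng.random((n, n, n)) / 4.0
# Symmetrize T over all index permutations.
T = sum(np.transpose(T, p) for p in
        [(0,1,2),(0,2,1),(1,0,2),(1,2,0),(2,0,1),(2,1,0)]) / 6.0

x = np.ones(n) / np.sqrt(n)
for _ in range(2000):
    y = np.einsum('ijk,j,k->i', T, x, x) + x  # T x^2 + x (shifted step)
    x = y / np.linalg.norm(y)                 # rho keeps the 2-norm at 1

lam = np.einsum('ijk,i,j,k', T, x, x, x)      # Rayleigh-quotient eigenvalue
residual = np.linalg.norm(np.einsum('ijk,j,k->i', T, x, x) - lam * x)
```

At a fixed point, T x² = λ x, so a small residual indicates the iteration has converged to a tensor z-eigenvector.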
7. PCA of 80,000,000 images
A is 80,000,000 images by 1,000 pixels.
Pipeline: in MapReduce, zero-mean the rows and compute a TSQR to get R; in post-processing, take the SVD of R to get the top 100 singular values (principal components) and the first 16 columns of V as images.
Constantine & Gleich, MapReduce 2010.
David Gleich · Purdue
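The pipeline on this slide can be mimicked serially with numpy on a small synthetic matrix (the real computation used TSQR on MapReduce at 80,000,000 × 1,000; the centering step here is a per-column zero-mean, a plausible reading of the slide's "zero mean" step):

```python
import numpy as np

# Serial sketch of the slide's pipeline on synthetic data.
rng = np.random.default_rng(1)
A = rng.standard_normal((10000, 50))
A = A - A.mean(axis=0)            # center the data (zero-mean step)

R = np.linalg.qr(A, mode='r')     # TSQR stands in for this QR at scale
_, S, Vt = np.linalg.svd(R)       # SVD of the tiny 50 x 50 factor

# The singular values of R match those of A, so S gives the principal
# component weights and the rows of Vt the principal directions.
S_direct = np.linalg.svd(A, compute_uv=False)
```

The point of the design is that the expensive pass over the tall matrix only produces the small R; everything PCA-related happens on a 50 × 50 matrix.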
8.
Input                        500,000,000-by-100 matrix
Each record                  1-by-100 row
HDFS size                    423.3 GB
Time to compute colsum(A)    161 sec.
Time to compute R in qr(A)   387 sec.
David Gleich · Purdue
9. How to store tall-and-skinny matrices in Hadoop
A : m x n, m ≫ n.
The key is an arbitrary row-id; the value is the 1 x n array for a row (or a b x n block).
Each submatrix Ai is the input to a map task.
David Gleich · Purdue
10. Numerical stability was a problem for prior approaches
[Figure: norm(QᵀQ − I) versus condition number for AR⁻¹ and AR⁻¹ + iterative refinement, alongside 4. Direct TSQR (Benson, Gleich, Demmel, BigData '13) and prior work (1. Constantine & Gleich, MapReduce 2011; 2–3. Benson, Gleich, Demmel, BigData '13).]
Previous methods couldn't ensure that the matrix Q was orthogonal.
David Gleich · Purdue
11. Communication avoiding QR (Demmel et al. 2008) on MapReduce (Constantine and Gleich, 2011)
[Figure: Mapper 1 runs a serial TSQR over its blocks A1, …, A4 and emits R4; Mapper 2 runs a serial TSQR over A5, …, A8 and emits R8; Reducer 1 runs a serial TSQR on the collected R factors and emits Q and R.]
Algorithm
Data: rows of a matrix.
Map: QR factorization of rows.
Reduce: QR factorization of rows.
David Gleich · Purdue
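Why the map-then-reduce structure is correct: the QR of the stacked per-block R factors has the same R (up to row signs) as the QR of the full matrix, since both satisfy RᵀR = AᵀA. A small synthetic check:

```python
import numpy as np

# Map step: QR of each block; reduce step: QR of the stacked R factors.
rng = np.random.default_rng(3)
A = rng.standard_normal((4000, 20))

Rs = [np.linalg.qr(Ai, mode='r') for Ai in np.split(A, 4)]  # map
R = np.linalg.qr(np.vstack(Rs), mode='r')                   # reduce

# R carries the same information as a QR of all of A:
# R^T R reproduces the Gram matrix A^T A.
gram_err = np.linalg.norm(R.T @ R - A.T @ A)
```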
12. More about how to compute a regression

min ‖Ax − b‖² = min Σ_i ( Σ_j A_ij x_j − b_i )²

[Figure: the same serial TSQR runs over the blocks of A while the right-hand side is carried along through each local factorization, e.g. b2 = Q2ᵀ b1.]
David Gleich · Purdue
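The carry-b-along trick on the slide can be mimicked serially by factoring the augmented matrix [A b]: its R factor holds both the triangular system and Qᵀb in one place. Sizes below are made up:

```python
import numpy as np

# Least squares through QR of the augmented matrix [A b].
rng = np.random.default_rng(4)
A = rng.standard_normal((5000, 8))
b = rng.standard_normal(5000)

Raug = np.linalg.qr(np.column_stack([A, b]), mode='r')
# Top-left block is R from qr(A); last column's top holds Q^T b.
x = np.linalg.solve(Raug[:8, :8], Raug[:8, 8])   # solve R x = Q^T b

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]     # direct solve to compare
```

In the MapReduce version the augmented blocks flow through the same TSQR tree, so the regression costs essentially one extra column.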
13. Too many maps cause too much data to one reducer!
Each image is 5k. Each HDFS block has 12,800 images; 6,250 total blocks.
Each map outputs a 1000-by-1000 matrix.
One reducer gets a 6.25M-by-1000 matrix (50 GB).
David Gleich · Purdue
14. Too many maps cause too much data to one reducer!
[Figure: a two-iteration job. In iteration 1, mappers run serial TSQR over their blocks and emit local R factors; a shuffle spreads them over several reducers, each of which runs serial TSQR and emits an intermediate factor R2,j. In iteration 2, an identity map routes everything to a single reducer, which runs serial TSQR and emits the final R.]
David Gleich · Purdue
15. The rest of the talk: full TSQR code in hadoopy

import random, numpy, hadoopy
class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper
    def compress(self):
        R = numpy.linalg.qr(
            numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])
    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize*len(self.data[0]):
            self.compress()
    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row
    def mapper(self, key, value):
        self.collect(key, value)
    def reducer(self, key, values):
        for value in values: self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)

David Gleich · Purdue
16. Non-negative matrix factorization
[Figure: data in 3D under (b) NMF, projected on the 1st and 2nd non-negative factors, and (c) manifold learning, shown by the first and second manifold parameters.]
NMF: find W, H ≥ 0 where A ≈ WH.
Separable NMF: find H ≥ 0 and columns K where A ≈ A(:, K)H.
David Gleich · Purdue
17. There are good algorithms for separable NMF that avoid alternating between W and H.
NMF: find W, H ≥ 0 where A ≈ WH.
Separable NMF: find H ≥ 0 and columns K where A ≈ A(:, K)H.
David Gleich · Purdue
18. Separable NMF algorithms
1. Find the columns of A.
2. Find the values of W.
Separable NMF: find H ≥ 0 and columns K where A ≈ A(:, K)H.
David Gleich · Purdue
19. Separable NMF algorithms are really geometry
1. Find the columns of A. Equiv. to "Find the extreme points of a convex set."
2. These are preserved under linear transformations.
Separable NMF: find H ≥ 0 and columns K where A ≈ A(:, K)H.
David Gleich · Purdue
20. We use our tall-and-skinny QR to get an orthogonal transformation that makes the problem easily solvable.
David Gleich · Purdue
21. [Figure: A = U Σ Vᵀ via the SVD; the NMF factors A(:, K) and H come from the small factor.]
1. Compute QR using the TSQR method.
2. Run a separable NMF method on ΣVᵀ.
3. Find H by solving a small non-negative least-squares problem in each column. These are tiny.
David Gleich · Purdue
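Step 2 needs a separable NMF method; SPA (successive projection), one of the algorithms compared in the next slides, is simple enough to sketch. This is an illustrative implementation on a small synthetic separable matrix, not the code used for the experiments:

```python
import numpy as np

# SPA sketch: repeatedly pick the largest-norm column as an extreme
# column, then project it out of the residual.
rng = np.random.default_rng(5)
W = rng.random((50, 5))                  # 5 true extreme columns
C = rng.random((5, 15))
C = 0.8 * C / C.sum(axis=0)              # interior columns: conic combos
A = np.hstack([W, W @ C])                #   with coefficients summing to 0.8

def spa(A, r):
    Res, anchors = A.copy(), []
    for _ in range(r):
        j = int(np.argmax(np.sum(Res**2, axis=0)))
        anchors.append(j)
        q = Res[:, j] / np.linalg.norm(Res[:, j])
        Res = Res - np.outer(q, q @ Res)  # project out the chosen column
    return anchors

anchors = spa(A, 5)   # should recover the 5 true extreme columns
```

Because the interior columns' coefficients sum to less than one, the largest-norm column is always a true extreme column, and projecting it out preserves that property; SPA therefore recovers exactly the columns of W here.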
22. All of the hard analysis is on the small dimension of the matrix, which makes this very useful in practice.
David Gleich · Purdue
23. Our methods vs. the competition
200 million rows, 200 columns, separation rank 20.
Figure 1: Relative error in the separable factorization as a function of nonnegative rank (r) for the three algorithms. The matrix was synthetically generated to be separable. SPA and GP capture all of the true extreme columns when r = 20 (where the residual is zero). Since we are using the greedy variant of XRAY, it takes r = 21 to capture all of the columns.
Figure 2: First 20 extreme columns selected by SPA, XRAY, and GP along with the true columns used in the synthetic matrix generation. A marker is present for a given column index if and only if that column is a selected extreme column. SPA and GP capture all of the true extreme columns.
David Gleich · Purdue
24. Nonlinear heat transfer model in random media
Each run takes 5 hours on 8 processors and outputs a 4M (node) by 9 (time-step) simulation.
We did 8192 runs (128 samples of bubble locations, 64 bubble radii): 4.5 TB of data in Exodus II (NetCDF).
[Figure: apply heat on one side, look at the temperature.]
https://www.opensciencedatacloud.org/publicdata/heat-transfer/
David Gleich · Purdue
26. Each simulation is a column: a 5B-by-64 matrix, 2.2 TB.
[Figure: A = U Σ Vᵀ via the SVD; the NMF factors A(:, K) and H come from the small factor.]
Run a "standard" NMF algorithm on ΣVᵀ.
David Gleich · Purdue
27. Figure 9: Coefficient matrix H for SPA, XRAY, and GP for the heat transfer simulation data when r = 10. In all cases, the non-extreme columns are conic combinations of two of the selected columns, i.e., each column in H has at most two non-zero values. Specifically, the non-extreme columns are conic combinations of the two extreme columns that "sandwich" them in the matrix. See Figure 10 for a closer look at the coefficients.
Figure 8: First 10 extreme columns selected by SPA, XRAY, and GP for the heat transfer simulation.
Figure 10: Value of the H matrix for columns 1 through 34 for the SPA algorithm on the heat transfer simulation.
David Gleich · Purdue
33. We can find communities using Personalized PageRank (PPR) [Andersen et al. 2006]
PPR is a Markov chain on nodes:
1. with probability α, follow a random edge;
2. with probability 1−α, restart at a seed.
Also known as the random surfer, or a random walk with restart; it has a unique stationary distribution.
David Gleich · Purdue
34. Personalized PageRank community detection
1. Given a seed, approximate the stationary distribution.
2. Extract the community.
Both are local operations.
David Gleich · Purdue
35. Conductance communities
Conductance is one of the most important community scores [Schaeffer07].
The conductance of a set of vertices is the ratio of edges leaving the set to total edges in the set:

φ(S) = cut(S) / min( vol(S), vol(S̄) )

Equivalently, it's the probability that a random edge leaves the set.
Small conductance ⇔ good community.
Example: cut(S) = 7, vol(S) = 33, vol(S̄) = 11, so φ(S) = 7/11.
David Gleich · Purdue
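The definition above translates directly into code. This uses the same dictionary-of-sets graph format as the PPR code later in the talk; the graph itself (two triangles joined by an edge) is a made-up example, not the one pictured on the slide:

```python
# Conductance straight from the definition: cut / min(vol(S), vol(rest)).
def conductance(G, S):
    S = set(S)
    cut = sum(1 for v in S for u in G[v] if u not in S)
    vol_S = sum(len(G[v]) for v in S)
    vol_rest = sum(len(G[v]) for v in G if v not in S)
    return cut / min(vol_S, vol_rest)

# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi = conductance(G, {0, 1, 2})  # cut = 1, vol(S) = 7, vol(rest) = 7
```

For the triangle side, one edge leaves the set out of volume 7, giving conductance 1/7.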
37. # G is graph as dictionary-of-sets; seed is a list of seed nodes
import collections
alpha = 0.99
tol = 1e-4

x = {} # Store x, r as dictionaries
r = {} # initialize residual
Q = collections.deque() # initialize queue
for s in seed:
    r[s] = 1.0/len(seed)
    Q.append(s)
while len(Q) > 0:
    v = Q.popleft() # v has r[v] > tol*deg(v)
    if v not in x: x[v] = 0.
    x[v] += (1-alpha)*r[v]
    mass = alpha*r[v]/(2*len(G[v]))
    for u in G[v]: # for neighbors of v
        if u not in r: r[u] = 0.
        if r[u] < len(G[u])*tol and \
           r[u] + mass >= len(G[u])*tol:
            Q.append(u) # add u to queue if large
        r[u] = r[u] + mass
    r[v] = mass*len(G[v])
David Gleich · Purdue
39. Whang-Gleich-Dhillon, CIKM2013 [upcoming…]
1. Extract part of the graph that might have overlapping communities.
2. Compute a partitioning of the network into many pieces (think sqrt(n)) using Graclus.
3. Find the center of these partitions.
4. Use PPR to grow egonets of these centers.
David Gleich · Purdue
40. A good partitioning helps
[Figure: maximum conductance versus coverage (percentage) for egonet, graclus centers, spread hubs, random, and bigclam seedings on (a) AstroPh and (d) Flickr.]
Flickr social network: 2M vertices, 22M edges.
We can cover 95% of the network with communities of conductance ~0.15.
David Gleich · Purdue
41. And helps to find real-world overlapping communities too
[Figure 3: F1 and F2 measures on DBLP comparing our algorithmic communities (egonet, graclus centers, spread hubs, random) against demon and bigclam; higher indicates better communities.]
Using datasets from Yang and Leskovec (WSDM 2013) with known overlapping community structure, our method outperforms current state-of-the-art overlapping community detection methods. Even randomly seeded!
David Gleich · Purdue
42. Seed Set Expansion
Carefully select seeds, then greedily expand communities around the seed sets.
The algorithm: a filtering phase, a seeding phase, a seed set expansion phase, and a propagation phase.
Joyce Jiyoung Whang, The University of Texas at Austin, Conference on Information and Knowledge Management
David Gleich · Purdue
43. Filtering Phase
David Gleich · Purdue
44. Seed Set Expansion Phase
Run clustering, and choose centers or pick an independent set of high-degree nodes. Then run personalized PageRank.
David Gleich · Purdue
45. Propagation Phase
We can prove that this only improves the objective.
David Gleich · Purdue
46. Conclusion & Discussion
PPR community detection is fast [Andersen et al. FOCS06].
PPR communities look real [Abrahao et al. KDD2012; Zhu et al. ICML2013].
Partitioning for seeding yields high coverage & real communities.
"Caveman" communities? Best conductance cut at the intersection of communities?
References: Gleich & Seshadhri, KDD2012; Whang, Gleich & Dhillon, CIKM2013.
PPR sample: bit.ly/18khzO5
Egonet seeding: bit.ly/dgleich-code
David Gleich · Purdue