This document summarizes a research paper on a multi-threaded algorithm for computing augmented contour trees from scalar field data. The key points are:
1) A simple parallel algorithm is proposed that partitions the scalar field domain, computes local contour trees in parallel on each partition, and stitches the local trees together.
2) The algorithm was implemented in C++ using VTK for generic input/output support. It achieves good parallel efficiency on workstations with up to 8 threads.
3) Experimental results show the parallel implementation outperforms sequential state-of-the-art methods for large data sets and provides interactive analysis capabilities for topological data exploration and visualization applications.
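The partition-then-compute-then-stitch pipeline summarized above can be illustrated with a minimal sketch of its first two steps: sorting the vertices by scalar value and splitting the sorted order into balanced ranges. This is an illustration only; the function name `partitionByRange` and the flat scalar array are assumptions, not the paper's actual VTK-based implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Range-driven partitioning: sort vertex ids by scalar value, then split
// the sorted order into p equally sized ranges. Because the scalars are
// sorted first, every partition holds (roughly) the same number of
// vertices, which balances the per-thread local tree computations.
std::vector<std::vector<int>> partitionByRange(const std::vector<double>& scalars, int p) {
    std::vector<int> order(scalars.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return scalars[a] < scalars[b]; });
    std::vector<std::vector<int>> parts(p);
    const std::size_t chunk = (order.size() + p - 1) / p;  // ceiling division
    for (std::size_t i = 0; i < order.size(); ++i)
        parts[i / chunk].push_back(order[i]);
    return parts;
}
```

Each partition can then be handed to a separate thread for the local contour tree computation.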
Implementation of Thorup's Linear Time Algorithm for Undirected Single Source... (Nick Pruehs)
Mikkel Thorup found the first deterministic algorithm to solve the classic single-source shortest paths problem for undirected graphs with positive integer weights in linear time and space. The algorithm requires a hierarchical bucketing structure for identifying the order the vertices have to be visited in without breaking this time bound, thus avoiding the sorting bottleneck of the algorithm proposed by Dijkstra in 1959.
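For contrast with Thorup's bucketing approach, the sorting bottleneck mentioned above is visible in a textbook Dijkstra implementation, where a binary heap keeps vertices ordered by tentative distance. This is an illustrative sketch of the classic algorithm, not Thorup's; the adjacency-list layout is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Classic Dijkstra with a binary heap: the priority queue effectively
// sorts vertices by tentative distance, giving the O(m log n) bound that
// Thorup's hierarchical bucketing structure avoids.
std::vector<long long> dijkstra(
    const std::vector<std::vector<std::pair<int, int>>>& adj, int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale heap entry, skip
        for (auto [v, w] : adj[u]) {
            if (d + w < dist[v]) {   // relax edge (u, v)
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
```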
Algorithmic Techniques for Parametric Model Recovery (CurvSurf)
A complete description of algorithmic techniques for automatic feature extraction from point clouds. Orthogonal distance fitting, a form of maximum likelihood estimation, plays the main role. Differential geometry determines the type of object surface.
ChatGPT
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It involves applying various techniques and methods to extract insights from data sets, often with the goal of uncovering patterns, trends, relationships, or making predictions.
Here's an overview of the key steps and techniques involved in data analysis:
Data Collection: The first step in data analysis is gathering relevant data from various sources. This can include structured data from databases, spreadsheets, or surveys, as well as unstructured data such as text documents, social media posts, or sensor readings.
Data Cleaning and Preprocessing: Once the data is collected, it often needs to be cleaned and preprocessed to ensure its quality and suitability for analysis. This involves handling missing values, removing duplicates, addressing inconsistencies, and transforming data into a suitable format for analysis.
Exploratory Data Analysis (EDA): EDA involves examining and understanding the data through summary statistics, visualizations, and statistical techniques. It helps identify patterns, distributions, outliers, and potential relationships between variables. EDA also helps in formulating hypotheses and guiding further analysis.
Data Modeling and Statistical Analysis: In this step, various statistical techniques and models are applied to the data to gain deeper insights. This can include descriptive statistics, inferential statistics, hypothesis testing, regression analysis, time series analysis, clustering, classification, and more. The choice of techniques depends on the nature of the data and the research questions being addressed.
Data Visualization: Data visualization plays a crucial role in data analysis. It involves creating meaningful and visually appealing representations of data through charts, graphs, plots, and interactive dashboards. Visualizations help in communicating insights effectively and spotting trends or patterns that may be difficult to identify in raw data.
Interpretation and Conclusion: Once the analysis is performed, the findings need to be interpreted in the context of the problem or research objectives. Conclusions are drawn based on the results, and recommendations or insights are provided to stakeholders or decision-makers.
Reporting and Communication: The final step is to present the results and findings of the data analysis in a clear and concise manner. This can be in the form of reports, presentations, or interactive visualizations. Effective communication of the analysis results is crucial for stakeholders to understand and make informed decisions based on the insights gained.
Data analysis is widely used in various fields, including business, finance, marketing, healthcare, social sciences, and more. It plays a crucial role in extracting value from data, supporting evidence-based decision-making, and driving actionable insights.
Ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity and training time. The Super Learner algorithm, also called "stacking", learns the optimal combination of the base learner fits. The latest version of H2O now contains a "Stacked Ensemble" method, which allows the user to stack H2O models into a Super Learner. The Stacked Ensemble method is the native H2O version of stacking, previously only available in the h2oEnsemble R package, and now enables stacking from all the H2O APIs: Python, R, Scala, etc.
Erin is a Statistician and Machine Learning Scientist at H2O.ai. Before joining H2O, she was the Principal Data Scientist at Wise.io (acquired by GE Digital) and Marvin Mobile Security (acquired by Veracode) and the founder of DataScientific, Inc. Erin received her Ph.D. from University of California, Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing.
Beyond EXPLAIN: Query Optimization From Theory To Code (Yuto Hayamizu)
EXPLAIN is too much explained. Let's go "beyond EXPLAIN".
This talk will take you on a backstage tour of the optimizer: from the theoretical background of state-of-the-art query optimization to a close look at the current implementation of PostgreSQL.
Nafems15 Technical meeting on system modeling (SDTools)
This presentation illustrates the main mechanisms of model reduction used in generating efficient system models that can be used in vibration design. Examples from automotive, aeronautics and train industries are used as illustrations.
The concept of the talk is as follows:
- to give a general idea about the user segmentation task in a DMP project and how solving this problem helps our business
- to tell how we use AutoML to solve this task and explain its components
- to give insights about the techniques we apply to make our pipeline fast and stable on huge datasets
Automated machine learning lectures given at the Advanced Course on Data Science & Machine Learning. AutoML, hyperparameter optimization, Bayesian optimization, Neural Architecture Search, Meta-learning, MAML
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin... (Daniel Varro)
In model-driven software engineering (MDE), model queries are core technologies behind many tool- and transformation-specific challenges such as design rule validation, model synchronization, view maintenance, simulation and many more. As software models rapidly increase in size and complexity, traditional MDE tools frequently face scalability issues that decrease the productivity of engineers and increase development costs. Incremental graph queries offer a graph-pattern-based language for capturing queries. Furthermore, the result set of a query is cached and incrementally maintained upon model changes to provide instantaneous query response times. In this talk, a brief overview is first given of the EMF-IncQuery framework (an official Eclipse subproject). Then we discuss how to incorporate incremental queries over a distributed cloud infrastructure (to scale up from a single-node tool to a cluster of nodes) deployed over popular database back-ends (such as Cassandra, 4store, Neo4J, etc.). We present our first benchmarking experiments with IncQuery-D to highlight that distributed incremental model queries can perform significantly better than the native query technologies of the underlying database back-end, especially for complex queries.
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe... (PyData)
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This talk is about *the* Moore's Law, the bull that the other "Laws" ride, and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ... (Gruter)
Apache Tajo is an open source big data warehouse system on Hadoop. This slide deck presents two efforts for performance improvement in the Tajo project. The first is query optimization, including cost-based join ordering and progressive optimization. The second is JIT-based vectorized processing.
Topology Optimization
Topology optimization is concerned with material distribution and how the members within a structure are connected. It treats the “equivalent density” of each element as a design variable.
The solver calculates an equivalent density for each element, where 1 is equivalent to 100% material and 0 to no material in the element. The solver then seeks to assign elements that have a low stress value a lower equivalent density before analyzing the effect on the remaining structure. In this way extraneous elements tend towards a density of 0, with the optimum design tending towards 1. As a designer, you will need to exercise your judgment. For example, you may decide to omit material from all (finite) elements whose density is less than 0.3 (or 30%). Using an iso-plot of element densities helps to visualize the “remaining” structure, as elements with a density below this threshold can be masked, leaving behind the optimum design. You will then need to take this geometry back to your CAD modeler, smooth it out (that is, use geometrically regular edges or surfaces, etc.) and re-evaluate the design for stresses, displacements, frequencies, etc.
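The thresholding step described above can be sketched in a few lines. The helper name `keepDenseElements` is hypothetical, and the densities are assumed to come from the solver:

```cpp
#include <cstddef>
#include <vector>

// Mask elements whose equivalent density falls below a designer-chosen
// threshold (e.g. 0.3): the surviving element ids approximate the
// optimized structure that would be taken back into CAD for smoothing
// and re-analysis.
std::vector<int> keepDenseElements(const std::vector<double>& density, double threshold) {
    std::vector<int> kept;
    for (std::size_t e = 0; e < density.size(); ++e)
        if (density[e] >= threshold)
            kept.push_back(static_cast<int>(e));
    return kept;
}
```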
University Course "Micro and nano systems" for Master Degree in Biomedical Engineering at University of Pisa. Topic: Software for additive manufacturing (part1)
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
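A minimal sketch of the levelwise idea follows (an illustration, not the report's actual implementation). It assumes the SCC decomposition into topologically ordered levels has already been computed and that dead ends have already been handled, so every vertex has a non-zero out-degree; each level's ranks are then iterated to convergence while earlier levels stay fixed.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Levelwise PageRank sketch: `levels` groups vertex ids in topological
// order of the SCC block-graph. Because all in-neighbours outside the
// current level belong to earlier, already-converged levels, no rank
// outside the level changes during its iterations.
// in[v] lists in-neighbours of v; outdeg[v] is v's out-degree (non-zero).
std::vector<double> levelwisePageRank(
    const std::vector<std::vector<int>>& in, const std::vector<int>& outdeg,
    const std::vector<std::vector<int>>& levels,
    double damping = 0.85, double tol = 1e-10) {
    const int n = static_cast<int>(in.size());
    std::vector<double> rank(n, 1.0 / n);
    for (const auto& level : levels) {
        double err;
        do {  // iterate only this level to convergence
            err = 0.0;
            for (int v : level) {
                double sum = 0.0;
                for (int u : in[v]) sum += rank[u] / outdeg[u];
                double next = (1.0 - damping) / n + damping * sum;
                err = std::max(err, std::fabs(next - rank[v]));
                rank[v] = next;
            }
        } while (err > tol);
    }
    return rank;
}
```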
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, often operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
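The OpenMP reduce experiments listed above boil down to comparing a plain sequential loop with a reduction-annotated parallel loop; a minimal sketch (function names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Sequential vector element sum: one thread, one accumulator.
double sumSequential(const std::vector<double>& x) {
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i];
    return s;
}

// OpenMP vector element sum: each thread accumulates a private partial
// sum that the reduction clause combines at the end. The pragma is a
// no-op when compiled without OpenMP, so the result is identical either way.
double sumParallel(const std::vector<double>& x) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long long i = 0; i < static_cast<long long>(x.size()); ++i) s += x[i];
    return s;
}
```

Timing the two functions on large vectors reproduces the sequential-vs-OpenMP comparison; the CUDA variants additionally vary grid/block launch configurations.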
1. Contour Forests: Fast Multi-threaded Augmented Contour Trees
Charles Gueunet, UPMC and Kitware
Pierre Fortin, UPMC
Julien Jomier, Kitware
Julien Tierny, UPMC
3. • Topological analysis: classical approach in Sci Viz
• Contour Trees, Reeb Graphs, Morse-Smale Complexes...
• Increasing data size and complexity
• Challenge for interactive exploration
• Multi-core architectures are common
Introduction
[DoraiswamyTVCG13] [LandgeSC14] [MaadasamyHiPC12]
Context
Related Works
Challenges
Contributions
Motivation for multi-threaded parallelism
12. • Topological data analysis algorithms
• Intrinsically sequential approaches
• Challenging parallelization
• Contour Tree:
• No complete parallelization (only subroutines)
• No efficient parallel algorithm for augmented trees
14. • Efficient multi-threaded algorithm for contour tree computation
• Simple approach
• Good parallel efficiency on workstations
• Ready-to-use VTK-based C++ implementation
• Generic input (VTU/VTI, 2D/3D)
• Generic output (augmented trees)
23. Main steps of our approach:
1) Sort vertices
2) Create partitions using level sets
3) Compute local trees
4) Stitch local trees
Pros:
• Simple approach
• Local computation of CT
• Augmented trees
• Simple stitching of the forests
Cons:
• Depends on interface level sets
Preliminaries
Background
Overview
25. • Range driven partitions
• Sorted scalars ⇒ balanced partitions
• Use partitions
Algorithm
Domain partitioning
Local computation
Contour forest stitching
30. • Extended partition: initial partition + boundary vertices
• Correct tree in initial partition
• Visit all edges in parallel
(Figure legend: extended = green + red; redundant computations)
2D Example 3D Example
31. • In extended partitions
• Using Carr et al. Algorithm:
• Join Tree
• Split Tree
• Combination
• Union-Find [CormenItA01]
32. Each partition:
• Join and Split trees: 2 threads
• Combine: 1 thread
• O(|σi| × α(|σi|)) + O(|S|)
• Keep arcs on the boundary (n-h)
33. • Simple step
• Visit crossing arcs
• Cut interface arcs
• Segmentation: fast lookup
34. • New super-arc:
• Union of the two facing halves
• Once per connected component of level set (∼ crossing arc)
35. • Repeat for each interface
• Fast in practice
36. Experimental results
Intel Xeon CPU E5-2630 v3 (2.4 GHz, 8 cores)
64GB of RAM
C++ (GCC-4.9), VTK, OpenMP (additional material)
44. Exp. Results
Efficiency
Comparison
Speed
Pros & cons
Libtourtre:
• Open-source reference implementation
• pTourtre: naive parallel version
• Always better than sequential libtourtre for big data sets
• Better than parallel except for the foot data set
58. The end
Thank you for your attention.
Do you have any questions?
This work is partially supported by the Bpifrance grant “AVIDO” (Programme d’Investissements d’Avenir, reference
P112017-2661376/DOS0021427) and by the French National Association for Research and Technology (ANRT), in the
framework of the LIP6 - Kitware SAS CIFRE partnership reference 2015/1039.
Paper and Code at: http://www-pequan.lip6.fr/~tierny/