Presenter: Yunyong Ko (Ph.D. student, Hanyang University)
Date: February 2018
Influence maximization (IM) is the problem of finding a seed set of k nodes that maximizes influence spread over a social network. Kempe et al. showed the problem is NP-hard and proposed a greedy algorithm (referred to as SimpleGreedy) that guarantees an influence spread within a factor of (1 - 1/e) ≈ 63% of the optimal solution. However, SimpleGreedy has two performance issues: at the micro level, it estimates the influence spread of a single node by running Monte-Carlo (MC) simulations, which are fairly expensive; at the macro level, after selecting a seed at each step, it re-evaluates the influence spread of every node in the network, incurring significant computational overhead. In this paper, we propose Hybrid-IM, which addresses these issues at both the micro and macro levels by combining PB-IM (Path-Based Influence Maximization) and CB-IM (Community-Based Influence Maximization). Furthermore, we identify two technical issues that limit Hybrid-IM's performance and propose two strategies to address them. Through extensive experiments on four real-world datasets, we show that Hybrid-IM achieves a great improvement in performance (up to 43 times) over state-of-the-art methods while finding a seed set whose influence spread is very close to theirs.
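As an illustration of the micro- and macro-level costs described above, here is a minimal sketch of SimpleGreedy under the Independent Cascade model. The toy adjacency-list graph, propagation probability p, and run count are illustrative assumptions, not the paper's experimental setup.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Monte-Carlo run of the Independent Cascade (IC) model:
    each newly active node u tries once to activate each neighbor v
    with probability p. Returns the final number of active nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        new = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    new.append(v)
        frontier = new
    return len(active)

def estimate_spread(graph, seeds, p=0.1, runs=300, seed=42):
    """Micro-level cost: the spread of one candidate set is estimated
    by averaging many expensive MC simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs

def simple_greedy(graph, k, p=0.1, runs=300):
    """Macro-level cost: after each seed is chosen, the marginal gain
    of every remaining node is re-evaluated from scratch."""
    seeds = set()
    for _ in range(k):
        base = estimate_spread(graph, seeds, p, runs)
        gains = {v: estimate_spread(graph, seeds | {v}, p, runs) - base
                 for v in graph if v not in seeds}
        seeds.add(max(gains, key=gains.get))
    return seeds
```

The nested loop over candidates times MC runs is exactly the overhead that path-based and community-based pruning aim to avoid.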
High-performance graph analysis is unlocking knowledge in computer security, bioinformatics, social networks, and many other data integration areas. Graphs provide a convenient abstraction for many data problems beyond linear algebra. Some problems map directly to linear algebra. Others, like community detection, look eerily similar to sparse linear algebra techniques. And then there are algorithms that strongly resist attempts at making them look like linear algebra. This talk will cover recent results with an emphasis on streaming graph problems, where the graph changes and results need to be updated with minimal latency. We'll also touch on issues of sensitivity and reliability, where graph analysis needs to learn from numerical analysis and linear algebra.
Algorithms in Social Network Graphs and Social Network Analysis (oliviaclark2905)
This document discusses algorithms for analyzing social networks and detecting communities within them. It describes several algorithms for community detection, including minimum-cut methods, hierarchical clustering, Girvan-Newman algorithms, modularity maximization techniques like the Louvain method, statistical inference methods, and clique-based methods. It also covers algorithms for community search to efficiently find the community containing a given query node.
Network-Wide Heavy-Hitter Detection with Commodity Switches (AJAY KHARAT)
Network operators often need to identify outliers in network traffic to detect attacks or diagnose performance problems.
To detect such problems, operators perform heavy-hitter detection on flows.
Traditionally, heavy-hitter detection was done by analyzing individual packets or examining packet flows.
Prior work focused on detecting heavy hitters on a single switch, but we often need to track network-wide heavy hitters.
When detecting heavy hitters network-wide, the goal is to reduce communication overhead while maintaining accuracy.
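The per-switch side of heavy-hitter detection can be sketched with a classic counter-based summary. The sketch below uses the Misra-Gries algorithm as one plausible building block; the actual system's data structures and its network-wide coordination between switches are not shown.

```python
def misra_gries(stream, k):
    """Misra-Gries summary: tracks at most k-1 candidate heavy flows.
    Any flow occurring more than len(stream)/k times is guaranteed to
    survive in the counter table (stored counts are underestimates)."""
    counters = {}
    for flow in stream:
        if flow in counters:
            counters[flow] += 1
        elif len(counters) < k - 1:
            counters[flow] = 1
        else:
            # Table is full: decrement every counter and evict zeros.
            for f in list(counters):
                counters[f] -= 1
                if counters[f] == 0:
                    del counters[f]
    return counters
```

A second pass (or a coordinator, in the network-wide setting) would then verify which surviving candidates truly exceed the heavy-hitter threshold.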
Presenter: Hwanjun Song (Ph.D. student, KAIST)
Date: August 2018
(Parallel Clustering Algorithm Optimization for Large-Scale Data Analytics)
Clustering, one of the most widely used methods in data analysis, is the task of partitioning given data into groups based on similarity. However, because of its high computational complexity, clustering has seen limited use in large-scale data analysis. To address this, many recent studies apply distributed computing frameworks such as Hadoop and Spark, but optimizing existing clustering algorithms for distributed environments is not easy. In particular, sacrificing accuracy for efficiency and load imbalance across workers are the two representative problems that arise when parallelizing such algorithms. This seminar focuses on the challenges of parallelizing DBSCAN, a representative clustering algorithm, and presents a new solution. Compared with state-of-the-art methods, the proposed approach improves performance by up to 180 times without any loss of accuracy.
This seminar covers the following paper, presented at SIGMOD 2018:
Song, H. and Lee, J., "RP-DBSCAN: A Superfast Parallel DBSCAN Algorithm Based on Random Partitioning," In Proc. 2018 ACM Int'l Conf. on Management of Data (SIGMOD), Houston, Texas, pp. 1173-1187, June 2018.
1. Background
- Concept of Clustering
- Concept of Distributed Processing (MapReduce)
- Clustering Algorithms (Focus on DBSCAN)
2. Challenges of Parallel Clustering
- Parallelization of Clustering Algorithm (Focus on DBSCAN)
- Existing Work
- Challenges
3. Our Approach
- Key Idea and Key Contribution
- Overview of Random Partitioning-DBSCAN
4. Experimental Results
5. Conclusions
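For reference, a minimal single-machine DBSCAN, the algorithm RP-DBSCAN parallelizes, might look like the following. The quadratic neighbor search is kept for clarity only; eps and min_pts are the usual DBSCAN parameters, and the example points are illustrative.

```python
def dbscan(points, eps, min_pts):
    """Minimal sequential DBSCAN over a list of coordinate tuples.
    Returns one cluster id per point; -1 marks noise."""
    def neighbors(i):
        # O(n) scan per query; real implementations use spatial indexes.
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # provisionally noise (may become border)
            continue
        labels[i] = cid             # i is a core point: start a cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid     # border point: absorb, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is also core: expand the cluster
                queue.extend(jn)
        cid += 1
    return labels
```

The region-query step is what dominates the cost, which is why partitioning strategies (random partitioning in RP-DBSCAN) matter so much for the distributed version.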
Using generative augmentation to improve "Learning from Crowds": the CrowdInG model is based on Generative Adversarial Networks and improves crowdsourced annotations to aid supervised learning.
Reinforcement Learning for Difficult Settings (Olivier Teytaud)
1) Subgoal learning, macro-actions, partial observation, and clustering of features were explored for difficult reinforcement learning settings.
2) MCTS was adapted for these settings by incorporating simulations, action categorization, macro-action building, feature clustering, and a tree of subgoals with nodes containing goals.
3) This approach combines techniques from the state of the art and was tested on problems from other projects, showing successful application to complex real-world problems.
PR-433: Test-time Training with Masked Autoencoders (Sunghoon Joo)
This document summarizes a research paper that proposes a new method called TTT-MAE (Test-Time Training with Masked Autoencoders) to address the problem of domain shift in visual recognition tasks. TTT-MAE uses masked autoencoders as the self-supervised pretext task in test-time training, instead of the rotation prediction used in previous work. Experimental results on datasets like ImageNet-C and ImageNet-R show that TTT-MAE achieves higher performance gains than prior methods under different types of distribution shifts. However, TTT-MAE is slower at test time than directly applying a fixed model. Future work could focus on improving efficiency and generalizing the approach to other tasks.
A GRASS-based procedure to compare OSM and IGN Paris road network datasets (Marco Minghini)
These slides were presented during the WG2 meeting of COST Action IC1203 ENERGIC (http://vgibox.eu) held in Paris on December 3-4, 2015, which was focused on the evaluation of OSM quality through the comparison with official IGN data. The presentation describes an application of an open source GRASS-based procedure - including a Web Processing Service - to compare OSM and authoritative road network datasets (https://github.com/MoniaMolinari/OSM-roads-comparison) in the Paris case study.
This document discusses software module clustering using genetic algorithms and hill climbing techniques. It introduces genetic algorithms and hill climbing algorithms and how they can be applied to software module clustering. Specifically, it proposes using multiple hill climbs first to gather information about the search landscape, which is then used to define "building blocks" to improve subsequent searches done by genetic algorithms. The results of empirical studies using this novel approach show it to be effective at software module clustering.
Graph Transformer with Graph Pooling for Node Classification, IJCAI 2023 (ssuser2624f71)
Gapformer is a model that combines graph transformers with graph pooling for efficient node classification in large graphs. It addresses two issues with existing graph transformers: quadratic complexity with number of nodes and noise from distant neighbors. Gapformer uses graph pooling to reduce the number of attended nodes, computing attention over pooled nodes only. Experiments on 13 datasets show Gapformer outperforms other graph neural networks and graph transformers, with reduced computation and memory costs.
Can We Trust AI? The Dilemma of Model Adjustment (Terence Huang)
This document provides a summary of an AI expert's background and experience, and then discusses some challenges in ensuring the trustworthiness of AI models. It notes that while models may perform well during training, their performance can decline when deployed in the real world due to new data, noise, and errors. Interpretable modeling techniques like LIME and Grad-CAM are introduced to help evaluate whether models' predictions are appropriate and diagnose issues. The discussion emphasizes that identification of errors is not enough, and ways to correct models must also be explored, such as improving data quality.
This document provides an introduction to genetic algorithms and their applications in VLSI design and automation. It discusses the fundamentals of genetic algorithms including genetic representation, selection, crossover and mutation operators. Examples are provided for simple function optimization and the traveling salesman problem. The document also discusses applications of genetic algorithms for VLSI design problems such as partitioning, placement, routing, technology mapping and automatic test pattern generation. It provides details on genetic algorithm parameters and compares genetic algorithms to traditional optimization methods.
This document summarizes the GoogLeNet deep learning architecture. It describes how GoogLeNet uses inception modules containing 1x1 convolutional layers to reduce computational load. The inception modules perform 1x1, 3x3, and 5x5 convolutions in parallel, with the 1x1 layers reducing dimensionality first. This allows GoogLeNet to use significantly fewer parameters than VGGNet while achieving higher accuracy with fewer computational resources. The document also explains how auxiliary classifiers are added to intermediate layers to address vanishing gradients in the deep model.
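The dimensionality-reduction effect of those 1x1 layers can be checked with simple arithmetic. The channel sizes and feature-map dimensions below are illustrative, not GoogLeNet's exact configuration:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate operations for a k x k convolution over an
    h x w x c_in input producing c_out channels (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k

H = W = 28
# 5x5 convolution applied directly to 192 input channels:
direct = conv_macs(H, W, 192, 128, 5)
# 1x1 bottleneck down to 32 channels, then the 5x5 convolution:
reduced = conv_macs(H, W, 192, 32, 1) + conv_macs(H, W, 32, 128, 5)
print(direct, reduced, direct / reduced)
```

With these numbers the bottlenecked path needs roughly a fifth of the multiply-accumulates, which is the efficiency argument behind the inception module.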
This document presents a graph-based recommendation system approach that uses multi-armed bandit modeling. It estimates an unweighted, undirected graph to represent relationships between users based on their preferences. It then applies community detection to cluster the users graph and estimates preferences for each cluster. Products are then recommended to users using an upper confidence bound method, and the user feedback is used to update the preference estimates in a reinforcement learning approach. The method is evaluated on both synthetic and real-world recommendation datasets.
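The upper-confidence-bound step mentioned above can be sketched as follows. This is a generic UCB1-style score over per-product statistics; the paper's exact bonus term and its interaction with the user clusters are not reproduced here.

```python
import math

def ucb_score(mean_reward, n_pulls, total_pulls, c=2.0):
    """UCB: optimistic estimate = empirical mean + exploration bonus.
    Unpulled arms score +inf so every product is tried at least once."""
    if n_pulls == 0:
        return float("inf")
    return mean_reward + math.sqrt(c * math.log(total_pulls) / n_pulls)

def recommend(stats, total_pulls):
    """Pick the product (arm) with the highest UCB score.
    `stats` maps product -> (mean_reward, n_pulls)."""
    return max(stats, key=lambda a: ucb_score(*stats[a], total_pulls))
```

After each recommendation, the observed click or rating would update the chosen product's mean and pull count, closing the reinforcement-learning loop described in the summary.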
This document presents an overview of Newman's fast algorithm for detecting community structure in networks. It begins by recapping previous algorithms like Newman-Girvan and Clique Percolation Method. It then outlines Newman's algorithm which works to redefine modularity in order to optimize partitioning networks into communities. The algorithm runs in near-linear time, representing a significant improvement over previous algorithms. Examples are provided to demonstrate the accuracy of Newman's approach.
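Newman's modularity, the quantity the fast algorithm greedily optimizes, can be computed directly for a candidate partition (adjacency given as a dict of neighbor lists for an undirected graph):

```python
def modularity(adj, communities):
    """Newman's modularity: Q = sum over communities of
    e_c/m - (d_c/2m)^2, where e_c counts intra-community edges,
    d_c is the total degree inside the community, m the edge count."""
    m = sum(len(vs) for vs in adj.values()) / 2
    q = 0.0
    for comm in communities:
        cset = set(comm)
        e_c = sum(1 for u in cset for v in adj[u] if v in cset) / 2
        d_c = sum(len(adj[u]) for u in cset)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q
```

On two triangles joined by a single edge, splitting at the bridge scores higher than lumping everything together, which is the behavior the greedy merging exploits.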
Using CINET presentation as part of the CINET Workshop on July 10th, 2015 in Blacksburg, VA. CINET applications include Granite, GDS Calculator, and EDISON.
The document discusses problem solving agents and how to formulate problems for agents to solve. It explains that problem solving involves defining a goal, formulating the initial state, possible actions, and transition model between states. A search algorithm can then find a solution path through the state space from the initial to goal states. The performance of search algorithms depends on factors like completeness, optimality, and time and space complexity which are determined by properties of the state space like branching factor and solution depth. Examples of problems discussed include the vacuuming agent, 8-puzzle, and traveling salesman problems.
A GRASS-based automated procedure to compare OpenStreetMap and authoritative ... (Marco Minghini)
These slides were presented during the XVII meeting of the Italian users of GRASS and FOSS4G, held in Parma on February 11-12, 2015. The presentation describes an automated GRASS-based procedure to compare OSM and authoritative road network datasets (https://github.com/MoniaMolinari/OSM-roads-comparison). An application is presented focused on Paris city, where the OSM road network is compared with the official road network provided by IGN. A Web Processing Service (WPS) is also under development.
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs (Jason Riedy)
Graph-structured data in network security, social networks, finance, and other applications are not only massive but also under continual evolution. The changes are often scattered across the graph, permitting novel parallel and incremental analysis algorithms. We discuss analysis algorithms for streaming graph data that maintain both local and global metrics with low latency and high efficiency.
The document discusses big data and data analytics. It provides examples of the large amounts of data being generated daily by companies like Google, Facebook, eBay, and CERN. It also describes the Earthscope project which generates 67 terabytes of data by monitoring seismic activity across North America. The types of data discussed include relational, text, semi-structured, graph, and streaming data. The document outlines common techniques for analyzing big data, including aggregation, indexing/searching, knowledge discovery via data mining and statistical modeling. It provides overviews of statistics, OLAP, data warehousing, and several data mining techniques like classification, clustering, association rule mining and collaborative filtering.
Propagating Data Policies - A User Study (Enrico Daga)
The document summarizes a user study conducted to evaluate a system that propagates data policies in data flows. 10 participant teams were given 5 data journeys involving real datasets and processes to determine what policies should propagate from input to output. The teams used a tool to understand the journeys and compare their decisions to the system. An accuracy analysis was conducted on the results and teams provided feedback through a questionnaire.
Christian Jensen: Advanced Routing in Spatial Networks Using Big Data (jins0618)
Advanced Routing in Spatial Networks Using Big Data discusses using big data and advanced routing techniques for transportation networks. It covers modeling transportation networks using big data from sensors to assign time-varying weights representing factors like travel time and emissions. It then discusses routing algorithms that find optimal routes considering these weights, including algorithms for stochastic and uncertain weights. The document provides an overview of using big data to improve transportation network modeling and routing.
This document discusses how knowledge graphs and graph analytics can be used for anomaly detection in financial services. It describes building time-sequenced graph data models from a base knowledge graph to model customer behavior over time. Champion models are applied to each time window to learn a statistical distribution, and outliers in that distribution that are hard to reproduce can indicate anomalous financial behavior worthy of investigation, such as money laundering. Scaling the graph snapshots by collections of nodes and edges allows analyzing behavior at different levels from micro to macro.
Slides for a paper reading in the Vietnam AI Community in Japan.
An explanation of MobileNetV2 (Inverted Residuals and Linear Bottlenecks), a paper at CVPR 2018.
This document discusses the digital circuit layout problem and approaches to solving it using graph partitioning techniques. It begins by introducing the digital circuit layout problem and how it has become more complex with increasing circuit sizes. It then discusses how the problem can be decomposed into subproblems using graph partitioning to assign geometric coordinates to circuit components. The document reviews several traditional approaches to solve the problem, such as the Kernighan-Lin algorithm, and discusses their limitations for larger circuit sizes. It also discusses more recent approaches using evolutionary algorithms and concludes by analyzing the contributions of various approaches.
ZUIX is a design system created by Zigbang's CTO team to standardize design across all of Zigbang's services. It uses React Native for responsive, multi-platform components and includes tools like Storybook for development and a design review infrastructure for validation. The deployment process involves code reviews, CI/CD pipelines, and publishing to a npm registry. Training and documentation is provided through tools like Google Classroom and Notion. The team aims to further develop ZUIX by improving the design review tools, adding end-to-end testing, and analyzing component usage. The goal is to solve Zigbang's unique challenges through an agile, collaborative approach between designers and developers.
More Related Content
Similar to Efficient and Effective Influence Maximization in Social Networks: Hybrid Approach
This document discusses Kakao's search platform front-end project. It describes the architecture of an integrated search service using microservices and the need for a design system due to fragmented UIs. It introduces the KST (Kakao Search Template) project for creating a design system including 200+ UI blocks and templates. The KST Builder, Logger, and Dashboard are discussed for managing templates, logging usage, and monitoring coverage. Maintaining a consistent design system is important for operating diverse search services and platforms.
This document discusses Banksalad Product Language (BPL), which is a method used at Banksalad to standardize UI text, elements, and components. It allows designers and developers to use consistent terms, while abstracting UI elements to different levels suitable for their roles. Examples of standardized elements are provided, as well as external resources that discuss concepts like tree shaking that are relevant to BPL. While BPL has benefits, the document considers whether there may be better approaches than BPL.
This document summarizes a presentation about using Stitches, a React styling library, and Storybook for component design.
The presentation introduces Stitches as the styling library used for its support of React, easy usage, and themes. Key features of Stitches discussed include creating styled components, variants, and comparisons to other libraries.
Storybook is presented as a way to improve communication between designers and developers by allowing visualization of components alongside their stories. Clean communication through a shared Storybook is emphasized.
Reflections on initially creating a design system note the benefits of consistency and speed but also identify areas for improvement like documentation, process alignment, and understanding each other's roles. Establishing trust and understanding between
비행기 설계를 왜 통일 해야 할까?
디자인 시스템을 하는 이유
비행기들이 다 용도가 다르다...어떻게 설계하지?
맥락이 다른 페이지와 패턴
경유지까지 아직 멀었다... 언제 수리하지?
디자인 시스템을 적용하는 시점
엔지니어랑 얘기해서 정비해야하는데...어떻게 수리하지?
디자인 시스템을 적용하는 프로세스
비행기 설계가 바뀐걸 어떻게 알리지?
디자인 시스템의 전파
The document discusses Kotlin coroutines and how they can be used to write asynchronous code in a synchronous, sequential way. It explains what coroutines are, how they work internally using continuation-passing style (CPS) transformation and state machines, and compares them to callbacks. It also outlines some of the benefits of using coroutines, such as structured concurrency, light weight execution, built-in cancellation, and simplifying asynchronous code. Finally, it provides examples of how to use common coroutine builders like launch, async, and coroutineScope in a basic Android application with ViewModels.
This document contains the transcript from a presentation given by Wonsuk Lim from Naver on tips for debugging and analyzing Android applications. Some key tips discussed include fully utilizing the Android emulator's capabilities like 2-finger touch control, clipboard sharing between the emulator and host PC, and mocking locations. Advanced settings for the emulator like foldable and camera emulation are also covered. The presenter recommends ways to configure developer options and use tools like LeakCanary, the Android profiler, and Stetho for testing app stability. Methods for understanding the Android framework by reviewing system services and managers via AIDL files and logcat dumps are presented. Finally, reverse engineering tools like APK Extractor and decompilers are introduced.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Efficient and Effective Influence Maximization in Social Networks: Hybrid Approach
1. February 21, 2018 Page 1/22
Efficient and Effective Influence Maximization
in Social Networks: A Hybrid Approach
2018. 02. 21
Yun-Yong Ko
BigData science lab
Department of Computer and Software
Hanyang University
Table of Contents
• Problem definition
• Preliminary
– Diffusion model
• Related works
• Hybrid-IM
– Path-based community detection
– G-CELF algorithm
• Experimental results
Problem definition
• Influence Maximization (IM)
– To find a k-seed set that maximizes influence spread in a given network
S* = argmax_{S ⊆ V, |S| = k} σ(S)
– Network
✓ Node: user
✓ Edge: the relationship between users
– Type of a node
✓ Active user: user who buys the product
✓ Inactive user: user who doesn’t buy the product
– The selected nodes (seed set)
✓ Active nodes in the initial stage of influence propagation
▪ User group that receives samples from a company
Preliminary
• Diffusion model
– Describes how influence spreads over the network
✓ Linear threshold (LT) model
✓ Independent cascade (IC) model
• Common assumptions
– (1) Nodes can have either of two states, active or inactive.
– (2) As time goes by, inactive nodes can be activated, but active nodes
cannot become inactive.
– (3) The diffusion process terminates when no new node becomes active.
Linear threshold (LT) model

[Figure: LT diffusion example on a small graph with edge weights (0.1–0.6). Legend: inactive node, active node, threshold, active neighbors. A node becomes active once the total weight of its active neighbors reaches its threshold; the process stops when no new node activates.]
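The LT rule above can be sketched as a short simulation. The adjacency-list format and the per-node threshold dictionary are illustrative assumptions, not the paper's implementation:

```python
def lt_spread(graph, thresholds, seeds):
    """Simulate one Linear Threshold diffusion.

    graph: dict mapping node -> list of (neighbor, weight) out-edges.
    thresholds: dict mapping node -> activation threshold in [0, 1].
    Returns the set of nodes active when the diffusion stops.
    """
    active = set(seeds)
    pressure = {}                 # accumulated weight from active neighbors
    frontier = set(seeds)
    while frontier:               # stop when no node newly activates
        next_frontier = set()
        for u in frontier:
            for v, w in graph.get(u, []):
                if v in active:
                    continue
                pressure[v] = pressure.get(v, 0.0) + w
                if pressure[v] >= thresholds[v]:
                    active.add(v)
                    next_frontier.add(v)
        frontier = next_frontier
    return active
```

Because activation is deterministic given the thresholds, a single pass suffices; no Monte-Carlo averaging is needed for the LT example itself.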
Independent cascade (IC) model

[Figure: IC diffusion example on the same graph. Legend: inactive node, active node, new active node, successful attempt, unsuccessful attempt. Each newly active node makes one activation attempt per inactive neighbor, succeeding with the edge probability; the process stops when no attempt succeeds.]
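The IC rule can be sketched the same way; unlike LT, each run is random, so the influence spread is estimated by averaging many runs. The graph format is again an illustrative assumption:

```python
import random

def ic_spread(graph, seeds, rng=None):
    """One Monte-Carlo run of the Independent Cascade model.

    graph: dict mapping node -> list of (neighbor, probability) out-edges.
    Each newly active node gets exactly one activation attempt per
    inactive neighbor, succeeding with the edge probability.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:  # one attempt only
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```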
Related works - Greedy approach
• Optimal solution to IM problem
– Finding the optimal solution to IM is NP-Hard
✓ There are C(n, k) possible k-seed sets
• Greedy approach (SimpleGreedy)
– To select a node having the maximum marginal gain at each step
✓ The marginal gain of node v: σ(S ∪ {v}) − σ(S)
▪ The additional influence spread obtained by adding v to the seed set S
– SimpleGreedy is guaranteed to find an approximate solution that provides
(1 − 1/e ≈ 63%) of the influence spread of the optimal solution
✓ Considered as the ground truth in the IM field
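SimpleGreedy can be sketched in a few lines, assuming a generic `simulate(graph, seeds)` function for one diffusion run (e.g., one IC run). Both performance issues are visible here: every non-seed node is re-evaluated each step, and each evaluation averages many simulations:

```python
def estimate_spread(graph, seeds, simulate, runs=100):
    """Average influence spread of `seeds` over `runs` MC simulations."""
    return sum(len(simulate(graph, seeds)) for _ in range(runs)) / runs

def simple_greedy(graph, k, simulate, runs=100):
    """SimpleGreedy: repeatedly add the node with the maximum marginal
    gain sigma(S + v) - sigma(S), estimated by Monte-Carlo simulation."""
    seeds = set()
    nodes = set(graph) | {v for es in graph.values() for v, _ in es}
    for _ in range(k):
        base = estimate_spread(graph, seeds, simulate, runs)
        # Macro bottleneck: every non-seed node is re-evaluated each step;
        # micro bottleneck: each evaluation runs `runs` simulations.
        best = max(nodes - seeds,
                   key=lambda v: estimate_spread(graph, seeds | {v},
                                                 simulate, runs) - base)
        seeds.add(best)
    return seeds
```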
Performance issues of SimpleGreedy
• Macro level
– When a new seed is selected at a step,
– it has to re-evaluate the marginal gain of every non-seed node
✓ Their marginal gains are likely to have been changed by the new seed
▪ σ(S_t ∪ {v}) − σ(S_t) ≠ σ(S_{t+1} ∪ {v}) − σ(S_{t+1})
• Micro level
– It evaluates the marginal gain of a node by running MC-simulations
✓ σ(S ∪ {v}) − σ(S)
✓ Running MC-simulations is very time-consuming
Related works - Community-based IM (CB-IM)
• Purpose
– To resolve the macro issue by exploiting the property of communities
• The property of communities in a social network
– Users belonging to the same community
✓ Exchange information frequently
– Users belonging to different communities
✓ Exchange information rarely
Gain of CB-IM
• Exploiting the property of communities
– The difference between the influence spread of a node within a
community and that on the whole network is insignificant
σ_intra({v}) ≈ σ_whole({v})
• After a new seed is selected from a community,
– only the nodes in the same community need to be re-evaluated
Related works - Path-based IM (PB-IM)
• Purpose
– To resolve the micro issue by replacing MC-simulations
• Method
– To evaluate the influence spread of a node
✓ Aggregating the weights of all paths from the node
▪ Rather than running MC-simulations
σ(v) = 1 + Σ_{u ∈ O(v)} σ_u(v)
▪ σ_u(v): the influence from node v to node u
▪ W(p) = Π_{i=1}^{m−1} w(v_i, v_{i+1}): the weight of a path p = (v_1, …, v_m)
Path pruning
• Purpose
– To estimate the influence spread more efficiently in PB-IM
✓ Finding all possible paths is a #P-hard problem
• Method
– Only consider paths whose weights are larger than the threshold
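The two formulas combine into a depth-first path enumeration with pruning. This is a sketch under the assumption that σ(v) aggregates the weights W(p) of all simple paths p starting from v whose accumulated weight stays above the threshold:

```python
def path_spread(graph, v, theta=0.01):
    """Path-based estimate of sigma(v): 1 (for v itself) plus the sum of
    W(p) over all simple paths p from v, pruning any path whose
    accumulated weight falls below theta."""
    total = 1.0                       # v influences itself

    def dfs(node, weight, visited):
        nonlocal total
        for u, w in graph.get(node, []):
            pw = weight * w           # W(p) = product of edge weights
            if u in visited or pw < theta:
                continue              # prune cyclic / negligible paths
            total += pw               # the path v ~> u contributes W(p)
            dfs(u, pw, visited | {u})

    dfs(v, 1.0, {v})
    return total
```

Raising `theta` trades accuracy for speed: fewer paths survive the pruning, so the estimate is cheaper but smaller.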
Hybrid-IM
• Purpose
– To resolve both the micro and the macro issues of SimpleGreedy
• Proposed method
– To combine PB-IM and CB-IM for addressing both of the two issues
✓ Community detection stage
▪ reduces the number of nodes to be re-evaluated by applying CB-IM
✓ Seed selection stage
▪ evaluates the marginal gain of a node quickly by applying PB-IM
– To address two additional technical issues
✓ in the existing community detection method
✓ in the existing CELF algorithm
The existing community detection method
• Background
– Existing community detection methods do not consider influence
propagation between nodes
– To improve performance, CB-IM exploits the property of the
community structure
✓ If the community detection stage itself takes considerable time, the
overall performance gain of CB-IM disappears
• Intuition
– Detect communities considering the influence propagation between
communities, using only live edges
✓ Live edge: an edge whose weight is greater than a pre-defined threshold
The existing community detection method
• The problems of the existing method
– A large number of actual edges are ignored
✓ More than 90% of the edges were removed
– The weights of live edges are all ignored
✓ All live edges are treated identically although they could be quite different
[Figure: histogram of edge weights; x-axis: edge weight (0–1), y-axis: the number of edges (×1000).]
Strategy 1: PB-CD
• Path based community detection (PB-CD)
– Applies path-based influence estimation to estimate the overflowed
influence between communities
✓ relies on many more edges of the original graph, together with their weights,
✓ rather than on live edges only
• Two sub-steps
– Unit-community detection (UCD)
✓ To assign a community label to each node,
▪ Based on its affinity to each neighboring community
– Community merge (CM)
✓ To merge communities by considering the overflowed influence between the
communities after UCD step
The existing CELF algorithm
• Cost effective lazy forward (CELF)
– To reduce the number of nodes to be re-evaluated (macro issue)
✓ Exploiting the submodularity of influence function
✓ σ(S ∪ {v}) − σ(S) ≥ σ(T ∪ {v}) − σ(T), for S ⊆ T
• Example
– After node a is selected at step t,
✓ the marginal gain of node b is re-evaluated (e.g., to 19)
– The nodes below b in the queue cannot be the next seed
✓ because, by submodularity, their cached gains are upper bounds on
their current marginal gains

[Figure: CELF queue after re-evaluating node b.]
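The lazy-forward idea can be sketched with a max-heap of cached gains. `spread` is any set-function influence estimator; stamping each entry with the round in which its gain was computed is one common way (an assumption here, not necessarily the paper's bookkeeping) to detect stale entries:

```python
import heapq

def celf(nodes, k, spread):
    """CELF lazy-forward greedy. spread(S) -> influence estimate.

    By submodularity, a cached gain from an earlier round is an upper
    bound on the current gain, so an entry whose re-evaluated gain still
    tops the heap can be selected without touching the other nodes.
    """
    seeds, sigma_s = set(), 0.0
    # Max-heap entries: (-cached_gain, node, round_when_gain_was_computed)
    heap = [(-spread({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    for step in range(1, k + 1):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last == step:              # gain is fresh for this round
                seeds.add(v)
                sigma_s += -neg_gain
                break
            # Stale entry: re-evaluate lazily and push it back.
            gain = spread(seeds | {v}) - sigma_s
            heapq.heappush(heap, (-gain, v, step))
    return seeds
```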
The existing CELF algorithm
• The existing CELF in CB-IM
– assigns a local CELF queue to each community
– applies CELF to each local queue independently
Strategy 2: G-CELF
• Global CELF (G-CELF)
– Maintains a single global CELF queue
– Reduces the number of nodes to be re-evaluated even further
• Additional information
– Node
✓ Community label
✓ Flag for the re-evaluation process
– Community
✓ Flag for the re-evaluation process

[Slide 20: Hybrid-IM overview figure.]
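A sketch of how a single global queue with per-community staleness flags might work, under the assumed semantics that a new seed invalidates cached gains only inside its own community (the CB-IM property); the paper's actual flag bookkeeping may differ:

```python
import heapq

def g_celf(nodes, community, k, spread):
    """G-CELF sketch: one global CELF queue plus per-community flags.

    community: dict node -> community label. After a seed is picked,
    only gains cached for its own community are marked stale; gains in
    other communities are reused, since (by the CB-IM assumption) the
    new seed barely changes influence outside its community.
    """
    seeds, sigma_s = set(), 0.0
    # Per-community flag: entries computed before this round are stale.
    stale_since = {c: 0 for c in community.values()}
    heap = [(-spread({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    for step in range(1, k + 1):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last >= stale_since[community[v]]:   # still valid
                seeds.add(v)
                sigma_s += -neg_gain
                # Invalidate only the seed's own community.
                stale_since[community[v]] = step + 1
                break
            gain = spread(seeds | {v}) - sigma_s
            heapq.heappush(heap, (-gain, v, step))
    return seeds
```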
• Diffusion model
– Independent cascade (IC) model
✓ The weight of edge (u, v) = 1 / indegree(v)
✓ indegree(v): the number of incoming edges of node v
Experimental setup
• Dataset
Dataset NetHEPT NetPHY Stanford DBLP
# of Nodes 15K 37K 281K 655K
# of Edges 58K 231K 2.31M 3.98M
Avg. Degree 7.7 12.4 8.2 6.1
Max Degree 341 286 38606 588
Direction Undirected Undirected Directed Undirected
Experiments for PB-CD
• Methods
– To show the effectiveness of each part (UCD, CM) of PB-CD,
✓ we build four community detection methods by employing all possible
combinations
UCD \ CM     Live-based   Path-based
Live-based   LL_CD        LP_CD
Path-based   PL_CD        PP_CD
Experiments for Hybrid-IM
• Methods
– Random
✓ Baseline
– SDD (single degree discount)
✓ selects the node having the highest degree
✓ after a seed is selected, the degree of each of its neighbors is decreased by 1
– CB-IM, PB-IM
✓ Explained in previous slides
– Hybrid-IM
✓ Our proposed method
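The SDD baseline described above is simple enough to sketch directly. Treating "degree" as the out-degree of the adjacency list is an assumption of this sketch:

```python
def single_degree_discount(graph, k):
    """SDD baseline: repeatedly pick the highest-degree node, then
    discount each of its neighbors' degrees by 1."""
    degree = {}
    for u, edges in graph.items():
        degree[u] = degree.get(u, 0) + len(edges)
        for v, _ in edges:
            degree.setdefault(v, 0)       # ensure every endpoint appears
    seeds = []
    for _ in range(k):
        v = max((u for u in degree if u not in seeds),
                key=lambda u: (degree[u], u))   # deterministic tie-break
        seeds.append(v)
        for u, _ in graph.get(v, []):     # discount the neighbors
            if u not in seeds:
                degree[u] -= 1
    return seeds
```

The discount avoids picking two seeds whose neighborhoods overlap heavily, at a fraction of the cost of simulation-based methods.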
The running time of each method

[Figures: running time (sec, log scale) vs. the number of seeds (1–1000) on the four datasets; each plot compares Hybrid-IM, PB-IM, CB-IM (where available), SDD, and Random.]
The influence spread of each method

[Figures: influence spread (×1000 or ×10000) vs. the number of seeds (1–1000) on the four datasets; each plot compares Hybrid-IM, PB-IM, CB-IM (where available), SDD, and Random.]
Conclusions
• We propose Hybrid-IM that combines PB-IM and CB-IM
– in order to resolve the micro- and macro-level issues of influence
maximization together
• To refine it further, we identify two additional issues and propose
two strategies that address them
– PB-CD strategy
✓ To consider influence propagation more accurately in community detection
– G-CELF strategy
✓ Further optimizes seed selection without sacrificing accuracy