As a thesis project I implemented a table constraint in a still-experimental CSP solver built on the CUDA parallel architecture.
1. Implementation of a table constraint on a GPU-based CSP solver
Degree thesis (Tesi di Laurea)
Tommaso Campari
27 October 2016 - Academic Year 2015-2016
2. Outline
Introduction to CSPs
Introduction to CUDA
The iNVIDIOSO solver
The Table Constraint
3. CSPs: Constraint Satisfaction Problems
Constraint:
Let X = {x1, ..., xn}, with n > 0, be a finite sequence of variables with respective domains D = {d1, ..., dn}. A constraint c on X is a subset of the Cartesian product of the domains: c ⊆ d1 × ... × dn.
CSP:
A CSP is a triple P = ⟨X, D, C⟩ where:
X is the set of variables {x1, ..., xn};
D is the set of domains {d1, ..., dn}, each necessarily non-empty and uniquely associated with one variable;
C is the set of constraints on the variables in X.
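The definitions above can be made concrete with a small sketch. The variable names, the domains and the "x1 < x2" relation below are illustrative assumptions, not taken from the slides; the point is only that a constraint is literally a set of tuples drawn from the Cartesian product of the domains.

```python
from itertools import product

# Hypothetical toy CSP: names, domains and the "x1 < x2" relation are
# illustrative assumptions only.
variables = ["x1", "x2"]
domains = {"x1": {1, 2, 3}, "x2": {1, 2, 3}}

# The Cartesian product d1 x d2 of the two domains.
cartesian = set(product(domains["x1"], domains["x2"]))

# A constraint is a subset of that product: here, the pairs (a, b) with a < b.
less_than = {(a, b) for (a, b) in cartesian if a < b}

# The CSP is the triple (X, D, C); each constraint is stored with its scope.
csp = (variables, domains, [(("x1", "x2"), less_than)])

print(sorted(less_than))
```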
4. Solving a CSP
The goal is to find one or more feasible solutions.
Solution:
A solution is an assignment of the variables that satisfies all the constraints of the CSP.
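As a minimal illustration of what "solution" means here, the sketch below enumerates every assignment of a tiny, hypothetical CSP and keeps those that satisfy all constraints. Brute-force enumeration is used only because it mirrors the definition directly; it is not how a real solver searches.

```python
from itertools import product

def solutions(variables, domains, constraints):
    """Yield every assignment that satisfies all constraints (brute force)."""
    for values in product(*(sorted(domains[v]) for v in variables)):
        assignment = dict(zip(variables, values))
        # Each constraint is (scope, set of allowed tuples).
        if all(tuple(assignment[v] for v in scope) in allowed
               for scope, allowed in constraints):
            yield assignment

# Hypothetical example: x1 < x2 over the domains {1, 2, 3}.
variables = ["x1", "x2"]
domains = {"x1": {1, 2, 3}, "x2": {1, 2, 3}}
constraints = [(("x1", "x2"), {(1, 2), (1, 3), (2, 3)})]

all_solutions = list(solutions(variables, domains, constraints))
```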
5. Arc and Bound consistency
The consistency operation removes from the domains of the variables in a constraint's scope those values that certainly do not lead to a solution.
Arc consistency
Examines every value in the domain;
Is more expensive;
Removes values that do not lead to a solution.
Bound consistency
Examines only the values at the extremes of the domain;
Is less expensive;
Removes only the values at the extremes of the domain that do not lead to a solution.
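The difference between the two forms of consistency can be seen on a small binary constraint. The sketch below is an assumption-laden illustration: the constraint x + y = 6, the domains and the function names are all hypothetical, and the filtering is written for one variable of a binary constraint only.

```python
def has_support(value, other_domain, check):
    """True if some value of the other variable satisfies the constraint."""
    return any(check(value, other) for other in other_domain)

def arc_filter(domain, other_domain, check):
    """Test every value of the domain; keep only the supported ones."""
    return {v for v in domain if has_support(v, other_domain, check)}

def bound_filter(domain, other_domain, check):
    """Test only the extremes, shrinking until both ends are supported."""
    values = sorted(domain)
    while values and not has_support(values[0], other_domain, check):
        values.pop(0)
    while values and not has_support(values[-1], other_domain, check):
        values.pop()
    return set(values)

# Hypothetical constraint x + y == 6 with x in {1..5} and y in {1, 5}.
x_dom, y_dom = {1, 2, 3, 4, 5}, {1, 5}
sum_is_six = lambda x, y: x + y == 6
```

With these domains, arc_filter(x_dom, y_dom, sum_is_six) keeps only {1, 5}, whereas bound_filter leaves all five values in place because both extremes already have a support: the cheaper check prunes less.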
6. Introduction to CUDA
CUDA:
A general-purpose architecture for parallel computing;
Uses the GPU's compute engine to solve problems;
Uses blocks and threads for parallelism;
Parallel functions are called kernels.
7. Dynamic parallelism
Dynamic parallelism:
An extension to the CUDA programming model;
Allows kernels to be launched directly from the GPU;
Less communication from CPU to GPU and vice versa;
Greater efficiency and flexibility.
8. iNVIDIOSO
It is a CSP solver that is:
Experimental;
Still under development;
Built with support for the CUDA architecture.
9. Domain representation in iNVIDIOSO
Domains are represented in one of two ways:
Bound representation: a variable whose domain has a difference of at least 256 between its minimum and maximum elements is stored as a pair of values called bounds;
Bitmask representation: otherwise the domain is stored as a bitmask made up of eight 32-bit integers, where each bit set to 1 represents an element present in the domain.
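The two representations can be sketched as follows. This is not iNVIDIOSO's code: the function names, the tuple layout and, in particular, the choice that bit 0 stands for the domain's minimum value are assumptions made for illustration. Only the 256 threshold and the eight 32-bit words come from the slide.

```python
WORDS, BITS = 8, 32  # eight 32-bit integers, as on the slide: 256 bits

def make_domain(values):
    """Pick the representation from the domain's span (max - min)."""
    lo, hi = min(values), max(values)
    if hi - lo >= WORDS * BITS:
        return ("bounds", lo, hi)        # only the two extremes are stored
    words = [0] * WORDS
    for v in values:
        offset = v - lo                  # assumption: bit 0 is the minimum
        words[offset // BITS] |= 1 << (offset % BITS)
    return ("bitmask", lo, words)

def contains(domain, v):
    """Membership test for either representation."""
    if domain[0] == "bounds":
        _, lo, hi = domain
        return lo <= v <= hi             # a bounds pair cannot record holes
    _, lo, words = domain
    offset = v - lo
    if not 0 <= offset < WORDS * BITS:
        return False
    return bool((words[offset // BITS] >> (offset % BITS)) & 1)
```

For example, make_domain({1, 3, 40}) yields a bitmask, while make_domain({0, 256}) yields a pair of bounds.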
10. Goals of the thesis
Design an efficient parallel algorithm for the table constraint;
Integrate it into the solver;
Demonstrate that propagating constraints in parallel is actually possible.
11. The Table Constraint
It is an extensional constraint defined by explicitly listing n tuples of values allowed for the variables in its scope.
Example: table([X1, X2, X3], [⟨1, 2, 3⟩, ⟨4, 5, 6⟩, ⟨7, 8, 9⟩]) with D1, D2 and D3 set to [1, ..., 10]. The table associated with the constraint can therefore be viewed as:
     X1  X2  X3
t1    1   2   3
t2    4   5   6
t3    7   8   9
After filtering: D1 = {1, 4, 7}, D2 = {2, 5, 8} and D3 = {3, 6, 9}
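The filtering in this example can be reproduced with a few lines. The sketch below is a plain sequential illustration with hypothetical names, not the algorithm implemented in the thesis: it discards tuples that contain a value outside the current domains, then keeps for each variable only the values that appear in a surviving tuple.

```python
def filter_table(scope_domains, tuples):
    """Keep, for each variable, only values appearing in a still-valid tuple."""
    valid = [t for t in tuples
             if all(value in dom for value, dom in zip(t, scope_domains))]
    return [{t[i] for t in valid} for i in range(len(scope_domains))]

# The example from the slide: D1 = D2 = D3 = [1, ..., 10].
domains = [set(range(1, 11)) for _ in range(3)]
table = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
d1, d2, d3 = filter_table(domains, table)
```

If 5 is first removed from D2, tuple t2 is no longer valid, and the same call also drops 4 from D1 and 6 from D3.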
12. The sequential consistency algorithm
The consistency algorithm was first designed for sequential execution on the CPU and, in particular, aims to exploit the domain representation provided by the solver.
13. Sequential bound consistency for one variable
When the domains are represented as a pair of bounds, consistency for the selected scope variable is enforced only on the lower and upper bounds.
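One possible way to tighten the two bounds of a single variable against a table is sketched below. The slide's figure is not available, so this is an assumption about the general idea rather than the thesis algorithm; the function names and the use of (lower, upper) pairs are hypothetical.

```python
def tuple_is_valid(t, bounds):
    """A tuple is valid if each value lies within its variable's bounds."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(t, bounds))

def bound_consistency(var, bounds, tuples):
    """Tighten only the lower and upper bound of variable `var`."""
    supported = [t[var] for t in tuples if tuple_is_valid(t, bounds)]
    if not supported:
        return None                      # no valid tuple: the domain is empty
    # The new bounds are the smallest and largest supported values.
    return (min(supported), max(supported))

table = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
wide = [(0, 1000), (0, 1000), (0, 1000)]
narrow = [(0, 1000), (0, 1000), (0, 5)]   # X3 restricted to at most 5
```

With the wide bounds, variable 0 is tightened to (1, 7). Values strictly between the bounds, such as 2 or 3, are not examined and stay in the domain.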
14. Sequential arc consistency for one variable
When the domains are represented as a bitmask, consistency for the selected scope variable is enforced on every element of the domain.
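With bitmask domains, the per-value check can be expressed with bitwise operations. The sketch below is a hedged illustration: it uses a single Python integer as a stand-in for the eight 32-bit words, assumes bit v represents value v, and all function names are hypothetical.

```python
def to_mask(values):
    """Build a bitmask with bit v set for every value v."""
    mask = 0
    for v in values:
        mask |= 1 << v
    return mask

def to_values(mask):
    """Recover the set of values from a bitmask."""
    return {v for v in range(mask.bit_length()) if (mask >> v) & 1}

def arc_consistency(var, masks, tuples):
    """Return the filtered bitmask of variable `var`.

    masks[i] has bit v set when value v is in the domain of variable i.
    """
    supported = 0
    for t in tuples:
        # A tuple is valid when every one of its values is still in its domain.
        if all((masks[i] >> v) & 1 for i, v in enumerate(t)):
            supported |= 1 << t[var]     # this tuple supports value t[var]
    return masks[var] & supported

table = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
masks = [to_mask(range(1, 11)), to_mask({2, 8}), to_mask(range(1, 11))]
```

Here the second domain lacks 5, so tuple (4, 5, 6) gives no support and to_values(arc_consistency(0, masks, table)) is {1, 7}.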
15. Implementation scheme with multithreading on CUDA
The first implementation uses a single block of 256 threads running in parallel.
16. Bound consistency with multithreading
17. Arc consistency with multithreading
18. Results obtained with multithreading on CUDA (I)
19. Results obtained with multithreading on CUDA (II)
The parallel running time should be constant;
This does not happen because each thread handles one domain value, which can be associated with many tuples;
In the test, however, the number of such tuples was linear in the size of the table.
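The effect described above can be modelled without a GPU. The sketch below is a simplified cost model under stated assumptions: one simulated thread per domain value, each scanning only the tuples that contain its value, with the slowest thread setting the parallel time. The test table and all names are hypothetical.

```python
from collections import Counter

def per_thread_work(tuples, var):
    """Number of tuples each 'thread' (one per domain value) must scan."""
    return Counter(t[var] for t in tuples)

def parallel_steps(tuples, var):
    """The slowest thread determines the parallel running time."""
    return max(per_thread_work(tuples, var).values(), default=0)

def make_table(size, domain_size=4):
    # Hypothetical test table: values of variable 0 cycle over a fixed
    # domain, so each value occurs in about size / domain_size tuples.
    return [(i % domain_size, i) for i in range(size)]
```

Doubling the table from 40 to 80 tuples doubles parallel_steps from 10 to 20: with a fixed domain, the work per thread grows linearly with the table size instead of staying constant.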
20. Scheme of the implementation with dynamic parallelism
21. Consistency with dynamic parallelism
22. Comparison between the two parallel implementations
23. Future work
Integrating extensional constraints into the iNVIDIOSO parser;
Integrating dynamic parallelism into iNVIDIOSO;
Replacing the sorting algorithm with a parallel mergesort;
Balancing the work among threads when values are not uniformly distributed across the tuples.
24. Conclusions
The goals initially set have been achieved; in particular:
Constraint propagation on the GPU is possible;
The implemented algorithm (especially in the dynamic-parallelism case) is efficient and filters the solutions correctly.
25. Thank you for your attention!