The document describes experience with partial SIMDization in the Open64 compiler using dynamic programming. It discusses applying partial SIMDization to pack isomorphic instructions into SIMD instructions without fully vectorizing loops. It outlines applying a dynamic programming approach based on the Barik-Zhao-Sarkar algorithm to choose optimal SIMD tile combinations. Initial results show applying this to a hot loop in GROMACS was able to SIMDize over 88% of operations, demonstrating the potential of the technique.
The International Journal of Engineering and Sciencetheijes
This document summarizes a research paper that proposes a block cipher involving a key matrix and key bunch matrix supplemented with permutation. The cipher encrypts a plaintext matrix using modular multiplication with the key matrices over 256. It adds a permutation function that circularly rotates and swaps bits in the plaintext matrix in each round. Cryptanalysis showed the cipher cannot be broken by general attacks. The decryption uses the inverse key matrix and multiplicative inverse of the encryption key bunch matrix.
The document is a master's thesis titled "Automatic Program Parallelization for GPU Environment". It discusses using a compiler called C2CUDA to automatically parallelize sequential C programs for execution on a GPU. The thesis presents techniques for data dependence analysis and loop transformations to expose parallelism. It also provides results of experiments parallelizing matrix operations and magnetic resonance imaging algorithms using C2CUDA. Speedups of up to 58x were achieved for the parallelized applications compared to sequential execution.
The document discusses challenges and limitations with existing tools for refactoring code clones. It proposes a new approach to improve clone refactoring that involves three main phases:
1) Control structure matching to identify refactorable clones that have identical control flow.
2) PDG mapping to find the optimal mapping between clone statements that maximizes matches and minimizes differences.
3) Precondition examination to ensure refactoring preserves program behavior.
The goal is to parameterize more types of differences, find optimal matches, and add behavior preservation checks. This would allow safer and more effective refactoring of clones.
Introduction to CNN with Application to Object RecognitionArtifacia
This is the presentation from our second AI Meet held on Dec 10, 2016.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
The document discusses binary index trees (also called Fenwick trees) and segment trees, which are data structures that allow efficient querying of array prefixes and intervals. Binary index trees support adding values to array elements and retrieving prefix sums in O(log n) time. Segment trees similarly support adding values and finding maximum/minimum values in intervals in O(log n) time. Both achieve faster query times than naive solutions by representing the array as a tree structure.
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
Polyhedral compilation uses the polyhedral model to represent programs as systems of affine inequalities over iteration variables. This allows loop transformations like fusion, distribution, skewing and reversal to be expressed as affine mappings on the iteration space. The key aspects are representing the iteration domain, scheduling functions that determine the execution order of statements, and memory accesses in terms of iteration vectors. Loop transformations are specified by changing the scheduling functions to map iterations to new logical execution times while preserving semantics. This enables optimizations at the level of whole programs or subprograms.
The document summarizes a presentation on revocable identity-based encryption (RIBE) from codes with rank metric. Key points:
- RIBE adds an efficient revocation procedure to identity-based encryption by using a binary tree structure and key updates.
- The construction is based on low rank parity-check codes, with the master secret key defined as the "trapdoor" generated by the RankSign algorithm.
- Security relies on the rank syndrome decoding problem. Key updates are done efficiently through the binary tree with logarithmic complexity.
- Parameters are given that allow decoding of up to 2wr errors with small failure probability, suitable for the identity-based encryption scheme.
The International Journal of Engineering and Sciencetheijes
This document summarizes a research paper that proposes a block cipher involving a key matrix and key bunch matrix supplemented with permutation. The cipher encrypts a plaintext matrix using modular multiplication with the key matrices over 256. It adds a permutation function that circularly rotates and swaps bits in the plaintext matrix in each round. Cryptanalysis showed the cipher cannot be broken by general attacks. The decryption uses the inverse key matrix and multiplicative inverse of the encryption key bunch matrix.
The document is a master's thesis titled "Automatic Program Parallelization for GPU Environment". It discusses using a compiler called C2CUDA to automatically parallelize sequential C programs for execution on a GPU. The thesis presents techniques for data dependence analysis and loop transformations to expose parallelism. It also provides results of experiments parallelizing matrix operations and magnetic resonance imaging algorithms using C2CUDA. Speedups of up to 58x were achieved for the parallelized applications compared to sequential execution.
The document discusses challenges and limitations with existing tools for refactoring code clones. It proposes a new approach to improve clone refactoring that involves three main phases:
1) Control structure matching to identify refactorable clones that have identical control flow.
2) PDG mapping to find the optimal mapping between clone statements that maximizes matches and minimizes differences.
3) Precondition examination to ensure refactoring preserves program behavior.
The goal is to parameterize more types of differences, find optimal matches, and add behavior preservation checks. This would allow safer and more effective refactoring of clones.
Introduction to CNN with Application to Object RecognitionArtifacia
This is the presentation from our second AI Meet held on Dec 10, 2016.
You can join Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
The document discusses binary index trees (also called Fenwick trees) and segment trees, which are data structures that allow efficient querying of array prefixes and intervals. Binary index trees support adding values to array elements and retrieving prefix sums in O(log n) time. Segment trees similarly support adding values and finding maximum/minimum values in intervals in O(log n) time. Both achieve faster query times than naive solutions by representing the array as a tree structure.
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
Polyhedral compilation uses the polyhedral model to represent programs as systems of affine inequalities over iteration variables. This allows loop transformations like fusion, distribution, skewing and reversal to be expressed as affine mappings on the iteration space. The key aspects are representing the iteration domain, scheduling functions that determine the execution order of statements, and memory accesses in terms of iteration vectors. Loop transformations are specified by changing the scheduling functions to map iterations to new logical execution times while preserving semantics. This enables optimizations at the level of whole programs or subprograms.
The document summarizes a presentation on revocable identity-based encryption (RIBE) from codes with rank metric. Key points:
- RIBE adds an efficient revocation procedure to identity-based encryption by using a binary tree structure and key updates.
- The construction is based on low rank parity-check codes, with the master secret key defined as the "trapdoor" generated by the RankSign algorithm.
- Security relies on the rank syndrome decoding problem. Key updates are done efficiently through the binary tree with logarithmic complexity.
- Parameters are given that allow decoding of up to 2wr errors with small failure probability, suitable for the identity-based encryption scheme.
The document discusses attribute-based encryption (ABE) schemes, including Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE). It defines the components of KP-ABE and CP-ABE, including setup, encryption, key generation, and decryption algorithms. It also describes the security models and proves the selective security of the GPSW KP-ABE scheme and correctness of the Waters CP-ABE scheme under the decisional bilinear Diffie-Hellman assumption. The document outlines the KP-ABE and CP-ABE constructions and security proofs in detail.
The International Journal of Engineering and Science (IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Positive and negative solutions of a boundary value problem for a fractional ...journal ijrtem
: In this work, we study a boundary value problem for a fractional
q, -difference equation. By
using the monotone iterative technique and lower-upper solution method, we get the existence of positive or
negative solutions under the nonlinear term is local continuity and local monotonicity. The results show that we
can construct two iterative sequences for approximating the solutions
We allow Eve to modify DH parameters as well as public keys of Alice and Bob. This allows Eve to derive the secret key and break the DH crypto system. We demonstrate that the DH key exchange algorithm should not be used without digital signatures.
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)Hansol Kang
LSGAN은 기존의 GAN loss가 아닌 MSE loss를 사용하여, 더욱 realistic한 데이터를 생성함.
LSGAN 논문 리뷰 및 PyTorch 기반의 구현.
[참고]
Mao, Xudong, et al. "Least squares generative adversarial networks." Proceedings of the IEEE International Conference on Computer Vision. 2017.
This week, Luke Pearson (Polychain Capital) and Joshua Fitzgerald (Anoma) present their work on Plonkup, a protocol that combines Plookup and PLONK into a single, efficient protocol. The protocol relies on a new hash function, called Reinforced Concrete, written by Dmitry Khovratovich. The three of them will present their work together at this week's edition of zkStudyClub!
Slides:
---
To Follow the Zero Knowledge Podcast us at https://www.zeroknowledge.fm
To the listeners of Zero Knowledge Podcast, if you like what we do:
- Follow us on Twitter - @zeroknowledgefm
- Join us on Telegram - https://t.me/joinchat/TORo7aknkYNLHmCM
- Support our Gitcoin Grant - https://gitcoin.co/grants/329/zero-knowledge-podcast-2
- Support us on Patreon - https://www.patreon.com/zeroknowledge
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...Deepak Malani
This document presents an ILP formulation to maximize toggling in combinational circuits for power estimation. It defines variables and constraints for logic gates. The constraints include I/O behavior, linearization of products, and toggling. The objective is to maximize the sum of toggles on all nets. The approach was tested on ISCAS benchmarks and found to toggle over 50% of nets within hours on a desktop computer using ILP solvers like GLPK and CPLEX. Future work includes handling delays, weighted fanouts, and extending the approach to sequential circuits.
The document describes a construction of a verifiable random function (VRF) that improves upon previous work. It directly constructs a VRF without using Goldreich-Levin hardcore bits, allows inputs of arbitrary size without encoding, and has proofs and keys consisting of a constant number of group elements. The security of the VRF relies on complexity assumptions in bilinear groups and it finds applications like non-interactive lotteries and compact e-cash schemes.
1. The document derives formulas for variational Bayesian inference of correlated topic models (CTM).
2. It presents the generative process of CTM, which models correlations between topics using Gaussian distributions over topic proportions.
3. Variational inference is used to optimize an evidence lower bound, deriving update formulas for the variational distributions of topics, topic-word distributions, and correlations between topics.
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Hansol Kang
* Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
IRJET- On Certain Subclasses of Univalent Functions: An ApplicationIRJET Journal
This document presents research on certain subclasses of univalent functions. Specifically, it studies the class WR(λ,β,α,μ,θ) which consists of analytic and univalent functions with negative coefficients in the open disk, defined by the Hadamard product with the Rafid operator. The paper obtains coefficient bounds and extreme points for this class. It also examines the weighted mean, arithmetic mean, and derives some additional results for the class.
This document discusses backtracking algorithms and provides examples for solving problems using backtracking, including:
1) Generating all subsets and permutations of a set using backtracking.
2) The eight queens problem, which can be solved using a backtracking algorithm that places queens on a chessboard one by one while checking for threats.
3) Key components of backtracking algorithms including candidate construction, checking for solutions, and pruning search spaces for efficiency.
ZK Study Club: Sumcheck Arguments and Their ApplicationsAlex Pruden
Talk given at the ZK Study Club by Jonathan Bootle and Katerina Sotiraki about the universality of sumcheck arguments and their importance in zero-knowledge cryptography.
This document summarizes a presentation on augmenting descriptors for fine-grained visual categorization using polynomial embedding. The presentation introduces polynomial embedding as a method to exploit co-occurrence information between neighboring local descriptors. Polynomial embedding compresses polynomials of neighboring local feature vectors with supervised dimensionality reduction to obtain discriminative latent descriptors. Experiments on fine-grained categorization datasets show that polynomial embedding improves classification accuracy over baselines and state-of-the-art methods. However, the method is less effective for object and scene categorization problems.
Can we reveal the RSA private exponent d from its public key <e, n>? We study this question for two specific cases: e = 3 and e = 65537. Using demos, we verify that RSA reveals the most significant half of the private exponent d when the public exponent e is small. For example, for 2048-bit RSA, the most significant 1024 bits are revealed!
This document discusses the implementation of a lock-free queue in C++. It begins with an overview of blocking vs non-blocking behavior and synchronization primitives like mutexes. It then presents the implementation of a single-producer, single-consumer lock-free queue using atomic operations. It shows how push and pop operations are performed concurrently without locking by atomically updating shared queue pointers. It also discusses techniques like helping other threads complete operations to ensure lock-free progress.
This document contains programs written in C++ to perform encryption and decryption using Caesar cipher, Vigenere cipher, and Playfair cipher. It includes the code for encrypting and decrypting text using Caesar cipher. It also includes code for encrypting and decrypting text using Vigenere cipher's autokey method. There is incomplete code provided for Playfair cipher that distributes the alphabet and key into a 5x5 matrix but does not include the encryption logic.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
The document discusses attribute-based encryption (ABE) schemes, including Key-Policy ABE (KP-ABE) and Ciphertext-Policy ABE (CP-ABE). It defines the components of KP-ABE and CP-ABE, including setup, encryption, key generation, and decryption algorithms. It also describes the security models and proves the selective security of the GPSW KP-ABE scheme and correctness of the Waters CP-ABE scheme under the decisional bilinear Diffie-Hellman assumption. The document outlines the KP-ABE and CP-ABE constructions and security proofs in detail.
The International Journal of Engineering and Science (IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Positive and negative solutions of a boundary value problem for a fractional ...journal ijrtem
: In this work, we study a boundary value problem for a fractional
q, -difference equation. By
using the monotone iterative technique and lower-upper solution method, we get the existence of positive or
negative solutions under the nonlinear term is local continuity and local monotonicity. The results show that we
can construct two iterative sequences for approximating the solutions
We allow Eve to modify DH parameters as well as public keys of Alice and Bob. This allows Eve to derive the secret key and break the DH crypto system. We demonstrate that the DH key exchange algorithm should not be used without digital signatures.
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)Hansol Kang
LSGAN은 기존의 GAN loss가 아닌 MSE loss를 사용하여, 더욱 realistic한 데이터를 생성함.
LSGAN 논문 리뷰 및 PyTorch 기반의 구현.
[참고]
Mao, Xudong, et al. "Least squares generative adversarial networks." Proceedings of the IEEE International Conference on Computer Vision. 2017.
This week, Luke Pearson (Polychain Capital) and Joshua Fitzgerald (Anoma) present their work on Plonkup, a protocol that combines Plookup and PLONK into a single, efficient protocol. The protocol relies on a new hash function, called Reinforced Concrete, written by Dmitry Khovratovich. The three of them will present their work together at this week's edition of zkStudyClub!
Slides:
---
To Follow the Zero Knowledge Podcast us at https://www.zeroknowledge.fm
To the listeners of Zero Knowledge Podcast, if you like what we do:
- Follow us on Twitter - @zeroknowledgefm
- Join us on Telegram - https://t.me/joinchat/TORo7aknkYNLHmCM
- Support our Gitcoin Grant - https://gitcoin.co/grants/329/zero-knowledge-podcast-2
- Support us on Patreon - https://www.patreon.com/zeroknowledge
ILP Based Approach for Input Vector Controlled (IVC) Toggle Maximization in C...Deepak Malani
This document presents an ILP formulation to maximize toggling in combinational circuits for power estimation. It defines variables and constraints for logic gates. The constraints include I/O behavior, linearization of products, and toggling. The objective is to maximize the sum of toggles on all nets. The approach was tested on ISCAS benchmarks and found to toggle over 50% of nets within hours on a desktop computer using ILP solvers like GLPK and CPLEX. Future work includes handling delays, weighted fanouts, and extending the approach to sequential circuits.
The document describes a construction of a verifiable random function (VRF) that improves upon previous work. It directly constructs a VRF without using Goldreich-Levin hardcore bits, allows inputs of arbitrary size without encoding, and has proofs and keys consisting of a constant number of group elements. The security of the VRF relies on complexity assumptions in bilinear groups and it finds applications like non-interactive lotteries and compact e-cash schemes.
1. The document derives formulas for variational Bayesian inference of correlated topic models (CTM).
2. It presents the generative process of CTM, which models correlations between topics using Gaussian distributions over topic proportions.
3. Variational inference is used to optimize an evidence lower bound, deriving update formulas for the variational distributions of topics, topic-word distributions, and correlations between topics.
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Hansol Kang
* Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
IRJET- On Certain Subclasses of Univalent Functions: An ApplicationIRJET Journal
This document presents research on certain subclasses of univalent functions. Specifically, it studies the class WR(λ,β,α,μ,θ) which consists of analytic and univalent functions with negative coefficients in the open disk, defined by the Hadamard product with the Rafid operator. The paper obtains coefficient bounds and extreme points for this class. It also examines the weighted mean, arithmetic mean, and derives some additional results for the class.
This document discusses backtracking algorithms and provides examples for solving problems using backtracking, including:
1) Generating all subsets and permutations of a set using backtracking.
2) The eight queens problem, which can be solved using a backtracking algorithm that places queens on a chessboard one by one while checking for threats.
3) Key components of backtracking algorithms including candidate construction, checking for solutions, and pruning search spaces for efficiency.
ZK Study Club: Sumcheck Arguments and Their ApplicationsAlex Pruden
Talk given at the ZK Study Club by Jonathan Bootle and Katerina Sotiraki about the universality of sumcheck arguments and their importance in zero-knowledge cryptography.
This document summarizes a presentation on augmenting descriptors for fine-grained visual categorization using polynomial embedding. The presentation introduces polynomial embedding as a method to exploit co-occurrence information between neighboring local descriptors. Polynomial embedding compresses polynomials of neighboring local feature vectors with supervised dimensionality reduction to obtain discriminative latent descriptors. Experiments on fine-grained categorization datasets show that polynomial embedding improves classification accuracy over baselines and state-of-the-art methods. However, the method is less effective for object and scene categorization problems.
Can we reveal the RSA private exponent d from its public key <e, n>? We study this question for two specific cases: e = 3 and e = 65537. Using demos, we verify that RSA reveals the most significant half of the private exponent d when the public exponent e is small. For example, for 2048-bit RSA, the most significant 1024 bits are revealed!
This document discusses the implementation of a lock-free queue in C++. It begins with an overview of blocking vs non-blocking behavior and synchronization primitives like mutexes. It then presents the implementation of a single-producer, single-consumer lock-free queue using atomic operations. It shows how push and pop operations are performed concurrently without locking by atomically updating shared queue pointers. It also discusses techniques like helping other threads complete operations to ensure lock-free progress.
This document contains programs written in C++ to perform encryption and decryption using Caesar cipher, Vigenere cipher, and Playfair cipher. It includes the code for encrypting and decrypting text using Caesar cipher. It also includes code for encrypting and decrypting text using Vigenere cipher's autokey method. There is incomplete code provided for Playfair cipher that distributes the alphabet and key into a 5x5 matrix but does not include the encryption logic.
The peer-reviewed International Journal of Engineering Inventions (IJEI) is started with a mission to encourage contribution to research in Science and Technology. Encourage and motivate researchers in challenging areas of Sciences and Technology.
The document discusses a pedagogical model for a school. It mentions PEI, which stands for Proyecto Educativo Institucional or Institutional Educational Project, and a pedagogical model. However, no other details are provided about the contents or goals of the PEI or pedagogical model.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
Ringkasan singkat presentasi Tri Tunisih:
1. Presentasi perkenalan diri Tri Tunisih, mahasiswa D3 Teknik Komputer di STT Telematika Cakrawala.
2. Menguraikan latar belakang pribadi seperti nama, alamat, status sebagai mahasiswa, pekerjaan sampingan, serta kontak media sosial.
3. Menyampaikan ciri khas dirinya yang mudah bergaul, ramah, dan supel.
This document discusses data science and introduces Ryan Swanstrom, a data scientist and blogger. It defines data science as the creation of data products through a complete data lifecycle process involving question formulation, data preparation, analysis and modeling to achieve descriptive, predictive and prescriptive goals. Big data is distinguished from data science in that data science does not necessarily require large volumes of data and can be performed on smaller datasets. The data science workflow is described as non-linear and iterative rather than a perfect linear process. Specialization and teamwork are emphasized as important for data science work.
The document discusses organizational agility and enabling adaptive enterprises. It provides statistics showing that while many CEOs feel their organizations are effective at strategic planning, far fewer feel they are effective at implementing strategies. It also discusses how few organizations effectively manage strategic initiatives. Additionally, it explores how enterprises can use techniques like enterprise architecture, reference models, and business capability models to better connect strategies to execution through projects and programs. This would help organizations overcome issues translating strategies into the right projects.
The document provides information about several notable events from the 1980s:
- The 1980 eruption of Mount St. Helens in Washington, which was the worst volcanic disaster in U.S. history but also provided a rare opportunity for scientific study of volcanoes.
- The 1980 release and popularity of the iconic video game Pac-Man.
- The location of the wreck of the RMS Titanic, which sank in 1912.
- The 1986 Chernobyl nuclear power plant explosion in Ukraine, one of the worst nuclear disasters in history.
- The 1988 bombing of Pan Am Flight 103 over Lockerbie, Scotland that killed 270 people.
- The 1989 fall of the Berlin Wall, when East Germany
The document discusses the results of a study on the effects of a new drug on memory and cognitive function in older adults. The double-blind study involved 100 participants aged 65-80 who were given either the drug or a placebo daily for 6 months. Researchers found that those who received the drug performed significantly better on memory and problem-solving tests at the end of the study compared to those who received the placebo.
- مقدمة
- مفهوم التعليم الفردي:
- أهمية التعليم الفردي:
- خصائص التعليم الفردي :
- المبادئ الأساسية للتعليم الفردي:
- أساليب وأنواع التعلم الفردي
- مهارات التعليم التفريدي
- دور المعلم في التعليم الفردي:
- الإجراءات المطلوبة لتنفيذ التعليم المفرد في الواقع التعليمي:
- التعلم التعاوني :
- تعريف التعليم الجماعي:
- آراء بعض من العلماء بالعمل الجماعي:
- المبادئ الأساسية للتعلم التعاوني:
- العناصر الأساسية للتعلم التعاوني:
- دور المعلم في التعلم التعاوني:
- استراتيجيات التعلم التعاوني:
- تشكيل مجموعات العمل التعاوني :
- اقتراحات تسهم في تنظيم عمل المجموعات:
- الصعوبات التي تواجه تطبيق التعليم التعاوني:
- ما يجب توفره في العمليات الجماعية :
- مزايا الأساليب الجماعية التعاونية:
- القيم التربوية للتعلم الجماعي التعاوني:
- المراجع.
The document provides instructions for examining the skull, scalp, hair, nose and paranasal sinuses, ears, eyes, abdomen, and vestibulocochlear nerve. Key steps include observing the skull shape and scalp, checking for lice or lesions on the scalp, examining the nose shape and drainage, checking the ear canals and eardrums, observing the eyelids and testing tear drainage, listening to bowel sounds in the abdomen, and assessing hearing through voice and watch tests and tuning fork tests. Normal findings are described for each area examined.
Big Data and Data Science are hot buzzwords right now. The buzzwords might go away but the ideas will not. This talk will explain the buzzwords, and it will cover some of the best resources for attaining data science skills.
Open Data Kit (ODK) is a free and open-source set of tools for mobile data collection. It allows researchers to create online forms, interface with a data server to download blank forms and submit completed forms from mobile devices even without internet access. Key components include ODK Build for authoring forms, ODK Collect for mobile data collection, and ODK Aggregate for storing submitted data. The tools make data collection more efficient, robust and cost-effective compared to paper-based methods. ODK has been successfully used in various deployments including surveys, project monitoring, and crisis mapping.
Presentation about bytecode and what is going on at the JVM at AndroidMakers conference Paris. Fine-tuned information from my presentation at 360|AnDev Denver.
COMPAPPABCA49085rFunrAP__Practical Number 9 & 10.docxTashiBhutia12
This program performs 3D transformations (translation, rotation, scaling, and reflection) on a cube. It uses matrix transformations to manipulate the coordinates of the cube's vertices. The user selects the type of transformation and provides any necessary parameters. The cube is drawn before and after the transformation to visualize the effect. Key steps include setting up transformation matrices, applying them by pre-multiplying to a global matrix, transforming the vertex coordinates, and redrawing the cube.
This document discusses optimizing the Box2D physics engine using SIMD in JavaScript. It begins with background on Box2D, describing how it simulates 2D physics with rigid bodies. It then covers SIMD in JavaScript, how Emscripten can compile C/C++ to JavaScript with SIMD, and how Box2D was ported to JavaScript. Performance profiling showed the position constraint solver could benefit from SIMD. The document explores vectorizing math operations and specializing the solver to process groups of 4 constraints simultaneously.
The document discusses dynamic programming and provides examples of problems that can be solved using dynamic programming including unidirectional traveling salesman problem, coin change, longest common subsequence, and longest increasing subsequence. Source code is presented for solving these problems using dynamic programming including dynamic programming tables, tracing optimal solutions, and time complexity analysis. Various online judges are listed that contain sample problems relating to these dynamic programming techniques.
In the early days of computer science coding was viewed as an art. In the modern world of software engineering we may have lost the art to make way for rules and best practices. The International Obfuscated C Code Contest offers a chance for the coder to think beyond the rules of software engineering and unleash their creative side. We'll explore some of the more interesting entries in the past, take a closer look at some exotic C syntax, and finish up by exploring Bruce Holloway's 1986 entry.
From the Un-Distinguished Lecture Series (http://ws.cs.ubc.ca/~udls/). The talk was given Feb. 2, 2007
Automatically Describing Program Structure and Behavior (PhD Defense)Ray Buse
This document discusses code readability and summarization. It begins by presenting a sample code snippet and noting that understanding code is difficult. It then explores ways to quantify how readable code is, such as using Flesch-Kincaid readability scores and measuring inter-rater agreement through correlation statistics. The document proposes building a machine learning model to predict code readability based on code features and weights learned from human annotations. Potential code features discussed include line length, comments, and identifier length. The overall goal is to automatically describe program structure and behavior to aid in code comprehension.
This document discusses using the C to Go translation tool c2go to translate C code implementing quicksort algorithms into Go code. It provides examples of translating simple quicksort C code, improving the translation by using a configuration file, and how c2go handles standard C functions like qsort by translating them to their Go equivalents. The examples demonstrate how c2go can generate valid Go code from C code but may require some manual fixes or configuration to handle certain data structures or language differences.
The document discusses Java bytecode and how it compares to other compiled languages. It notes that Java bytecode is stack-based rather than register-based, which makes it easy to interpret but less performant. It provides examples of simple Java code and the resulting bytecode. It also discusses how just-in-time (JIT) compilers and ahead-of-time (AOT) compilers like Jack can optimize bytecode to produce more efficient machine code. Finally, it touches on language features like autoboxing that add overhead at the bytecode level.
This document discusses types of adders and provides details on half adders and full adders. It begins by identifying half adders and full adders as types of adders. It explains that digital computers perform arithmetic operations like addition and the basic operation is adding two binary digits. When adding more than two bits, the operation is called a full adder. Truth tables are provided for half adders and full adders. The document then shows the simplified sum of products form for a full adder using K-maps and provides the logic diagram. It concludes with assigning short notes on topics like manufacturing testing, functional testing, files and text I/O, and differentiating CPLD and FPGA architectures.
This document describes an experiment using the Newton-Raphson method to find the roots of nonlinear equations in MATLAB. Two nonlinear equations are given as an example: x^2+xy=10 and y+3xy^2=57. The MATLAB code implements the Newton-Raphson method to iteratively calculate the roots. For the given equations, the method converges after 15 iterations with roots of x=4.3937 and y=-2.1178. The experiment demonstrated the use of the Newton-Raphson method to solve nonlinear equations numerically in MATLAB.
The document summarizes a presentation titled "Yoyak" given by Heejong Lee at ScalaDays 2015. The presentation introduces Yoyak, a static analysis framework developed by the speaker. It covers the following topics:
- Static analysis and abstract interpretation theory
- Implementation highlights of the Yoyak framework
- Experiences using Scala in developing Yoyak
- The roadmap for future development of Yoyak
This document summarizes the K-means clustering algorithm. It provides an outline of the topics covered, which include an introduction to clustering and K-means, how to calculate K-means using steps 0 through 2, results and suggestions, and references. It then provides more detail on the three steps of K-means: 1) initialize centroids, 2) assign points to closest centroids, and 3) recalculate centroids. Pseudocode is provided to demonstrate how to code K-means in Visual Basic.
The document discusses Arm C Language Extensions (ACLE) for supporting Arm features in C and C++. It provides an overview of ACLE intrinsics and data types for SVE, NEON, and FP16. It describes how to include headers to use different Arm features and provides examples of using SVE ACLE intrinsics to vectorize a scalar loop.
The document discusses goal programming and its application to a case study of a medical manufacturing company. It introduces goal programming and describes how it can address multiple goals through deviations from target values. The case study establishes strategic, intermediate, and tactical goals for two medical products related to engineering cost, quality cost, production cost, setup time, delivery reliability and operations cost. A goal programming model is constructed to minimize deviations from these goals. The model is solved in steps to find optimal values for the decision variables to meet the goals.
The document discusses ESL (electronic system level) design and some challenges with adopting ESL flows that use C/C++ as a design entry language. It provides examples of how C/C++ code can unintentionally result in inefficient hardware implementations if the designer does not consider the hardware implications. The document advocates that ESL adoption needs to be driven by designer needs and preferences rather than management decisions. It also argues that ESL tools need to provide predictability of results, education for designers on the hardware implications of different coding styles, and robust verification methods for ESL to be widely adopted.
The document contains a function that returns the MAC address of the network adapter. It loads the rpcrt4.dll library and calls the UuidCreateSequential function to generate two GUIDs. It then compares bytes of the two GUIDs and concatenates the bytes to form the MAC address string if they are equal.
This document discusses simulation of a mobile robot. It includes the following sections:
1. Introduction of the Pioneer 3-DX mobile robot and goals of the simulation project.
2. Dynamics and kinematics equations for the robot, including equations of motion.
3. A Simulink block diagram for PD control of the robot. Equations for calculating joint accelerations from velocities are also presented.
In this paper, we present a complete digital signature message stream, just the way the RSA digital
signature scheme does it. We will focus on the operations with large numbers due to the fact that operating
with large numbers is the essence of RSA that cannot be understood by the usual illustrative examples with
small numbers[1].
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Project Management Semester Long Project - Acuityjpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Psimd open64 workshop_2012-new
1. EXPERIENCE WITH PARTIAL
SIMDIZATION IN OPEN64 COMPILER
USING DYNAMIC PROGRAMMING
Dibyendu Das, Soham Sundar Chakraborty, Michael Lai
AMD
{dibyendu.das, soham.chakraborty, michael.lai}@amd.com
Open64 Workshop, June 15th , 2012, Beijing
2. AGENDA
Introduction to Partial Simdization
Overview of Barik-Zhao-Sarkar Dynamic Programming-based algorithm
Motivating Example – hot loop in gromacs
Our Approach
Initial Results
Conclusion and Future Work
2 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
3. PARTIAL SIMDIZATION
Effectively utilize and exploit the SIMD (vector) ISA of modern x86 microprocessors
Traditional loop-based vectorization adopt an all-or-nothing approach in SIMDizing loops - frequently
employing fission or other techniques to separate the non-vectorizable part from the vectorizable part
Partial simdization packs isomorphic instructions to SIMD instructions (if possible),
– Most of the opportunities still loop-based
– Need not have all instructions packed
SLP (Superworld-level-parallelism) [ Larsen et al.] one of the most well-known partial simidization
techniques
A simple example of partial simdization:
3 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
5. OVERVIEW OF BARIK-ZHAO-SARKAR (BZS) ALGORITHM
Dynamic Programming based approach
– Data Dependence Graph is built for each basic block
– k-tiles are built, each tile consists of k isomorphic independent operations
– Two sets of costs built
Scalar Cost of each node n
Vector/SIMD Cost of k-tiles
– [Pass 1] Dependence Graph traversed from source to sink, build scalar and vector costs, use dynamic
programming rules
– [Pass 2] Best costs are determined , decide between whether a k-tile should be SIMDized or not
Use scalar cost vs vector cost comparison for this purpose
– [Pass 3] Generate SIMD+scalar code
5 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
6. BARIK-ZHAO-SARKAR (BZS) ALGORITHM - ILLUSTRATED
Various combinations of packing ( assume 2-tiles) possible
– <ld a[it], ld b[it]> OR <ld a[it], ld c[it]> OR <ld a[it], d[it]> OR <ld a[it], e[it]> OR <ld a[it], f[it]> …
– Which is better ? Depends on whether <+1,+2> OR <+1,+3> OR <+1,+4> is better and so on
– vcost(<+1,+2>) = vcost(<ld a[it], ld c[it]>) + vcost(<ld b[it], ld d[it]>) OR
– vcost(<+1,+2>) = vcost(<ld a[it], ld b[it]>) + vcost(<ld c[it], ld d[it]>) + unpcost(a,b)+unpcost(c,d)+
pcost(a,c)+pcost(b,d) ( Dynamic Programming)
In actuality,
st r[it]
Total Scalar Cost = 20
Total Vector Cost = 21
DON’T SIMDIZE +
+ +
st i st j st k st l
+ 1 + 2 + 3 + 4
ld a[it] ld b[it] ld c[it] ld d[it] ld e[it] ld f[it] ld g[it] ld h[it]
6 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
7. A MOTIVATING EXAMPLE – HOT LOOP IN GROMACS
7 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
8. OUR APPROACH
Phase I LDID ix1
ILOAD ILOAD ILOAD
pos[j3] LDID iy1 pos[j3+1] LDID iz3 pos[j3+8]
– Initialization (Preparatory phase)
Build the Data Dependence Graph ( still STID jx1 STID jy1 STID jz3
basic-block oriented ), only for INNER loops,
preferably HOT
Topological sort and create levels ( provision LDID jx1 LDID jy1 LDID jz3
to do ALAP or ASAP )
– Difference with BZS approach which is SUB SUB SUB
ready-list based
– We statically build all the possible tiles to STID dx11 STID dy11 STID dz33
control compile cost and ease of design
LDID dx11 LDID dx11 LDID dy11 LDID dy11 LDID dz33 LDID dz33
MPY MPY MPY
ADD
ADD
STID rsq11
8 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
9. OUR APPROACH
Phase II
– Tiling
Pack isomorphic operations as k-tiles
In BZS k = SIMD width / scalar op width , ex: for simd width = 128, scalar = int, k = 4
Not the best mechanism to pick k
Iterative process, k = 2,3,4 …
Traverse through every level in topological order
– Pick k nodes with isomporphic ops , create k-tile <op1,op2, …, opk>
– Associate a dummy cost with the tile
– Not all isomporphic nodes at the same level checked for packing, too computationally intensive
– Use heuristics, ex: nodes with common parent/grandparent not packed OR pack nodes which appear as the left
child, right child etc.
9 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
10. OUR APPROACH
Phase III scost(n) = vcost(t1) + vcost(t2) +
scost(n) =scost(lchild(n))
unpcost(t1) + unpcost(t2) + op(n)
– Dynamic Programming + scost(rchild(n)) + op(n)
Traverse from Level = 0 (source) to Level = n n
maxLevel (sink)
– Build all the scalar costs, scost(n) for
every node n
lchild(n) rchild(n)
– Build the simd costs of the k-tiles lchild(n) rchild(n)
based on dynamic programming rules
– For each tile choose the best vector Tile: t1 Tile: t2
cost option at any point, store the
configuration
– Pick Best Costs/Configuration Tile: t o1 o2 ok
Traverse from Level = maxLevel(sink) to
Level = 0
o11 o12 o1k
– Pick k-tiles and chose those where Tile: t1
vcost(tile) < Σ scost(node that is part of
the tile) o21 o22 o2k
Tile: t2
– The non-chosen tiles are assumed to
be scalar operations vcost(t) = vcost(t1) + vcost(t2) +op(t)
– The chosen tiles are the SIMD ops
10 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
11. OUR APPROACH
Phase IV
– Code Generation
Generate WHIRL vector types for SIMD tiles
Generate unpacking/packing instructions at the boundary of non-
SIMD to SIMD and SIMD to non-SIMD code generation
Case 1
– Use SIMD intrinsics for packing/unpacking
Case I – General Case
Case 2 – Handling LDID/ILOAD tiles (also packing)
Case 3 – Handling STID/ISTORE tiles (also unpacking)
Arbitrary packing/unpacking not allowed
Case 3
Case 2
11 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
12. APPLYSIMD ALGORITHM Input: WN - the WHIRL Tree of the loop
Output: Simdized WHIRL Tree
Phase-I
Procedure ApplySimd() {
- Create dependence DAG among operations from 1: // Phase-I
data dependence graph. 2: /* Initialization: create the DAG from the WHIRL using data flow analysis. */
- Create topological levels for the DAG nodes. 3: _dag = CreateDAG();
4: /* level creation in DAG: topological ordering of the DAG nodes.*/
Phase-II 5: _setOfLevels = CreateLevelsForDAG();
- At each level
* for each operator having n DAG nodes 6: // Phase-II
> creates tile combos where k is the tile size. 7: /* possible vector tiles are created for each level. */
8: _setOfTiles = CreateVector KTilesForLevels();
Phase-III
-Compute scalar and vector cost for all tiles as per 9: // Phase-III
10: /* scalar and vector costs are computed according to
cost model following dynamic programming
11: the defined cost function using dynamic programming */
approach. 12: _setOfTiles< scost, vcost > = Find KTileSimpleBestCost();
- Choose the best set of tiles with minimum cost for 13: /* choose the best set of tiles for maximal benefits */
maximal benefit. 14: _chosenSetOfTiles< scost, vcost > = FindBestVectorScalar Ktiles();
Phase-IV 15: // Phase-IV
-Generate the simdized WHIRL code from which 16: /* generate the simdized code by transforming the WHIRL */
code generator generates simdized code in 17: _simdizedWHIRL = GenerateVectorScalarCode KTile();
consequent compiler phase. }
12 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
13. OUR APPROACH – MAJOR DIFFERENCES WITH BZS
Level-based DAG approach to create tiles
– More restricted compared to BZS which rely on an „ready list‟ but helps control compile-time
Implemented in the LNO
– Higher level implementation helps reason about the simidization implementation than bother about low-
level details.
– Easy vector type support in WHIRL which can be lowered
Iterative „k‟-tiling as opposed to a rigid selection of „k‟
– Helps with the natural structure of the loop
13 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
14. INITIAL RESULTS & EXPERIMENTS - GROMACS(INL1130)
Inl1130‟s innermost loop hottest at 70%+
Loop body demonstrates 3-way parallelism, ideally suited
for 3-tiling
Several complexities over the implementation outlined
– Simdizing reductions
– Supertiling ( identify some special patterns for better
code generation )
– LiveIn, LiveOut Code Generation
#Nodes #Levels #Nodes Simdized #Simdized %Simdized
tiles
763 43 678 226 88.8
14 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
15. INITIAL RESULTS & EXPERIMENTS
Implemented in the Open64 4.5.1 AMD release %speedup
Enabled under a special flag –LNO:simd=3 at –O3
and above 30
ApplySimd invoked in LNO, after regular
vectorization fails 20
LNO/simd.cxx – code home
Compile-time intensive due to explicit tile creation
10
Our main targets are gromacs (inl1130) (and namd,
WIP)
0
Not available in open64.net yet
Targets AVX, though 256 bit registers not utilized yet
15 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
16. CONCLUSION & FUTURE WORK
Illustrates the efficacy of the dynamic programming approach
First top-of-the-line compiler to effectively simdize gromacs with such huge gains
– Some efforts do simdize gromacs but uses old legacy compilers like SUIF
Can be extended to CFGs
Iterative “k”- tiling factor decided on percentage simdized
The underlying graph problem is identifying subgraph-isomporphism (hard). Refer to the paper in Slide 2
by Mahlke et al.
– Should we look at disjoint clusters before tiling ?
Still lot of work
– Challenges in reducing compile time overhead
– Code Generation too tied to the ISA (virtual vectors may help, see Zaks et al.)
– Scatter/Gather operations may make things attractive (Haswell, future AMD processors)
16 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public
17. REFERENCES
Efficient Selection of Vector Instructions using Dynamic Programming. Rajkishore Barik, Jisheng
Zhao, Vivek Sarkar (Rice University), MICRO 2010.
Efficient SIMD Code Generation for Irregular Kernels. Seonggun Kim and Hwansoo Han
Sungkyunkwan University, PPoPP 2012.
A Compiler Framework for Extracting Superword Level Parallelism. Jun Liu, Yuanrui Zhang, Ohyoung
Jang, Wei Ding, Mahmut Kandemir, The Pennsylvania State University, PLDI 2012.
SIMD Defragmenter: Efficient ILP Realization on Data-parallel Architectures. Yongjun Park (University
of Michigan, Ann Arbor), Sangwon Seo (University of Michigan, Ann Arbor), Hyunchul Park (Intel), Hyoun
Kyu Cho (University of Michigan, Ann Arbor) and Scott Mahlke (University of Michigan, Ann
Arbor), ASPLOS 2012.
17 | Experience with Partial Simdization in Open64 Compiler Using Dynamic Programming | June 15th, 2012 | Open64 Workshop 2012 | Public