We describe ocl, a Python library built on top of pyOpenCL and numpy that allows programming GPU devices from Python. Python functions marked with the provided decorator are converted into C99/OpenCL and JIT-compiled at runtime. This approach lowers the barrier to entry to GPU programming, since it requires only
Python syntax and no external compilation or linking steps, and the resulting Python program runs
even if no GPU is available. As an example application, we solve the problem of computing the covariance matrix for historical stock prices and determining the optimal portfolio according to Modern Portfolio Theory.
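The portfolio computation mentioned above can be sketched on the CPU side with plain numpy (the price data below is invented for illustration, and ocl's actual decorator API is not shown here):

```python
import numpy as np

# Hypothetical daily closing prices for three stocks (rows = days).
prices = np.array([
    [10.0, 20.0, 30.0],
    [10.2, 19.8, 30.6],
    [10.1, 20.1, 30.3],
    [10.4, 19.9, 31.0],
    [10.3, 20.2, 30.8],
])

# Daily returns and their covariance matrix.
returns = np.diff(prices, axis=0) / prices[:-1]
cov = np.cov(returns, rowvar=False)

# Minimum-variance portfolio weights: w is proportional to inv(Cov) @ 1,
# normalized so the weights sum to 1 (the classic MPT closed form).
ones = np.ones(cov.shape[0])
w = np.linalg.solve(cov, ones)
w /= w.sum()
print(w)
```

In the library described by the abstract, the covariance computation itself would be offloaded to the GPU kernel; the closed-form weight step stays cheap on the host.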
Productive OpenCL Programming: An Introduction to OpenCL Libraries with ArrayFire (AMD Developer Central)
In this webinar presentation, ArrayFire COO Oded Green demonstrates best practices to help you quickly get started with OpenCL™ programming. Learn how to get the best performance from AMD hardware in various programming languages using ArrayFire. Oded discusses the latest advancements in the OpenCL™ ecosystem, including cutting-edge OpenCL™ libraries such as clBLAS, clFFT, clMAGMA, and ArrayFire. Examples are shown in real code for common application domains.
Watch the webinar here: http://bit.ly/1obT0M2
For more developer resources, visit:
http://arrayfire.com/
http://developer.amd.com/
Follow us on Twitter: https://twitter.com/AMDDevCentral
See info in the slides for more contact information and resource links!
ADMS'13: High-Performance Holistic XML Twig Filtering Using GPUs (ty1er)
The current state of the art in information dissemination comprises publishers broadcasting XML-coded documents, which are in turn selectively forwarded to interested subscribers. The deployment of XML at the heart of this setup greatly increases the expressive power of the profiles listed by subscribers, using the XPath language. On the other hand, with great expressive power comes great performance responsibility: it is becoming harder for the matching infrastructure to keep up with the high volumes of data and users. Traditionally, general-purpose computing platforms have been favored over customized computational setups, due to their simplified usability and significant reduction in development time. The sequential nature of these general-purpose computers, however, limits their performance scalability. In this work, we propose implementing the filtering infrastructure using massively parallel Graphics Processing Units (GPUs). We consider the holistic (no post-processing) evaluation of thousands of complex twig-style XPath queries in a streaming (single-pass) fashion, resulting in a speedup over CPUs of up to 9x in the single-document case and up to 4x for large batches of documents. A thorough set of experiments is provided, detailing the varying effects of several factors on the CPU and GPU filtering platforms.
An evaluation of the LLVM compiler for SVE with fairly complicated loops (Linaro)
By Hiroshi Nakashima, Kyoto University / RIKEN AICS
As part of the evaluation of Post-K’s compilers, we have been investigating the compiled code of vectorizable kernel loops in a particle-in-cell simulation program. This talk will reveal how the latest version of the LLVM compiler (v1.4) works on these loops, together with a qualitative and quantitative comparison with the code generated by Intel’s compiler for KNL.
Hiroshi Nakashima Bio
Currently working as a professor at Kyoto University’s supercomputer center (ACCMS) on R&D in HPC programming and supercomputer system architecture, as well as a visiting senior researcher at RIKEN AICS for the evaluation of the Post-K computer and its compilers.
Email
h.nakashima@media.kyoto-u.ac.jp
For more info on Linaro High Performance Computing (HPC), visit https://www.linaro.org/sig/hpc/
Application developers like to stick to the object-oriented style of programming, designing their application's logic as interactions between different object entities. In this process, every entity is modeled as a C++ class or structure. An array of structures (AoS) maintains a collection of those entities, which makes the code more readable and easier to maintain. But this user-friendly code can pose a challenge when it comes to vectorization efficiency: the data needed to populate a vector register often has to be gathered, since it is laid out in non-unit-stride fashion in main memory. To make the data layout more vector-friendly, developers have traditionally had to change their data structures manually from AoS to a structure of arrays (SoA). Single instruction, multiple data (SIMD) layout templates from Intel help developers preserve an AoS interface while programming, but under the hood the data structure is laid out in SoA format. This is a win-win solution for both object-oriented and vector-friendly programming.
This presentation demonstrates how to analyze the memory access pattern in your performance-sensitive loops and how to enable the layout templates to make changes from constant- and variable-strided memory accesses to unit-strided memory access wherever possible.
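The AoS-versus-SoA contrast at the core of this talk can be illustrated with NumPy, where a structured array gives an AoS layout and separate arrays give SoA; Intel's layout templates play the analogous role in C++ (the field names below are invented for illustration):

```python
import numpy as np

n = 1000

# AoS: one record per particle. The x values are interleaved with y and z
# in memory, so reading all x values is a strided (gather-like) access.
aos = np.zeros(n, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
print(aos['x'].strides)   # stride of 12 bytes: x values sit 3 floats apart

# SoA: one contiguous array per field. Reading all x values is
# unit-stride, which is what vector units want.
soa_x = np.zeros(n, dtype='f4')
soa_y = np.zeros(n, dtype='f4')
soa_z = np.zeros(n, dtype='f4')
print(soa_x.strides)      # stride of 4 bytes: fully contiguous
```

The point of the layout templates is that code keeps reading like the first form while the memory layout matches the second.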
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra... (Intel® Software)
In this presentation, we focus on an alternative approach that uses nodes containing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Programming models and development tools are identical for these resources, greatly simplifying development. We discuss how the same models for vectorization and threading can be used across these compute resources to create software that performs well on them. We further propose an extension to the Intel® Threading Building Blocks (Intel® TBB) flow graph interface that enables intra-node distributed memory programming, simplifying communication and load balancing between the processors and coprocessors. Finally, we validate this approach by presenting a benchmark of a risk analysis implementation that achieves record-setting performance.
Presentation of NvFX: an effect layer that allows encapsulation of the GLSL and/or D3D shading languages.
The basic concept follows in the footsteps of NVIDIA CgFX.
https://github.com/tlorach/nvFX
TWJUG x Oracle Groundbreakers 2019 Taiwan - What’s New in Last Java Versions (Joseph Kuo)
We all know that Java's release cycle is now 6 months. Java 12 was released in March 2019, with the next version, Java 13, to come in September 2019. This session introduces the new features and functions in Java 12 and 13. We will also mention other relevant features released in previous Java versions.
https://cyberjos.blog/java/seminar/oracle-groundbreakers-2019-taiwan-whats-new-in-last-java-versions/
JCConf 2018 - Retrospect and Prospect of Java (Joseph Kuo)
It has been more than two decades since the first version of Java was released in 1996. Today, Java is applied in many different fields, from large-scale distributed computing services with scalability and stability to the millions of apps installed in mobile devices, cellphones, and cars all over the world. At a time when Java 11 is ready to introduce more new enhancements and deprecate legacy libraries, let us look back at the history of Java from the beginning, focus on the significant recent changes from Java 8 to 10, preview the new features included in Java 11, and speculate on what functionality may come in the future.
https://cyberjos.blog/java/seminar/jcconf-2018-retrospect-and-prospect-of-java/
In these slides, I describe the basics of Python programming and its functions. If you have any doubts about the slides, contact me through email or LinkedIn. My email is mdsathees@gmail.com
Recorded video here:
http://on-demand.gputechconf.com/siggraph/2017/video/sig1757-tristan-lorach-vkFX-effective-approach-for-vulkan-api.html
Vulkan is a complex low-level API, full of structures and dedicated objects. Using it may be tedious and often leads to complicated source code. We propose here a way to define and use Vulkan components in a convenient and readable way. We then show how this infrastructure allows introducing and using higher-level concepts, such as techniques and passes, and even instantiating resources and render targets right from within the effect, making it self-sufficient and consistent as a general description. The overall purpose of this open-source project is to improve and enhance the use of the Vulkan API while keeping its strength and flexibility. The project can run in two different ways: either as a compiler generating C++ code for you, or at runtime, loading effects and using them right away.
vkFx comes from a former project called nvFx, presented a few years ago. While nvFx was intended to be generic (OpenGL- and D3D-compliant), vkFx is Vulkan-specific, so the project is thin and doesn’t break the important paradigms that Vulkan requires to stay powerful.
Aggregate programming is a novel paradigm that addresses, at its core, many issues commonly found in the development of large-scale, situated, self-adaptive systems. It is a macro-programming approach in which a developer expresses the behaviour of the system at the aggregate level, targeting the distributed computational machine formed by the entire set of (possibly mobile and heterogeneous) networked devices pervading the environment. The model takes care of turning a system-level behaviour specification into the concrete, device-centric programs executed locally by each component.
Aggregate computing is formally grounded in the field calculus, a minimal functional language that works with computational fields, i.e., distributed data structures mapping devices (digital representatives of space-time portions) to computational objects. Fields are a useful unifying abstraction for drawing a connection between the physical and the computational world, and between the local and global programming viewpoints. The approach is compositional, allowing developers to define layers of building blocks of increasing abstraction, and it is also amenable to formal analysis.
In this talk, I will present scafi (SCAla with computational FIelds), an aggregate computing framework for the Scala programming language which provides (i) an internal DSL for expressing aggregate computations, as well as (ii) library support for the configuration and execution of aggregate systems. There is no need to learn ad hoc external DSLs anymore: with scafi, Scala programmers can start playing with this new, intriguing development approach right away!
.NET Core, ASP.NET Core Course, Session 3 (aminmesbahi)
Session 3,
Introduction to Compilers
What is the LLVM?
LLILC
RyuJIT
AOT Compilation
Preprocessors and Conditional Compilation
An Overview of Dependency Injection
The Hitchhiker's Guide to Faster Builds. Viktor Kirilov. CoreHard Spring 2019 (corehard_by)
C++ is known for performance and expressiveness, but also for the lack of a standard build system and package manager, for complexity, and for long compile times. The inability to iterate quickly is one of the biggest killers of productivity. This talk is aimed at anyone interested in improving the last of these points. It will provide insights into why compilation (and linking) take so long for C++, and will then provide an exhaustive list of techniques and tools to mitigate the problem: tooling and infrastructure (hardware, build systems, caching, distributed builds, diagnostics of bottlenecks, code hygiene); techniques (unity builds, precompiled headers, static vs. shared linking); source code modification (the PIMPL idiom, better template use, annotations); and modules (what they are, when they are coming to C++, and what becomes obsolete because of them).
Evaluating GPU Programming Models for the LUMI Supercomputer (George Markomanolis)
It is common in the HPC community that the performance achieved with CPUs alone is limited for many computational cases. The EuroHPC pre-exascale and coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers, such as LUMI and Frontier, will be powered by AMD Instinct accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the programming models required for heterogeneous architectures. In this paper, we present some of the better-known programming models for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. Finally, we compare the performance, where possible, between the NVIDIA Volta (V100) and Ampere (A100) GPUs and the AMD MI100 GPU.
Presentation of a paper accepted in Supercomputing Frontiers Asia 2022
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures (Dr. Fabio Baruffa)
In the framework of the Intel Parallel Computing Centre at the Research Campus Garching in Munich, our group at LRZ presents recent results on performance optimization of Gadget-3, a widely used community code for computational astrophysics. We identify and isolate a sample code kernel, representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm, and focus on threading parallelism optimization, a change of the data layout into a structure of arrays (SoA), compiler auto-vectorization, and algorithmic improvements in the particle sorting. We measure lower execution times and improved threading scalability on both Intel Xeon (2.6× on Ivy Bridge) and Xeon Phi (13.7× on Knights Corner) systems. First tests on the second-generation Xeon Phi (Knights Landing) demonstrate the portability of the devised optimization solutions to upcoming architectures.
How to Build a Platform as a Service with Docker, Kubernetes, Go, and Java (Codemotion)
"How to Build a Platform as a Service with Docker, Kubernetes, Go, and Java" by Massimiliano Dessì
To automate CI and CD during development, testing, pre-production, and production, the techniques currently known as DevOps are used, either locally with Vagrant or on a cloud PaaS, private or public. We can build a scalable PaaS using Docker alone, Docker plus Kubernetes, or ready-made solutions such as OpenShift 3 (which sits on top of Docker and Kubernetes). In the presentation we will see how to set up these three kinds of PaaS, plus an orchestration layer in Go/Java and Ansible to automate behavior based on monitored events.
Kubernetes has become the de facto standard platform for container orchestration. Its ease of extension and many integrations have paved the way for a wide variety of data science and research tooling to be built on top of it.
From all-encompassing tools like Kubeflow, which make it easy for researchers to build end-to-end machine learning pipelines, to specific orchestration of analytics engines such as Spark, Kubernetes has made the deployment and management of these things easy. This presentation will showcase some of the larger research tools in the ecosystem and go into how Kubernetes has enabled this easy form of application management.
This PPT file helps IT freshers with basic interview questions, which will boost their confidence before going to an interview. For more details and interview questions, please log in to www.rekruitin.com and click on Job Seeker Tools. Also register on the site and get employed.
By ReKruiTIn.com
Similar to OpenCL programming using Python syntax
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR (cscpconf)
The progressive development of Synthetic Aperture Radar (SAR) systems diversifies the exploitation of the images generated by these systems in different geoscience applications. Detection and monitoring of surface deformations produced by various phenomena have benefited from this evolution and have been realized with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the results of these techniques. In this context, we propose a methodological approach for detecting and analyzing surface deformation with differential interferograms, to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged into image pairs from the ERS1/ERS2 sensors acquired in a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFICATION (cscpconf)
A novel trajectory-guided, concatenative approach for synthesizing high-quality video from real image samples is proposed. The automated lip-reading seeks the real image sample sequence in the library that is closest to the HMM-predicted trajectory for the given video data. The trajectory is obtained by projecting the face patterns into a KDA feature space. The speaker's face is identified by synthesizing the identity surface of a subject's face from a small set of patterns that sparsely sample the view sphere. A KDA algorithm is used to discriminate the lip-reading images; the fundamental lip feature vector is then reduced to a low-dimensional representation using the 2D-DCT. The dimensionality of the mouth-area set is further reduced with PCA to obtain the eigen-lips approach proposed by [33]. The subjective performance results of the cost function under the automatic lip-reading model illustrate the superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE... (cscpconf)
Universities offer a software engineering capstone course to simulate a real-world working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from a Waterfall process to an Agile process in conducting the software engineering capstone project. We present the capstone course designs for both the Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables, and assessment plans. To evaluate the improvement, we conducted a survey across two different sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled in the survey, which consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to attain hands-on experience simulating a real-world working environment. The results also show that the Agile approach helped students achieve an overall better design and avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIES (cscpconf)
Using social media in education provides learners with an informal way to communicate. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching a software project management course. We conducted surveys at the end of every semester to evaluate students' satisfaction and engagement. Results show that using social media enhances students' engagement and satisfaction. However, familiarity with the tool is an important factor in student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGIC (cscpconf)
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering systems are referred to as intelligent systems that can provide responses to questions asked by the user, based on facts or rules stored in a knowledge base, and can generate answers to questions asked in natural language. One of the first main ideas of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationship with and differences from fuzzy logic, as well as previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS (cscpconf)
Human beings generate different speech waveforms when speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words, which facilitates the preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming for global alignment and shortest-distance measurement. The DPW algorithm can be used to enhance the pronunciation dictionaries of well-known languages like English, or to build pronunciation dictionaries for lesser-known sparse languages. The precision measurement experiments show 88.9% accuracy.
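The dynamic-programming global alignment at the heart of DPW can be sketched as a classic edit distance over phone sequences; the unit insertion/deletion/substitution costs below are an assumption for illustration, and the paper's actual phone-distance weights may differ:

```python
def phone_distance(a, b):
    """Global alignment (edit) distance between two phone sequences,
    with unit insertion/deletion/substitution costs."""
    m, n = len(a), len(b)
    # dp[i][j] = distance between the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[m][n]

# Two pronunciations of "tomato" as ARPAbet-style phone lists.
print(phone_distance(['T', 'AH', 'M', 'EY', 'T', 'OW'],
                     ['T', 'AH', 'M', 'AA', 'T', 'OW']))  # 1
```

A real pronunciation dictionary would replace the 0/1 substitution cost with a learned or phonetically informed distance between phone pairs.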
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS (cscpconf)
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. The system assesses answers to subjective questions by finding a matching ratio for the keywords in the instructor's and student's answers. The matching ratio is computed from semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and a case study were used in the research design to validate the proposed system. The examination assessment system will help instructors save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessment.
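A minimal sketch of the matching-ratio idea, assuming exact keyword matching rather than the semantic and document similarity the system actually uses (the keywords and sample answer are invented):

```python
def matching_ratio(instructor_keywords, student_answer):
    """Fraction of instructor keywords found in the student's answer.
    (The real system matches semantically, not by exact string equality.)"""
    words = set(student_answer.lower().split())
    keys = [k.lower() for k in instructor_keywords]
    if not keys:
        return 0.0
    hits = sum(1 for k in keys if k in words)
    return hits / len(keys)

ratio = matching_ratio(['photosynthesis', 'chlorophyll', 'sunlight'],
                       'Plants use sunlight and chlorophyll to make food')
print(ratio)  # 2 of the 3 keywords appear in the answer
```

The keyword-expansion module described in the abstract would enlarge `instructor_keywords` with synonyms before this matching step, and the grading module would map the ratio to a mark.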
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in the streaming digital content in general and the music industry in specific is posing a real challenge to digital content owners. This paper presents a DRM solution to monetizing, tracking and controlling online streaming content cross platforms for IP enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution provides the data owner the ability to control the flow of information even after it has been released by creating a secure, selfinstalled, cross platform reader located on the digital content file header. The proposed system provides the content owners’ options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is release. The system benefits from token distribution between the content owner (Music Bands), the content distributer (Online Radio Stations) and the content consumer(Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The using of information technology resources is rapidly increasing in organizations,
businesses, and even governments, that led to arise various attacks, and vulnerabilities in the
field. All resources make it a must to do frequently a penetration test (PT) for the environment
and see what can the attacker gain and what is the current environment's vulnerabilities. This
paper reviews some of the automated penetration testing techniques and presents its
enhancement over the traditional manual approaches. To the best of our knowledge, it is the
first research that takes into consideration the concept of penetration testing and the standards
in the area.This research tackles the comparison between the manual and automated
penetration testing, the main tools used in penetration testing. Additionally, compares between
some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation becomes fastidious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
Palestine last event orientationfvgnh .pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide or invisible some fields in odoo. Commonly using “invisible” attribute in the field definition to invisible the fields. This slide will show how to make a field invisible in odoo 17.
2024.06.01 Introducing a competency framework for languag learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
The Indian economy is classified into different sectors to simplify the analysis and understanding of economic activities. For Class 10, it's essential to grasp the sectors of the Indian economy, understand their characteristics, and recognize their importance. This guide will provide detailed notes on the Sectors of the Indian Economy Class 10, using specific long-tail keywords to enhance comprehension.
For more information, visit-www.vavaclasses.com
Computer Science & Information Technology (CS & IT)
C99 while the CUDA dialect is more powerful and, for example, supports templates.
A major complexity in writing CUDA and OpenCL code consists in having to write code inside
code (the kernel managed by the host). This process can be somewhat simplified using a different
language for the host. For example, the pyCUDA and pyOpenCL libraries created by Andreas
Klöckner[3] allow compiling, queuing, and running kernels (in CUDA and OpenCL respectively)
from inside a Python program. They are integrated with numpy[4], the Python numeric library.
In this paper we discuss a library called ocl which is built on top of pyOpenCL. It allows writing
both the host program and the kernels using the Python language, without loss of performance
compared to native OpenCL. ocl consists of two parts:
• A thin layer on top of pyOpenCL which encapsulates the device state (consisting of a
context and a task queue).
• A function decorator which, at runtime, converts decorated Python functions into C99
code and uses the OpenCL JIT to compile them.
Python + numpy + pyOpenCL + ocl allow development of parallel programs which run
everywhere (CPUs and GPUs), are written in a single language, and do not require any outside
compilation, linking, or other building step.
ocl is similar to Cython[8] and Clyther[9] in the sense that it works by parsing the Python code
into a Python Abstract Syntax Tree (AST) and then serializing the AST into C99 code. There are
differences between Clyther and ocl: the implementation is different, and the semantics used to
build the kernel body are different and closer to C in the ocl case. Moreover, the ocl implementation is
extensible and, in fact, it can be used to generate and compile C code (without the need for
OpenCL, like Cython) but it can also be used to generate JavaScript code (to run, for example, in
a browser). JavaScript code generation is beyond the scope of this paper since its application is in
web development, not high performance computing.
In order to explain the syntax of ocl, we first consider the simple example of adding two vectors,
and then a more complex financial example: computing the correlation matrix for the arithmetic
returns of multiple time series and determining the optimal portfolio according to Modern
Portfolio Theory.
2. SIMPLE EXAMPLE
In our first example we consider the following Python code, which defines two arrays, fills them
with Gaussian random floating point numbers, and adds them using the available device.
from ocl import Device
import numpy
import random
n = 1000000
a = numpy.zeros(n, dtype=numpy.float32)
b = numpy.zeros(n, dtype=numpy.float32)
for k in xrange(n):
a[k] = random.gauss(0,1)
b[k] = random.gauss(0,1)
device = Device()
a_buffer = device.buffer(source=a, mode=device.flags.READ_ONLY)
b_buffer = device.buffer(source=b)
@device.compiler.define_kernel(a='global:const:ptr_float',
b='global:ptr_float')
def add(a, b):
k = new_int(get_global_id(0))
b[k] = a[k]+b[k]
program = device.compile()
program.add(device.queue, [n], None, a_buffer, b_buffer)
b = device.retrieve(b_buffer,shape=(n,))
In this example:
• Lines 1-3 import the required modules.
• Lines 6-7 define two numpy arrays (a and b) of size n in the main RAM memory and fill
them with zeros.
• Lines 9-11 populate the arrays a and b with random Gaussian numbers with mean 0 and
standard deviation 1.
• Line 13 detects the available device and encapsulates its state.
• Lines 14-15 copy the arrays a and b into the memory of the device, represented by
a_buffer and b_buffer respectively.
• Lines 17-21 define a new kernel called “add”.
• Lines 17-18 decorate the “add” function to specify that it is a kernel to be converted
into C99 code, and declare the types of the function arguments.
• Lines 19-21 define the body of the kernel.
• Line 20 runs on the device (one instance per thread) and on each thread returns a different
value for k.
• Line 23 converts and compiles the kernel, and loads it into the device.
• Line 24 calls the function “add”. More precisely, it queues [n] threads, each calling the
function “add”, and specifies that a_buffer and b_buffer are to be passed as arguments.
• Line 25 retrieves the output array b_buffer from the device and copies it into b.
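The effect of queuing [n] instances of the kernel is the same as an elementwise vector addition on the host. As a sketch, independent of ocl, the per-thread logic can be cross-checked against numpy's vectorized sum:

```python
import numpy as np

n = 8
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)

expected = a + b          # vectorized elementwise sum on the host

# what the n queued kernel instances compute, one index k per instance
result = b.copy()
for k in range(n):
    result[k] = a[k] + result[k]

assert np.allclose(result, expected)
```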
The program above is run with a single Bash shell command:
python myprogram.py
Notice how the kernel function “add” is preceded by the decorator
@device.compiler.define_kernel(...)
This decorator performs introspection of the function at runtime and parses the body of the
function into an Abstract Syntax Tree (AST) for later conversion to C99/OpenCL code. This
allows us to write OpenCL code using Python syntax.
Supported control structures and commands include def, if ... elif ... else, for ... in range, while,
break, and continue.
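As an illustration, the following kernel-style function uses only this subset. The new_float stub below is a hypothetical stand-in for ocl's pseudo-casting operator, added here only so that the body also runs as plain Python:

```python
def new_float(x):
    # hypothetical stand-in: in ocl this name is interpreted by the
    # converter as a C declaration, not called as a Python function
    return float(x)

def clamp(a, b, n):
    # uses only constructs from the supported subset:
    # def, for ... in range, if ... elif ... else
    for k in range(0, n):
        v = new_float(a[k])
        if v > 1.0:
            v = 1.0
        elif v < -1.0:
            v = -1.0
        b[k] = v

a = [2.5, -3.0, 0.5]
b = [0.0, 0.0, 0.0]
clamp(a, b, 3)
# b is now [1.0, -1.0, 0.5]
```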
The decorated code is converted at runtime into the following Python AST
FunctionDef(name='add',
    args=arguments(args=[Name(id='a', ctx=Param()), Name(id='b', ctx=Param())]),
    body=[Assign(targets=[Name(id='k')],
                 value=Call(func=Name(id='new_int'),
                            args=[Call(func=Name(id='get_global_id'),
                                       args=[Num(n=0)])])),
          Assign(targets=[Subscript(value=Name(id='b'),
                                    slice=Index(value=Name(id='k')))],
                 value=BinOp(left=Subscript(value=Name(id='a'),
                                            slice=Index(value=Name(id='k'))),
                             op=Add(),
                             right=Subscript(value=Name(id='b'),
                                             slice=Index(value=Name(id='k'))))),
          Return(value=Name(id='None'))])
(we simplified the AST by omitting irrelevant parameters).
The call to device.compile(...) converts the AST into the following code:
__kernel void add(__global const float* a, __global float* b) {
int k;
k = get_global_id(0);
b[k] = (a[k] + b[k]);
}
__kernel is an OpenCL modifier which declares the function to be a kernel. __global and
__local are used to declare the memory storage class: global memory of the device or local
memory of a compute unit, respectively.
One complication is that C99/OpenCL is a statically typed language while Python is dynamically
typed. Therefore one needs a way to declare the type of variables using Python syntax.
For the function arguments this is done in the decorator arguments. For example:
a = "global:const:ptr_float"
is converted into
__global const float *a
The type of local variables is defined using the pseudo-casting operators new_int, new_float,
etc. For example:
k = new_int(get_global_id(0))
is converted into
int k;
k = get_global_id(0);
Variable types must be declared when a variable is first assigned. The converter checks that this
is the case in order to prevent more obscure compile-time errors later.
In this example, get_global_id(0) is an OpenCL function used to identify the rank of the current
running instance of the kernel. If one queues [n] kernel tasks, it returns a different value for each
running task, covering all values from 0 to n−1. The order in which queued tasks are executed is
not guaranteed.
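The order-independence requirement can be illustrated in plain Python by executing the per-task updates of the first example in a shuffled order (a sketch, not ocl code):

```python
import random

n = 16
a = [float(k) for k in range(n)]
b = [1.0] * n

order = list(range(n))
random.shuffle(order)      # queued tasks may run in any order

for k in order:            # each task sees a distinct global id k
    b[k] = a[k] + b[k]

# the result does not depend on the execution order because each
# task reads and writes a disjoint element of b
assert b == [k + 1.0 for k in range(n)]
```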
The return type of a function is determined automatically but one must return a variable (or
None), not an expression.
Other variable types and pointers can be defined and manipulated using the syntax shown in the
following table:
ocl                          C99/OpenCL
x = new_type(...)            type x = ...;
x = new_ptr_type(...)        type *x = ...;
x = new_ptr_ptr_type(...)    type **x = ...;
ADDR(x)                      &x
REFD(x)                      *x
CAST(ptr_type, x)            (type*)x
OpenCL provides special vector types to take advantage of vector registers. For example,
float4 can store four floating point numbers called x, y, z, and w. These vector types can
be used from ocl as follows:
q = new_float4((0.0,0.0,0.0,0.0))
q.x = 1.0
Notice that the goal of ocl is not to translate arbitrary Python code into C99/OpenCL but to allow
the use of Python syntax to write OpenCL kernels and functions. Only C99/OpenCL variable
types are allowed, not Python types such as lists and dictionaries; these may be supported in
future versions of ocl. Moreover, all functions called in the body of the add function are intended
to be OpenCL functions, not Python functions. While Python objects are normally garbage
collected, the decorated code runs on the device as C code and is not garbage collected; therefore
explicit memory management may be necessary:
@device.compiler.define_kernel(...)
def some_kernel(...):
d = new_ptr_float(malloc(size))
...
free(d)
which would be converted into:
__kernel void some_kernel(...) {
float* d;
d = (float*)malloc(size);
...
free(d);
}
One can define multiple kernels within the same program and call them when needed. One can
use the @device.compiler.define(...) decorator to define other functions to be executed on the
device. They will be visible and callable by the kernels, but they will not be callable from the host
program.
One can pass constants from the Python context to the OpenCL context:
device.compile(..., constants = dict(n = m))
where n is a constant in the kernel and m is a variable in the host program.
Additional C files can be included using the syntax
device.compile(..., includes = ['#include <math.h>'])
where includes is a list of #include statements.
3. FINANCIAL APPLICATION EXAMPLE
In the following example, we solve the typical financial problem of analyzing multiple time series
of stock prices, p[k,t] (here simulated). For each time series k we compute the
arithmetic daily returns, r[k,t], and the average returns, mu[k]. We then compute the covariance
matrix, cov[i,j], and the correlation matrix, cor[i,j]. We use different kernels for each part of the
computation.
Finally, to make the application more practical, we use Modern Portfolio Theory [10] to compute
a tangency portfolio which maximizes the Sharpe ratio, i.e. return/risk under the assumption of
normal returns:

x ∝ Σ⁻¹ (µ − r_f), normalized so that Σᵢ xᵢ = 1

Here µ is the vector of average returns (mu), Σ is the covariance matrix (cov), and r_f is the input
risk-free interest rate. The tangency portfolio is identified by the vector x (the array x in the code)
whose terms indicate the amount to be invested in each stock (the amounts must add up to one
dollar). We perform this maximization on the CPU to demonstrate integration with the numpy
Linear Algebra package.
We use the symbols i and j to identify the stock time series, and the symbol t for time (for daily
data t is a day). n is the number of stocks and m is the number of trading days.
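The four kernels can be cross-checked on the host with a vectorized numpy equivalent that follows the same formulas (a sketch; it reproduces the kernels' estimator, which divides by m−1):

```python
import numpy as np

def reference(p):
    # p: (n, m) matrix of prices; mirrors the device kernels
    n, m = p.shape
    r = np.zeros_like(p)
    r[:, :m-1] = p[:, 1:] / p[:, :m-1] - 1.0            # arithmetic returns
    mu = r[:, :m-1].sum(axis=1) / (m - 1)               # average returns
    cov = (r[:, :m-1] @ r[:, :m-1].T) / (m - 1) - np.outer(mu, mu)
    d = np.sqrt(np.diag(cov))
    cor = cov / np.outer(d, d)                          # correlations
    return r, mu, cov, cor

rng = np.random.default_rng(0)
p = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((5, 50)), axis=1))
r, mu, cov, cor = reference(p)
```

By construction cov is symmetric and the diagonal of cor is exactly one, which is a useful sanity check for the device results as well.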
We use the canvas[6] library, based on the Python matplotlib library, to display one of the stock
price series and the resulting correlation matrix. Below is the complete code. The output from the
code can be seen in fig. 1.
from ocl import Device
from canvas import Canvas
import random
import numpy
from math import exp
n = 1000 # number of time series
m = 250 # number of trading days for time series
p = numpy.zeros((n,m), dtype=numpy.float32)
r = numpy.zeros((n,m), dtype=numpy.float32)
mu = numpy.zeros(n, dtype=numpy.float32)
cov = numpy.zeros((n,n), dtype=numpy.float32)
cor = numpy.zeros((n,n), dtype=numpy.float32)
for k in xrange(n):
p[k,0] = 100.0
for t in xrange(1,m):
c = 1.0 if k==0 else (p[k-1,t]/p[k-1,t-1])
p[k,t] = p[k,t-1]*exp(random.gauss(0.0001,0.10))*c
device = Device()
p_buffer = device.buffer(source=p, mode=device.flags.READ_ONLY)
r_buffer = device.buffer(source=r)
mu_buffer = device.buffer(source=mu)
cov_buffer = device.buffer(source=cov)
cor_buffer = device.buffer(source=cor)
@device.compiler.define_kernel(p='global:const:ptr_float',
r='global:ptr_float')
def compute_r(p, r):
i = new_int(get_global_id(0))
for t in range(0,m-1):
r[i*m+t] = p[i*m+t+1]/p[i*m+t] - 1.0
@device.compiler.define_kernel(r='global:ptr_float',
mu='global:ptr_float')
def compute_mu(r, mu):
i = new_int(get_global_id(0))
sum = new_float(0.0)
for t in range(0,m-1):
sum = sum + r[i*m+t]
mu[i] = sum/(m-1)
@device.compiler.define_kernel(r='global:ptr_float',
mu='global:ptr_float', cov='global:ptr_float')
def compute_cov(r, mu, cov):
i = new_int(get_global_id(0))
j = new_int(get_global_id(1))
sum = new_float(0.0)
for t in range(0,m-1):
sum = sum + r[i*m+t]*r[j*m+t]
cov[i*n+j] = sum/(m-1)-mu[i]*mu[j]
@device.compiler.define_kernel(cov='global:ptr_float',
cor='global:ptr_float')
def compute_cor(cov, cor):
i = new_int(get_global_id(0))
j = new_int(get_global_id(1))
cor[i*n+j] = cov[i*n+j] / sqrt(cov[i*n+i]*cov[j*n+j])
program = device.compile(constants=dict(n=n,m=m))
q = device.queue
program.compute_r(q, [n], None, p_buffer, r_buffer)
program.compute_mu(q, [n], None, r_buffer, mu_buffer)
program.compute_cov(q, [n,n], None, r_buffer, mu_buffer, cov_buffer)
program.compute_cor(q, [n,n], None, cov_buffer, cor_buffer)
r = device.retrieve(r_buffer,shape=(n,m))
mu = device.retrieve(mu_buffer,shape=(n,))
cov = device.retrieve(cov_buffer,shape=(n,n))
cor = device.retrieve(cor_buffer,shape=(n,n))
points = [(x,y) for (x,y) in enumerate(p[0])]
Canvas(title='Price').plot(points).save(filename='price.png')
Canvas(title='Correlations').imshow(cor).save(filename='cor.png')
rf = 0.05/m # input daily risk free interest rate
x = numpy.linalg.solve(cov,mu-rf) # cov*x = (mu-rf)
x *= 1.00/sum(x) # assumes 1.00 dollars in total investment
open('optimal_portfolio','w').write(repr(x))
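The final host-side step can be verified on synthetic data: the solved weights satisfy cov·x ∝ (µ − r_f) and sum to one. A sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8))
cov = A @ A.T / 8 + 0.01 * np.eye(4)   # synthetic positive definite covariance
mu = rng.uniform(0.001, 0.01, 4)       # synthetic average daily returns
rf = 0.05 / 250                        # daily risk-free rate

x = np.linalg.solve(cov, mu - rf)      # solves cov * x = (mu - rf)
x *= 1.00 / x.sum()                    # normalize to one dollar

assert abs(x.sum() - 1.0) < 1e-9
ratio = (cov @ x) / (mu - rf)          # should be a constant vector
assert np.allclose(ratio, ratio[0])
```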
Notice how the memory buffers are always 1-dimensional; therefore the indexes i,j have to be
mapped into a 1D index i*n+j. Also notice that while the kernels compute_r and compute_mu are
queued [n] times (once per stock), the kernels compute_cov and compute_cor are queued [n,n]
times, once per pair of stocks i,j. The values of i and j are retrieved by get_global_id(0) and
get_global_id(1) respectively.
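This row-major mapping is the same one numpy uses when flattening an array, which can be checked directly (a sketch):

```python
import numpy as np

n, m = 3, 5
r2d = np.arange(n * m, dtype=np.float32).reshape(n, m)
flat = r2d.ravel()                    # 1D view, as seen by the kernels

i, t = 2, 3
assert flat[i * m + t] == r2d[i, t]   # (n, m) array: index i*m + t

c2d = np.arange(n * n).reshape(n, n)
cflat = c2d.ravel()
i, j = 1, 2
assert cflat[i * n + j] == c2d[i, j]  # (n, n) array: index i*n + j
```

Note that the stride is the number of columns: m for the (n, m) return matrix, n for the (n, n) covariance and correlation matrices.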
In this program we have defined multiple kernels and compiled them all at once. We call one
kernel at a time to make sure that each call completes before the next kernel runs.
Figure 1: The image on the left shows one of the randomly generated stock price histories. The image on
the right represents the computed correlation matrix. Rows and columns correspond to stock returns, and
the color at each intersection is their correlation (red for high correlation, blue for no correlation). The
resulting shape is an artifact of the algorithm used to generate the random data.
4. OCL WITHOUT OpenCL
ocl can be used to generate C99 code, compile it, and run it without OpenCL. In this case the
code always runs on the CPU and not on the GPU. Here is a simple example:
from ocl import Compiler
c99 = Compiler()
@c99.define(n='int')
def factorial(n):
output = 1
for k in range(1, n + 1):
output = output * k
return output
compiled = c99.compile()
print(compiled.factorial(10))
assert compiled.factorial(10) == factorial(10)
In this example, we have replaced device.compiler with c99 = Compiler() since “device” is an
OpenCL-specific concept. We have been careful to engineer the function “factorial” so that it can
run both in Python (factorial) and compiled in C99 (compiled.factorial). This is not always
possible, but it is possible for functions that only call mathematical library functions, such as our
factorial function. Notice that c99.compile() requires gcc to be installed instead of the OpenCL
JIT.
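Stripped of the decorator, the body of “factorial” is ordinary Python, which is what makes the dual execution possible; it agrees with the standard library:

```python
import math

def factorial(n):
    # same body as the decorated version: only integers, a for..range
    # loop, and arithmetic, so it is valid Python and convertible to C99
    output = 1
    for k in range(1, n + 1):
        output = output * k
    return output

assert factorial(10) == math.factorial(10) == 3628800
```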
5. CONCLUSIONS
In this paper we provide an introduction to ocl, a library that allows writing OpenCL programs
and kernels using exclusively Python syntax. ocl is built on top of pyOpenCL[2], meta[7], and
numpy[4] (the Python Scientific Libraries). The Python code is converted to C99/OpenCL and
compiled at run-time. It can run on any OpenCL compatible device including ordinary Intel/AMD
CPUs, Nvidia/ATI GPUs, and ARM CPUs with the same performance as native OpenCL code.
Our approach does not necessarily simplify the algorithms, but it makes them more readable and
portable, and eliminates external compilation and linking steps, thus lowering the barrier of entry
to GPU programming. It does not eliminate the need to understand OpenCL semantics, but it
allows the use of a simpler syntax.
As an example of application we considered the computation of the covariance and correlation
matrices from financial data series, in order to determine the optimal portfolio according to
Modern Portfolio Theory [10].
We plan to extend ocl to support a richer syntax, generate CUDA as well as OpenCL code, and
provide better debugging information. Moreover, since Python code can easily be serialized and
deserialized at runtime, we plan to improve the library to allow distributed computations where
the code is written once (in Python), distributed, then compiled and run on different worker nodes
using heterogeneous devices.
ACKNOWLEDGEMENTS
I thank Andreas Klöckner for pyOpenCL and the excellent documentation.
REFERENCES
[1] Lindholm, Erik, et al. “NVIDIA Tesla: A unified graphics and computing architecture.” Micro, IEEE
28.2 (2008): 39-55.
[2] Munshi, Aaftab. “OpenCL: Parallel Computing on the GPU and CPU.” SIGGRAPH, Tutorial (2008).
[3] Klöckner, Andreas, et al. “Pycuda and pyopencl: A scripting-based approach to gpu run-time code
generation.” Parallel Computing 38.3 (2012): 157-174.
[4] Oliphant, Travis E. A Guide to NumPy. Vol. 1. USA: Trelgol Publishing, 2006.
[5] https://github.com/mdipierro/ocl
[6] https://github.com/mdipierro/canvas
[7] http://srossross.github.com/Meta/html/index.html
[8] Behnel, Stefan, et al. “Cython: The best of both worlds.” Computing in Science & Engineering 13.2
(2011): 31-39.
[9] http://srossross.github.com/Clyther/
[10] Markowitz, Harry M. “Foundations of portfolio theory.” The Journal of Finance 46.2 (1991):
469-477.