1) Object oriented programming can lead to performance issues on modern hardware due to excessive data encapsulation. Keeping related data together in memory is important for performance.
2) Converting a scene graph from an object oriented hierarchy to linear passes can improve performance by processing data in a contiguous, sequential manner that optimizes cache usage. This involves separating transform updates from bounding sphere calculations.
3) Other optimizations include using custom allocators to store related data consecutively in memory, removing virtual functions, and processing all nodes in a scene globally rather than through each object.
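The restructuring described above can be sketched in a few lines. This is an illustrative simplification, not the talk's actual code: transforms are reduced to pure translations so the two linear passes stay short, and the `Scene` layout (parallel arrays, parents stored before children) is an assumption of the sketch.

```cpp
#include <vector>
#include <cstddef>

// Simplified "transform" (translation only) to keep the sketch short.
struct Vec3 { float x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Data-oriented scene: parallel arrays instead of a pointer-based node
// hierarchy. Nodes are ordered so a parent always precedes its children,
// so one forward sweep resolves every world transform.
struct Scene {
    std::vector<int>   parent;      // -1 for roots
    std::vector<Vec3>  local;       // local translation per node
    std::vector<Vec3>  world;       // output of pass 1
    std::vector<float> radius;      // local bounding-sphere radius
    std::vector<float> worldRadius; // output of pass 2
};

// Pass 1: update all world transforms in one contiguous, sequential sweep.
void updateTransforms(Scene& s) {
    for (std::size_t i = 0; i < s.parent.size(); ++i)
        s.world[i] = (s.parent[i] < 0) ? s.local[i]
                                       : s.world[s.parent[i]] + s.local[i];
}

// Pass 2: bounding spheres, deliberately separated from pass 1 so each loop
// streams through one homogeneous array of data.
void updateBounds(Scene& s) {
    for (std::size_t i = 0; i < s.parent.size(); ++i)
        s.worldRadius[i] = s.radius[i]; // radius unchanged under pure translation
}
```

Each pass walks contiguous arrays front to back, which is exactly the access pattern caches and prefetchers reward; a pointer-chasing recursive traversal of heap-allocated nodes gives up that locality.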
Supervised Learning Algorithms - Analysis of Different Approaches - Philip Yankov
This talk covers why AI/ML is so important today, and how to take supervised machine learning algorithms from prototype to production: how to choose and transform the data, and how to choose and evaluate the model/algorithm so that you get the best results in the future.
Deep Learning Made Easy with Deep Features - Turi, Inc.
Deep learning models can learn hierarchical feature representations from raw input data. These learned features can then be used to build simple classifiers that achieve high accuracy, even when training data is limited. Transfer learning involves using features extracted from a model pre-trained on a large dataset to build classifiers for other related problems. This approach has been shown to outperform traditional feature engineering with hand-designed features. Deep features extracted from neural networks trained on large image or text datasets have proven to work well as general purpose features for other visual and language problems.
04 Accelerating DL Inference with (Open)CAPI and Posit Numbers - Yutaka Kawai
This was presented by Louis Ledoux and Marc Casas at the OpenPOWER Summit EU 2019. The original deck is available at:
https://static.sched.com/hosted_files/opeu19/1a/presentation_louis_ledoux_posit.pdf
This document provides an introduction to deep learning. It begins by discussing modeling human intelligence with machines and the history of neural networks. It then covers concepts like supervised learning, loss functions, and gradient descent. Deep learning frameworks like Theano, Caffe, Keras, and Torch are also introduced. The document provides examples of deep learning applications and discusses challenges for the future of the field like understanding videos and text. Code snippets demonstrate basic network architecture.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This is a two-hour overview of the state of deep learning as of Q1 2017.
It starts with some basic concepts, continues through basic network topologies, tools, and HW/accelerators, and closes with Intel's take on the different fronts.
Squeezing Deep Learning Into Mobile Phones - Anirudh Koul
A practical talk by Anirudh Koul on how to run deep neural networks on memory- and energy-constrained devices such as smartphones. It highlights some frameworks and best practices.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/auvizsystems/embedded-vision-training/videos/pages/may-2015-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Nagesh Gupta, CEO and Founder of Auviz Systems, presents the "Trade-offs in Implementing Deep Neural Networks on FPGAs" tutorial at the May 2015 Embedded Vision Summit.
Video and images are a key part of Internet traffic—think of all the data generated by social networking sites such as Facebook and Instagram—and this trend continues to grow. Extracting usable information from video and images is thus a growing requirement in the data center. For example, object and face recognition are valuable for a wide range of uses, from social applications to security applications. Convolutional neural networks (CNNs) are currently the most popular form of deep neural network used in data centers for such applications. 3D convolutions are a core part of CNNs. Nagesh presents alternative implementations of 3D convolutions on FPGAs, and discusses trade-offs among them.
Moving Toward Deep Learning Algorithms on HPCC Systems - HPCC Systems
The document discusses implementing optimization algorithms like L-BFGS on the HPCC Systems platform. It provides an overview of L-BFGS and describes how HPCC Systems uses a dataflow programming model and the ECL language to distribute computations across clusters. Examples are given of using ECL to implement L-BFGS locally and globally, including for applications like softmax regression.
This document discusses optimizing object-oriented code for performance. It begins with an overview of object-oriented programming and how CPU and memory performance have changed significantly since C++ was first created. It then analyzes a common scene tree example and finds it is slow due to excessive cache misses from scattered data. The solution is to restructure the code to have homogeneous, sequential data by allocating nodes and matrices contiguously in memory. Processing data in order and removing virtual function calls further improves performance. Prefetching is also able to reduce cache misses, resulting in a 6x speedup over the original implementation. The key lessons are to optimize for data locality and consider data-oriented design principles when performance is important.
Pitfalls of Object Oriented Programming (GCAP 09) - Royce Lu
This document summarizes some of the pitfalls of object-oriented programming and provides recommendations to improve performance. It discusses how CPU and memory performance have changed significantly since the development of C++, with memory access now being much slower relative to CPU speed. This can negatively impact object-oriented code that encapsulates both data and behavior, leading to excessive cache misses. The document recommends using a data-oriented design that processes data in a linear, sequential way to optimize for cache usage and minimize misses. It provides an example comparing object and data-oriented approaches to updating and culling a scene graph.
This document provides an overview of JavaFX, including its target developers, strategy, architecture, and roadmap. Some key points:
- JavaFX is designed for both native and rich web applications using Java. It aims to position Java as the leading platform for both domains.
- The architecture includes a retained mode scene graph, CSS support, layout capabilities, 3D rendering, and a WebView component for embedding web content.
- Controls can be styled using CSS. The default stylesheet is called Caspian.
- Oracle is committed to continuing development of JavaFX and maintaining an open Java ecosystem.
The document summarizes post-exploitation techniques on OSX and iPhone. It describes a technique called "userland-exec" that allows executing applications on OSX without using the kernel. This technique was adapted to work on jailbroken iPhones by injecting a non-signed library and hijacking the dynamic linker (dlopen) to map and link the library. With some additional patches, the authors were able to load an arbitrary non-signed library into the address space of a process on factory iPhones, representing the first reliable way to execute payloads on these devices despite code signing protections.
The document discusses data-oriented design principles for game engine development in C++. It emphasizes understanding how data is represented and used to solve problems, rather than focusing on writing code. It provides examples of how restructuring code to better utilize data locality and cache lines can significantly improve performance by reducing cache misses. Booleans packed into structures are identified as having extremely low information density, wasting cache space.
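The low-information-density point about booleans can be made concrete: a `bool` member occupies at least a byte (often more after padding) to carry one bit. A sketch of the usual remedy, packing the flags of many entities into machine words (the `FlagBits` name and layout are this sketch's assumptions, not the document's code):

```cpp
#include <cstdint>
#include <vector>
#include <cstddef>

// Flags for 64 entities packed into one 64-bit word: a 64-byte cache line of
// these words carries 512 flags, versus 64 flags as one-byte bools.
struct FlagBits {
    std::vector<std::uint64_t> words;
    explicit FlagBits(std::size_t n) : words((n + 63) / 64, 0) {}
    void set(std::size_t i)        { words[i / 64] |=  (std::uint64_t(1) << (i % 64)); }
    void clear(std::size_t i)      { words[i / 64] &= ~(std::uint64_t(1) << (i % 64)); }
    bool test(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1; }
};
```

A pass that scans "is entity visible?" across the whole scene now touches one word per 64 entities instead of one padded struct per entity.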
Artificial Intelligence, Machine Learning and Deep Learning - Sujit Pal
Slides for a talk Abhishek Sharma and I gave at the Gennovation tech talks (https://gennovationtalks.com/) at Genesis. The talk was part of outreach for the Deep Learning Enthusiasts meetup group in San Francisco. My part of the talk is covered in slides 19-34.
A separately excited DC motor is driven from a 240 V, 50 Hz supply via a half-controlled (HC) SCR bridge with a flywheel diode. The motor has an armature resistance of 1 Ω and an armature voltage constant Kv of 0.8 V·s/rad. The field current is constant. Assume steady armature current. Determine the armature current and torque at 1600 rpm for a firing-angle delay of a) 30° b) 60°.
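A worked sketch of the problem above, under the assumptions that "HC SCR-bridge" means a single-phase half-controlled (semi-converter) bridge and that the armature current is continuous and ripple-free, in which case the mean converter output voltage is Vdc = (Vm/pi)(1 + cos alpha):

```cpp
#include <cmath>

// Solve for armature current and torque at a given firing angle.
// Assumptions (this sketch's, not stated in the problem): single-phase
// semi-converter, continuous ripple-free armature current, so
// Vdc = (Vm / pi) * (1 + cos(alpha)) and back EMF E = Kv * omega.
struct Result { double currentA; double torqueNm; };

Result solve(double alphaDeg) {
    const double pi  = 3.14159265358979323846;
    const double Vm  = 240.0 * std::sqrt(2.0);        // peak of 240 V rms supply
    const double Ra  = 1.0;                           // armature resistance, ohms
    const double Kv  = 0.8;                           // V*s/rad (= N*m/A)
    const double w   = 1600.0 * 2.0 * pi / 60.0;      // 1600 rpm in rad/s
    const double Vdc = Vm / pi * (1.0 + std::cos(alphaDeg * pi / 180.0));
    const double E   = Kv * w;                        // back EMF, ~134 V
    const double Ia  = (Vdc - E) / Ra;                // steady armature current
    return { Ia, Kv * Ia };                           // torque T = Kv * Ia
}
```

Under these assumptions this gives roughly Ia ≈ 67.6 A and T ≈ 54.0 N·m at 30°, and Ia ≈ 28.0 A and T ≈ 22.4 N·m at 60°.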
The beautiful thing about software engineering is that it gives you the warm and fuzzy illusion of total understanding: I control this machine because I know how it operates. This is the result of layers upon layers of successful abstractions, which hide immense sophistication and complexity. As with any abstraction, though, these sometimes leak, and that's when a good grounding in what's under the hood pays off.
The second talk in this series peels a few layers of abstraction and takes a look under the hood of our "car engine", the CPU. While hardly anyone codes in assembly language anymore, your C# or JavaScript (or Scala or...) application still ends up executing machine code instructions on a processor; that is why Java has a memory model, why memory layout still matters at scale, and why you're usually free to ignore these considerations and go about your merry way.
You'll come away knowing a little bit about a lot of different moving parts under the hood; after all, isn't understanding how the machine operates what this is all about?
(From a talk given at BuildStuff 2016 in Vilnius, Lithuania.)
This document discusses using data-oriented design to write efficient code. It provides an example comparing iterating by columns versus rows in a matrix, showing that iterating by rows is 2.5x faster due to improved cache performance. It discusses concepts like cache misses, memory hierarchies, and how CPUs work. It advocates designing data structures and algorithms to improve data locality and reduce cache misses.
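The row-versus-column comparison mentioned above looks like this in a minimal sketch (the function names are illustrative; the exact speedup depends on matrix size and hardware, so no timing claim is baked in):

```cpp
#include <vector>
#include <cstddef>

// The same sum computed with two traversal orders over a row-major matrix.
// Row order walks memory contiguously (cache friendly); column order strides
// by a full row per step and misses far more often on large matrices.
double sumRowMajor(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];   // consecutive addresses
    return s;
}

double sumColMajor(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];   // jumps `cols` doubles each step
    return s;
}
```

Both functions return the same value; only the memory access pattern—and therefore the cache behavior—differs.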
The document provides an outline for a lecture on GPGPU performance and tools, discussing threads, physical and logical memory, efficient GPU programming, examples of GPU programming, and CUDA programming tools including the CUDA debugger and visual profiler. It emphasizes reading documentation, using profilers and debuggers to optimize code, and challenges common assumptions about GPU programming.
This document provides an introduction to building machine learning models using IBM Data Science Experience. It first discusses data science and machine learning concepts like the CRISP-DM methodology and neural networks. It then introduces IBM Data Science Experience, describing how it allows users to work with big data on the cloud using Python or R on Spark. The document concludes by introducing TensorFlow and providing an overview of key TensorFlow concepts like tensors, data flow graphs, and how neural networks and deep learning are represented.
This document discusses improving Spark performance on many-core machines by implementing an in-memory shuffle. It finds that Spark's disk-based shuffle is inefficient on such hardware due to serialization, I/O contention, and garbage collection overhead. An in-memory shuffle avoids these issues by copying data directly between memory pages. This results in a 31% median performance improvement on TPC-DS queries compared to the default Spark shuffle. However, more work is needed to address other performance bottlenecks and extend the in-memory shuffle to multi-node clusters.
Tachyon is used as a transparent cache layer for Spark SQL queries at Baidu to improve query performance from tens of minutes to seconds. It provides a 30x speedup by caching data in memory across a cluster of 100 nodes with over 1PB of storage. Several problems were encountered including failed cache blocks, locality issues, and out of memory errors. Advanced features like tiered storage across memory and disk help provide more storage capacity. Performance is further optimized through write optimizations and a garbage collection feature is planned for the future.
Online Learning, Vowpal Wabbit and Hadoop - Héloïse Nonne
Online learning, Vowpal Wabbit and Hadoop
Online learning has recently caught a lot of attention, following some competitions, and especially after Criteo released 11GB for the training set of a Kaggle contest.
Online learning makes it possible to process massive data, as the learner consumes examples sequentially using a small amount of memory and limited CPU resources. It is also particularly suited to handling time-evolving data.
Vowpal Wabbit has become quite popular: it is a handy, light, and efficient command-line tool for doing online learning on gigabytes of data, even on a standard laptop with standard memory. After a reminder of the online learning principles, we present how to run Vowpal Wabbit on Hadoop in a distributed fashion.
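The core idea—one pass over a stream of examples, updating the model immediately with memory proportional to the feature count, not the data size—can be sketched as plain SGD on logistic loss. This is an illustrative toy in the spirit of Vowpal Wabbit, not VW's actual feature hashing or adaptive updates:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal online learner: one example at a time, O(#features) memory.
// Logistic regression trained by stochastic gradient descent.
struct OnlineLogistic {
    std::vector<double> w;
    double lr;
    OnlineLogistic(std::size_t dim, double lr) : w(dim, 0.0), lr(lr) {}

    double predict(const std::vector<double>& x) const {
        double z = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * x[i];
        return 1.0 / (1.0 + std::exp(-z));            // P(label = 1)
    }
    // Consume one (x, y) example and update the weights immediately;
    // the example can then be discarded, so memory stays constant.
    void learn(const std::vector<double>& x, int y) { // y in {0, 1}
        double g = predict(x) - y;                    // d(logloss)/dz
        for (std::size_t i = 0; i < w.size(); ++i) w[i] -= lr * g * x[i];
    }
};
```

Because each example is seen once and thrown away, the same loop works unchanged whether the stream is 11 GB on disk or arriving live, which is what makes the approach suited to the Criteo-scale datasets mentioned above.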
Francesc Alted (UberResearch GmbH), “New Trends In Storing And Analyzing Large Data Silos With Python”.
Bio: Teacher, developer, and consultant in a wide variety of business applications. Particularly interested in the field of very large databases, with special emphasis on squeezing the last drop of performance out of the computer as a whole, i.e. not only the CPU but also the memory and I/O subsystems.
1. The document discusses the history and evolution of GPUs and GPGPU programming. It describes how GPUs started as dedicated graphics cards but now have programmable capabilities through shaders.
2. It explains the key concepts of GPGPU including the host/device model, memory models, and execution models using concepts like work items, work groups, and ND ranges.
3. The document uses OpenCL as an example programming model, covering memory transfers between host and device, data types, and how a matrix multiplication kernel could be implemented in OpenCL using the execution model.
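The execution model in point 2 can be illustrated by emulating an NDRange launch on the host: the body below is what each work item of a naive OpenCL matrix-multiply kernel would run, with `(gx, gy)` standing in for `get_global_id(0)` / `get_global_id(1)`, and the nested loops playing the role the runtime plays when it launches one work item per output element. A sketch, not the document's actual kernel:

```cpp
#include <vector>
#include <cstddef>

// Per-work-item body for C = A * B (all matrices n x n, row-major).
// In real OpenCL this would be a __kernel function and (gx, gy) would come
// from get_global_id(); here they are passed in explicitly.
void kernelBody(std::size_t gx, std::size_t gy, std::size_t n,
                const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < n; ++k)
        acc += A[gy * n + k] * B[k * n + gx];
    C[gy * n + gx] = acc;
}

// Host-side stand-in for clEnqueueNDRangeKernel with a 2D global size (n, n):
// every (gx, gy) pair is one independent work item.
void launch(std::size_t n, const std::vector<float>& A,
            const std::vector<float>& B, std::vector<float>& C) {
    for (std::size_t gy = 0; gy < n; ++gy)       // global dimension 1
        for (std::size_t gx = 0; gx < n; ++gx)   // global dimension 0
            kernelBody(gx, gy, n, A, B, C);
}
```

Because each work item writes a distinct element of C, the iterations are independent—which is precisely why the GPU runtime is free to run them in parallel across work groups.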
Scaling up Machine Learning Algorithms for Classification - smatsus
This document discusses methods for scaling up machine learning algorithms to massive datasets. It presents the Dual Cached Loops method which runs disk input/output (I/O) and computation simultaneously to avoid bottlenecks. This outperforms existing methods on large, sparse datasets. The document also discusses ongoing work on distributed asynchronous optimization across multiple machines to further improve scaling. The goal is methods that can optimize machine learning models on datasets too large to fit in memory.
The document provides an overview and agenda for an introduction to running AI workloads on PowerAI. It discusses PowerAI and how it combines popular deep learning frameworks, development tools, and accelerated IBM Power servers. It then demonstrates AI workloads using TensorFlow and PyTorch, including running an MNIST workload to classify handwritten digits using basic linear regression and convolutional neural networks in TensorFlow, and an introduction to PyTorch concepts like tensors, modules, and softmax cross entropy loss.
Fast Big Data Analytics with Spark on Tachyon - Alluxio, Inc.
This document discusses using Tachyon as a cache layer to accelerate big data analytics workloads at Baidu. It describes how Tachyon provides a 30x speedup over directly accessing remote storage by serving data from an in-memory cache. The document outlines Baidu's architecture using Tachyon with Spark SQL, discusses problems encountered in production and their solutions, and proposes advanced features like tiered storage to expand Tachyon's caching capacity.
Similar to Pitfalls of Object Oriented Programming:
New Millennium for Computer Entertainment - Kutaragi (Slide_N)
This document discusses the next generation of computer entertainment and Sony's vision for the future. It summarizes Sony's development of new technologies including the Emotion Engine processor and Graphics Synthesizer that will power the next PlayStation console. These new components provide significantly more processing and graphics capabilities compared to existing consoles and PCs. Sony aims to advance from sound and graphics synthesis to emotion synthesis by using these technologies to generate realistic animations and simulate human emotions in games.
Ken Kutaragi was the Executive Deputy President and COO in charge of Home, Broadband and Semiconductor Solutions Network Companies, and Game Business Group at Sony. He saw digital consumer electronics like digital flat TVs, home servers, and digital cameras as the new driving force for next generation technologies. He believed future homes would be powered by technologies like artificial intelligence, broadband networks, optical/wireless connectivity, and the PlayStation portable game console. Semiconductors would be the "heart" powering various digital devices, and entertainment was viewed as the "key" application that would drive new digital content and computing platforms like Sony's CELL processor.
The document outlines Nobuyuki Idei's transformation plan for Sony to improve profitability through structural reform. The plan involves two phases from FY2003-FY2006: 1) reducing fixed costs by 330 billion yen through streamlining operations and headcount reductions, and 2) implementing "convergence strategies" across businesses to enhance core businesses and create new areas of growth. The goal is to increase the group operating profit margin to over 10% by FY2006.
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
Richard Marks discusses moving innovative game technology from research labs into consumer living rooms. He provides examples of how Sony has developed new input and sensing technologies like the EyeToy webcam and PlayStation Move motion controller through research and then incorporated them into popular gaming products. Marks explains the process from initial research concepts and prototypes to mass production and commercial launches. He also looks at future trends in areas like immersive displays, life gaming, and haptic feedback.
Cell Technology for Graphics and VisualizationSlide_N
The document discusses Cell technology for graphics and visualization. It provides an overview of the Cell architecture including its Power Processor Element (PPE) and Synergistic Processor Elements (SPEs). The PPE handles operating system tasks while the SPEs provide computational performance. The document outlines programming models for the Cell including function offload, application specific accelerators, computational acceleration, streaming, and a shared memory multiprocessor model. It also discusses heterogeneous threading and a single source compiler approach.
This document summarizes an IBM presentation on industry trends in microprocessor design. It discusses how single-thread performance growth has slowed due to power limitations, leading chipmakers to adopt multi-core designs. It then outlines IBM's Cell/B.E. microprocessor and roadmap, including its heterogeneous multi-core architecture combining general-purpose and specialized processing elements. Finally, it notes both AMD and Intel are moving toward heterogeneous designs that integrate CPU and GPU capabilities to better handle high-performance computing workloads.
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
The document discusses Ocelot, a binary translation framework that allows architectures other than NVIDIA GPUs to execute programs written in PTX, an intermediate representation used by NVIDIA GPUs. It describes how Ocelot maps the PTX thread hierarchy to different architectures, uses translation techniques to hide memory latency, and emulates GPU data structures. It also provides details on the implementation of the translator and a case study of translating a PTX program to IBM Cell Processor assembly code.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Building Production Ready Search Pipelines with Spark and Milvus
Pitfalls of Object Oriented Programming
1. Tony Albrecht – Technical Consultant
Developer Services
Sony Computer Entertainment Europe
Research & Development Division
Pitfalls of Object Oriented Programming
2. Slide 2
• A quick look at Object Oriented (OO)
programming
• A common example
• Optimisation of that example
• Summary
What I will be covering
3. Slide 3
• What is OO programming?
– a programming paradigm that uses "objects" – data structures
consisting of data fields and methods together with their interactions – to
design applications and computer programs.
(Wikipedia)
• Includes features such as
– Data abstraction
– Encapsulation
– Polymorphism
– Inheritance
Object Oriented (OO) Programming
4. Slide 4
• OO programming allows you to think about
problems in terms of objects and their
interactions.
• Each object is (ideally) self contained
– Contains its own code and data.
– Defines an interface to its code and data.
• Each object can be perceived as a "black box".
What’s OOP for?
5. Slide 5
• If objects are self contained then they can be
– Reused.
– Maintained without side effects.
– Used without understanding internal
implementation/representation.
• This is good, yes?
Objects
6. Slide 6
• Well, yes
• And no.
• First some history.
Are Objects Good?
7. Slide 7
A Brief History of C++
C++ development started
1979 2009
11. Slide 11
A Brief History of C++
Release of v2.0
1979 1989 2009
Added
• multiple inheritance,
• abstract classes,
• static member functions,
• const member functions
• protected members.
15. Slide 15
• Many more features have
been added to C++
• CPUs have become much
faster.
• Transition to multiple cores
• Memory has become faster.
So what has changed since 1979?
http://www.vintagecomputing.com
16. Slide 16
CPU performance
Computer architecture: a quantitative approach
By John L. Hennessy, David A. Patterson, Andrea C. Arpaci-Dusseau
18. Slide 18
• One of the biggest changes is that memory
access speeds are far slower (relatively)
– 1980: RAM latency ~ 1 cycle
– 2009: RAM latency ~ 400+ cycles
• What can you do in 400 cycles?
What has changed since 1979?
19. Slide 19
• OO classes encapsulate code and data.
• So, an instantiated object will generally
contain all data associated with it.
What has this to do with OO?
20. Slide 20
• With modern HW (particularly consoles),
excessive encapsulation is BAD.
• Data flow should be fundamental to your
design (Data Oriented Design)
My Claim
21. Slide 21
• Base Object class
– Contains general data
• Node
– Container class
• Modifier
– Updates transforms
• Drawable/Cube
– Renders objects
Consider a simple OO Scene Tree
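The four classes the slide lists might look like this in C++ (a minimal sketch with invented names and members; the talk does not show the real SCEE code):

```cpp
#include <vector>

struct Matrix4 { float m[16]; };
struct Sphere  { float x, y, z, radius; };

static int g_updateCount = 0;   // counter used only to demonstrate the traversal

class Object {                   // base: contains general data
public:
    virtual ~Object() {}
    virtual void Update() = 0;
protected:
    Matrix4 m_Local{}, m_World{};
    Sphere  m_WorldBS{};
};

class Node : public Object {     // container class
public:
    void AddChild(Object* child) { m_Children.push_back(child); }
    void Update() override {
        for (Object* child : m_Children) child->Update();  // recurse into children
    }
private:
    std::vector<Object*> m_Children;
};

class Modifier : public Node {   // updates transforms
public:
    void Update() override { /* recompute transforms here */ Node::Update(); }
};

class Drawable : public Object { // renders objects
public:
    void Update() override { ++g_updateCount; /* submit for rendering */ }
};
```

Each Update() call is a virtual dispatch into a separately allocated object, which is exactly the access pattern the rest of the talk takes apart.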
22. Slide 22
• Each object
– Maintains bounding sphere for culling
– Has transform (local and world)
– Dirty flag (optimisation)
– Pointer to Parent
Object
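A hypothetical C++ layout for this per-object data (the names and exact fields are illustrative, not the actual classes from the talk):

```cpp
struct Matrix4 { float m[16]; };          // 64 bytes
struct Sphere  { float x, y, z, r; };     // 16 bytes

struct Object {
    Sphere   m_WorldBS;   // world-space bounding sphere, used for culling
    Matrix4  m_Local;     // local transform
    Matrix4  m_World;     // world transform
    bool     m_Dirty;     // the "optimisation": skip update when clean
    Object*  m_Parent;    // pointer back up the hierarchy
};
// Each instance is roughly 150+ bytes. When instances are heap-allocated
// individually, they scatter across memory, so every Update() touches
// fresh, unrelated cache lines.
```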
26. Slide 26
Consider the following code…
• Update the world transform and world
space bounding sphere for each object.
27. Slide 27
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
28. Slide 28
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
What's wrong with this code?
29. Slide 29
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
If m_Dirty = false then we get a branch
misprediction, which costs 23 or 24
cycles.
30. Slide 30
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
Calculation of the world bounding sphere
takes only 12 cycles.
31. Slide 31
Consider the following code…
• Leaf nodes (objects) return transformed
bounding spheres
So using a dirty flag here is actually
slower than not using one (in the case
where it is false)
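A toy C++ sketch of the dirty-flag pattern being critiqued (the transform is simplified to a translation, and the cycle costs in the comments come from the slides, not from measurement here):

```cpp
struct Sphere { float x, y, z, r; };

// Toy "transform": translate the sphere's centre.
Sphere TransformSphere(const Sphere& s, float tx, float ty, float tz) {
    return { s.x + tx, s.y + ty, s.z + tz, s.r };
}

struct Leaf {
    Sphere m_BS{};        // model-space bounding sphere
    Sphere m_WorldBS{};   // world-space bounding sphere
    bool   m_Dirty = true;

    void Update(float tx, float ty, float tz) {
        if (m_Dirty) {    // this branch is the problem: a misprediction
                          // costs ~23-24 cycles, the recompute only ~12
            m_WorldBS = TransformSphere(m_BS, tx, ty, tz);
            m_Dirty = false;
        }
        // Unconditionally recomputing every frame can be cheaper than
        // paying for the branch when the flag is usually false.
    }
};
```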
47. Slide 47
• 11,111 nodes/objects in a
tree 5 levels deep
• Every node being
transformed
• Hierarchical culling of tree
• Render method is empty
The Test
55. Slide 55
Some rough calculations
Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms
56. Slide 56
Some rough calculations
Branch Mispredictions: 50,421 @ 23 cycles each ~= 0.36ms
L2 Cache misses: 36,345 @ 400 cycles each ~= 4.54ms
57. Slide 57
• From Tuner, ~ 3 L2 cache misses per
object
– These cache misses are mostly sequential
(more than 1 fetch from main memory can
happen at once)
– Code/cache miss/code/cache miss/code…
58. Slide 58
• How can we fix it?
• And still keep the same functionality and
interface?
Slow memory is the problem here
59. Slide 59
• Use homogenous, sequential sets of data
The first step
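In C++ this first step might look like the following sketch, with each kind of data in its own contiguous std::vector (illustrative names, not the talk's code):

```cpp
#include <vector>

struct Matrix4 { float m[16]; };
struct Sphere  { float x, y, z, r; };

// Each data type lives in its own contiguous array, instead of being
// buried inside individually heap-allocated objects.
struct Scene {
    std::vector<Matrix4> localTransforms;
    std::vector<Matrix4> worldTransforms;
    std::vector<Sphere>  worldBS;
};
// An update pass now walks one array front to back: the hardware sees a
// sequential, predictable access stream instead of pointer chasing.
```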
63. Slide 63
• Process data in order
• Use implicit structure for hierarchy
– Minimise to and fro from nodes.
• Group logic to optimally use what is already in
cache.
• Remove regularly called virtuals.
What next?
69. Slide 69
Hierarchy
Node
Node Node
Node Node Node Node Node Node Node Node
A lot of this
information can be
inferred
Node Node Node Node Node Node Node Node
73. Slide 73
• Make the processing global rather than
local
– Pull the updates out of the objects.
• No more virtuals
– Easier to understand too – all code in one
place.
74. Slide 74
• OO version
– Update transform top down and expand WBS
bottom up
Need to change some things…
83. Slide 83
• Hierarchical bounding spheres pass info up
• Transforms cascade down
• Data use and code is "striped".
– Processing is alternating
84. Slide 84
• To do this with a "flat" hierarchy, break it
into 2 passes
– Update the transforms and bounding
spheres (from top down)
– Expand bounding spheres (bottom up)
Conversion to linear
85. Slide 85
Transform and BS updates
Node
Node Node
Node Node Node Node Node Node Node Node
For each node at each level (top down)
{
multiply world transform by parent's
transform wbs by world transform
86. Slide 86
Update bounding sphere hierarchies
Node
Node Node
Node Node Node Node Node Node Node Node
For each node at each level (bottom up)
{
add wbs to parent's
cull wbs against frustum
}
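The two pseudocode passes above could be sketched in C++ over a flattened, level-ordered array, with a toy scalar standing in for the matrix maths (the node layout and names are assumptions, not the talk's implementation):

```cpp
#include <vector>
#include <algorithm>

// Nodes are stored so that parents always precede their children.
struct FlatNode {
    float local;    // local transform (toy scalar in place of a matrix)
    float world;    // world transform
    float bs;       // bounding sphere radius (toy)
    int   parent;   // index of parent node, -1 for the root
};

// Pass 1, top down: world = parent's world combined with local.
void UpdateTransforms(std::vector<FlatNode>& nodes) {
    for (FlatNode& n : nodes)
        n.world = (n.parent < 0) ? n.local : nodes[n.parent].world + n.local;
}

// Pass 2, bottom up: expand each parent's sphere by its children's.
void ExpandSpheres(std::vector<FlatNode>& nodes) {
    for (int i = (int)nodes.size() - 1; i > 0; --i) {
        FlatNode& p = nodes[nodes[i].parent];
        p.bs = std::max(p.bs, nodes[i].bs);
    }
}
```

Both passes are plain forward or backward walks over one contiguous array, which is what makes the accesses sequential and predictable.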
97. Slide 97
• Tuner scans show about 1.7 cache misses
per node.
• But, these misses are much more frequent
– Code/cache miss/cache miss/code
– Less stalling
99. Slide 99
• Data accesses are now predictable
• Can use prefetch (dcbt) to warm the cache
– Data streams can be tricky
– Many reasons for stream termination
– Easier to just use dcbt blindly
• (look ahead x number of iterations)
Prefetching
100. Slide 100
• Prefetch a predetermined number of
iterations ahead
• Ignore incorrect prefetches
Prefetching example
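A hedged sketch of look-ahead prefetching: the deck targets the PPU's dcbt instruction, while this example uses GCC/Clang's __builtin_prefetch as a portable stand-in, and the look-ahead distance of 8 is an arbitrary illustration:

```cpp
#include <vector>
#include <cstddef>

// Prefetch a fixed number of iterations ahead of the current element.
float SumSequential(const std::vector<float>& data) {
    const std::size_t kAhead = 8;                 // tuned look-ahead distance
    float total = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i + kAhead < data.size())             // bounds guard; on the PPU the
            __builtin_prefetch(data.data() + i + kAhead);  // deck issues dcbt blindly
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: an incorrect or unnecessary one costs little, which is why the slides suggest issuing it blindly rather than tracking stream state.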
102. Slide 102
• This example makes very heavy use of the
cache
• This can affect other threads' use of the
cache
– Multiple threads with heavy cache use may
thrash the cache
A Warning on Prefetching
108. Slide 108
• Just reorganising data locations was a win
• Data + code reorganisation = dramatic
improvement.
• + prefetching equals even more WIN.
In Summary
109. Slide 109
• Be careful not to design yourself into a corner
• Consider data in your design
– Can you decouple data from objects?
– …code from objects?
• Be aware of what the compiler and HW are
doing
OO is not necessarily EVIL
110. Slide 110
• Optimise for data first, then code.
– Memory access is probably going to be your
biggest bottleneck
• Simplify systems
– KISS
– Easier to optimise, easier to parallelise
It's all about the memory
111. Slide 111
• Keep code and data homogenous
– Avoid introducing variations
– Don't test for exceptions – sort by them.
• Not everything needs to be an object
– If you must have a pattern, then consider
using Managers
Homogeneity
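One possible shape for such a Manager (a hypothetical example, not from the talk): it owns the homogeneous arrays and hands out handles instead of pointers:

```cpp
#include <vector>

// One manager owns all instances' data and processes it in bulk,
// instead of every instance being a self-contained object.
class ParticleManager {
public:
    int Spawn(float x, float vx) {
        m_X.push_back(x);
        m_VX.push_back(vx);
        return (int)m_X.size() - 1;       // handle, not a pointer
    }
    void Update(float dt) {               // one tight loop over all particles
        for (std::size_t i = 0; i < m_X.size(); ++i)
            m_X[i] += m_VX[i] * dt;
    }
    float PositionOf(int handle) const { return m_X[handle]; }
private:
    std::vector<float> m_X, m_VX;         // homogeneous, contiguous data
};
```

All the update logic lives in one place, there are no virtuals on the hot path, and the data is already laid out for sequential access.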
112. Slide 112
• You are writing a GAME
– You have control over the input data
– Don't be afraid to preformat it – drastically if
need be.
• Design for specifics, not generics
(generally).
Remember
113. Slide 113
• Better performance
• Better realisation of code optimisations
• Often simpler code
• More parallelisable code
Data Oriented Design Delivers