The document discusses optimizations made to improve the performance of Hadoop Distributed File System (HDFS) on ARM platforms. It details the testing methodology used and various optimizations tried, including using NEON to optimize the CRC32c checksum algorithm, replacing the PureJavaCrc32C class with a native implementation, and using direct buffers for data writes. Benchmark results show a reduction in CPU usage for both read and write paths after applying these optimizations. Further optimizations to disk I/O and tracking of Java virtual machines are suggested.
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
Linaro is building an OpenStack based Developer Cloud. Here we present what was required to bring OpenStack to 64-bit ARM, the pitfalls, successes and lessons learnt; what’s missing and what’s next.
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Databricks
This document discusses computationally intensive machine learning at large scales. It compares the algorithmic and statistical perspectives of computer scientists and statisticians when analyzing big data. It describes three science applications that use linear algebra techniques like PCA, NMF and CX decompositions on large datasets. Experiments are presented comparing the performance of these techniques implemented in Spark and MPI on different HPC platforms. The results show Spark can be 4-26x slower than optimized MPI codes. Next steps proposed include developing Alchemist to interface Spark and MPI more efficiently and exploring communication-avoiding machine learning algorithms.
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Databricks
Apache Spark performance on SQL and DataFrame/DataSet workloads has made impressive progress, thanks to Catalyst and Tungsten, but there is still a significant gap towards what is achievable by best-of-breed query engines or hand-written low-level C code on modern server-class hardware. This session presents Flare, a new experimental back-end for Spark SQL that yields significant speed-ups by compiling Catalyst query plans to native code.
Flare’s low-level implementation takes full advantage of native execution, using techniques such as NUMA-aware scheduling and data layouts to leverage ‘mechanical sympathy’ and bring execution closer to the metal than current JVM-based techniques on big memory machines. Thus, with available memory increasingly in the TB range, Flare makes scale-up on server-class hardware an interesting alternative to scaling out across a cluster, especially in terms of data center costs. This session will describe the design of Flare, and will demonstrate experiments on standard SQL benchmarks that exhibit order of magnitude speedups over Spark 2.1.
Apache Spark: The Next Gen toolset for Big Data Processingprajods
The Spark project from Apache(spark.apache.org), is the next generation of Big Data processing systems. It uses a new architecture and in-memory processing for orders of magnitude improvement in performance. Some would call it the successor to the Hadoop set of tools. Hadoop is a batch mode Big Data processor and depends on disk based files. Spark improves on this and supports real time and interactive processing, in addition to batch processing.
Table of contents:
1. The Big Data triangle
2. Hadoop stack and its limitations
3. Spark: An Overview
3.a. Spark Streaming
3.b. GraphX: Graph processing
3.c. MLib: Machine Learning
4. Performance characteristics of Spark
The document discusses using Hadoop for scientific workloads and summarizes early results from benchmarking Hadoop. It explores using Hadoop and MapReduce for data-intensive scientific applications like BLAST sequence analysis. Performance results show that Hadoop can provide comparable performance to existing parallel file systems. Challenges include lack of turn-key solutions, managing data formats, and performance tuning. The research aims to understand the unique needs of science clouds and how to effectively support data-intensive scientific applications on cloud platforms.
Introduction to Machine Learning in Spark. Presented at Bangalore Apache Spark Meetup by Shashank L and Shashidhar E S on 17/10/2015.
http://www.meetup.com/Bangalore-Apache-Spark-Meetup/events/225649429/
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd MostakDatabricks
There has been growing interest in harnessing the parallelism of Graphics Processing Units (GPUs) to accelerate analytics workloads. GPUs have become the standard platform for many machine learning algorithms, particularly in the field of deep neural networks (DNNs), while making increasing inroads into more traditional domains such as analytics databases and visual analytics. However there is a strong need to couple these new platforms with Apache Spark, which has emerged as the de facto analytics platform for data scientists. In this talk we discuss how we built a connector from Spark to the open source GPU-powered MapD Analytics Platform, and the use cases such a connector enables around being able to pull high value data from Spark and cache it on the GPU for subsequent interactive visual analysis and machine learning. We will conclude with a brief demo of an end-to-end Spark-to-MapD pipeline.
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
Linaro is building an OpenStack based Developer Cloud. Here we present what was required to bring OpenStack to 64-bit ARM, the pitfalls, successes and lessons learnt; what’s missing and what’s next.
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ...Databricks
This document discusses computationally intensive machine learning at large scales. It compares the algorithmic and statistical perspectives of computer scientists and statisticians when analyzing big data. It describes three science applications that use linear algebra techniques like PCA, NMF and CX decompositions on large datasets. Experiments are presented comparing the performance of these techniques implemented in Spark and MPI on different HPC platforms. The results show Spark can be 4-26x slower than optimized MPI codes. Next steps proposed include developing Alchemist to interface Spark and MPI more efficiently and exploring communication-avoiding machine learning algorithms.
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Databricks
Apache Spark performance on SQL and DataFrame/DataSet workloads has made impressive progress, thanks to Catalyst and Tungsten, but there is still a significant gap towards what is achievable by best-of-breed query engines or hand-written low-level C code on modern server-class hardware. This session presents Flare, a new experimental back-end for Spark SQL that yields significant speed-ups by compiling Catalyst query plans to native code.
Flare’s low-level implementation takes full advantage of native execution, using techniques such as NUMA-aware scheduling and data layouts to leverage ‘mechanical sympathy’ and bring execution closer to the metal than current JVM-based techniques on big memory machines. Thus, with available memory increasingly in the TB range, Flare makes scale-up on server-class hardware an interesting alternative to scaling out across a cluster, especially in terms of data center costs. This session will describe the design of Flare, and will demonstrate experiments on standard SQL benchmarks that exhibit order of magnitude speedups over Spark 2.1.
Apache Spark: The Next Gen toolset for Big Data Processingprajods
The Spark project from Apache(spark.apache.org), is the next generation of Big Data processing systems. It uses a new architecture and in-memory processing for orders of magnitude improvement in performance. Some would call it the successor to the Hadoop set of tools. Hadoop is a batch mode Big Data processor and depends on disk based files. Spark improves on this and supports real time and interactive processing, in addition to batch processing.
Table of contents:
1. The Big Data triangle
2. Hadoop stack and its limitations
3. Spark: An Overview
3.a. Spark Streaming
3.b. GraphX: Graph processing
3.c. MLib: Machine Learning
4. Performance characteristics of Spark
The document discusses using Hadoop for scientific workloads and summarizes early results from benchmarking Hadoop. It explores using Hadoop and MapReduce for data-intensive scientific applications like BLAST sequence analysis. Performance results show that Hadoop can provide comparable performance to existing parallel file systems. Challenges include lack of turn-key solutions, managing data formats, and performance tuning. The research aims to understand the unique needs of science clouds and how to effectively support data-intensive scientific applications on cloud platforms.
Introduction to Machine Learning in Spark. Presented at Bangalore Apache Spark Meetup by Shashank L and Shashidhar E S on 17/10/2015.
http://www.meetup.com/Bangalore-Apache-Spark-Meetup/events/225649429/
Leveraging GPU-Accelerated Analytics on top of Apache Spark with Todd MostakDatabricks
There has been growing interest in harnessing the parallelism of Graphics Processing Units (GPUs) to accelerate analytics workloads. GPUs have become the standard platform for many machine learning algorithms, particularly in the field of deep neural networks (DNNs), while making increasing inroads into more traditional domains such as analytics databases and visual analytics. However there is a strong need to couple these new platforms with Apache Spark, which has emerged as the de facto analytics platform for data scientists. In this talk we discuss how we built a connector from Spark to the open source GPU-powered MapD Analytics Platform, and the use cases such a connector enables around being able to pull high value data from Spark and cache it on the GPU for subsequent interactive visual analysis and machine learning. We will conclude with a brief demo of an end-to-end Spark-to-MapD pipeline.
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...Spark Summit
This document presents a Spark framework for personalized DNA analysis at large scale for under $100 and less than 1 hour. The framework segments input DNA data and runs it through three stages on a Spark cluster: 1) mapping and static load balancing, 2) sorting and dynamic load balancing, and 3) Picard deduplication and GATK variant calling. It achieves high CPU utilization, scales linearly from 1 to 20 nodes, analyzes 400GB of data in under an hour on a 35-node cluster for under $100, and has a 99.1% concordance with serial GATK. Future work involves accelerating it using FPGAs.
Overview of myHadoop 0.30, a framework for deploying Hadoop on existing high-performance computing infrastructure. Discussion of how to install it, spin up a Hadoop cluster, and use the new features.
myHadoop 0.30's project page is now on GitHub (https://github.com/glennklockwood/myhadoop) and the latest release tarball can be downloaded from my website (glennklockwood.com/files/myhadoop-0.30.tar.gz)
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
Deep Learning architectures, such as deep neural networks, are currently the hottest emerging areas of data science, especially in Big Data. Deep Learning could be effectively exploited to address some major issues of Big Data, such as fast information retrieval, data classification, semantic indexing and so on. In this work, we designed and implemented a framework to train deep neural networks using Spark, fast and general data flow engine for large scale data processing, which can utilize cluster computing to train large scale deep networks. Training Deep Learning models requires extensive data and computation. Our proposed framework can accelerate the training time by distributing the model replicas, via stochastic gradient descent, among cluster nodes for data resided on HDFS.
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
Methods that scale with available computation are the future of AI. Distributed deep learning is one such method that enables data scientists to massively increase their productivity by (1) running parallel experiments over many devices (GPUs/TPUs/servers) and (2) massively reducing training time by distributing the training of a single network over many devices. Apache Spark is a key enabling platform for distributed deep learning, as it enables different deep learning frameworks to be embedded in Spark workflows in a secure end-to-end pipeline. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows to build distributed deep learning applications.
We will analyse the different frameworks for integrating Spark with Tensorflow, from Horovod to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We will also look at where you will find the bottlenecks when training models (in your frameworks, the network, GPUs, and with your data scientists) and how to get around them. We will look at how to use Spark Estimator model to perform hyper-parameter optimization with Spark/TensorFlow and model-architecture search, where Spark executors perform experiments in parallel to automatically find good model architectures.
The talk will include a live demonstration of training and inference for a Tensorflow application embedded in a Spark pipeline written in a Jupyter notebook on the Hops platform. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training. The demo will be run on the Hops platform, currently used by over 450 researchers and students in Sweden, as well as at companies such as Scania and Ericsson.
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Databricks
DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs. It is integrated with Hadoop and Apache Spark. ND4J is a Open Source, distributed and GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but it presents some unexpected issues that can compromise performance and nullify the benefits of well written code and good model design. In this talk I will walk through some of those problems and will present some best practices to prevent them, coming from lessons learned when putting things in production.
This document provides an introduction and overview of Apache Spark. It discusses what Spark is, its performance advantages over Hadoop MapReduce, its core abstraction of resilient distributed datasets (RDDs), and how Spark programs are executed. Key features of Spark like its interactive shell, transformations and actions on RDDs, and Spark SQL are explained. Recent new features in Spark like DataFrames, external data sources, and the Tungsten performance optimizer are also covered. The document aims to give attendees an understanding of Spark's capabilities and how it can provide faster performance than Hadoop for certain applications.
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
We will give a detailed introduction to Apache Spark and why and how Spark can change the analytics world. Apache Spark's memory abstraction is RDD (Resilient Distributed DataSet). One of the key reason why Apache Spark is so different is because of the introduction of RDD. You cannot do anything in Apache Spark without knowing about RDDs. We will give a high level introduction to RDD and in the second half we will have a deep dive into RDDs.
Parallel Linear Regression in Interative Reduce and YARNDataWorks Summit
Online learning techniques, such as Stochastic Gradient Descent (SGD), are powerful when applied to risk minimization and convex games on large problems. However, their sequential design prevents them from taking advantage of newer distributed frameworks such as Hadoop/MapReduce. In this session, we will take a look at how we parallelized linear regression parameter optimization on the next-gen YARN framework Iterative Reduce.
Re-Architecting Spark For Performance UnderstandabilityJen Aman
The document describes a new architecture called "monotasks" for Apache Spark that aims to make reasoning about Spark job performance easier. The monotasks architecture decomposes Spark tasks so that each task uses only one resource (e.g. CPU, disk, network). This avoids issues where Spark tasks bottleneck on different resources over time or experience resource contention. With monotasks, dedicated schedulers control resource contention and monotask timing data can be used to model ideal performance. Results show monotasks match Spark's performance and provide clearer insight into bottlenecks.
This slide introduces Hadoop Spark.
Just to help you construct an idea of Spark regarding its architecture, data flow, job scheduling, and programming.
Not all technical details are included.
Spark is a powerful, scalable real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel, GPU clusters is fast becoming the default way to quickly develop and train deep learning models. As data science teams and data savvy companies mature, they will need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage.
This talk will discuss and show in action:
* Leveraging Spark and Tensorflow for hyperparameter tuning
* Leveraging Spark and Tensorflow for deploying trained models
* An examination of DeepLearning4J, CaffeOnSpark, IBM's SystemML, and Intel's BigDL
* Sidecar GPU cluster architecture and Spark-GPU data reading patterns
* Pros, cons, and performance characteristics of various approaches
Attendees will leave this session informed on:
* The available architectures for Spark and Deep Learning and Spark with and without GPUs for Deep Learning
* Several deep learning software frameworks, their pros and cons in the Spark context and for various use cases, and their performance characteristics
* A practical, applied methodology and technical examples for tackling big data deep learning
Introduction to Apache Spark Developer TrainingCloudera, Inc.
Apache Spark is a next-generation processing engine optimized for speed, ease of use, and advanced analytics well beyond batch. The Spark framework supports streaming data and complex, iterative algorithms, enabling applications to run 100x faster than traditional MapReduce programs. With Spark, developers can write sophisticated parallel applications for faster business decisions and better user outcomes, applied to a wide variety of architectures and industries.
Learn What Apache Spark is and how it compares to Hadoop MapReduce, How to filter, map, reduce, and save Resilient Distributed Datasets (RDDs), Who is best suited to attend the course and what prior knowledge you should have, and the benefits of building Spark applications as part of an enterprise data hub.
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
The document provides an overview of Spark and its machine learning library MLlib. It discusses how Spark uses resilient distributed datasets (RDDs) to perform distributed computing tasks across clusters in a fault-tolerant manner. It summarizes the key capabilities of MLlib, including its support for common machine learning algorithms and how MLlib can be used together with other Spark components like Spark Streaming, GraphX, and SQL. The document also briefly discusses future directions for MLlib, such as tighter integration with DataFrames and new optimization methods.
Let Spark Fly: Advantages and Use Cases for Spark on HadoopMapR Technologies
http://bit.ly/1BTaXZP – Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and as such, there’s been plenty of hype about it in recent months, but how much of the discussion is marketing spin? And what are the facts? MapR and Databricks, the company that created and led the development of the Spark stack, will cut through the noise to uncover practical advantages for having the full set of Spark technologies at your disposal and reveal the benefits for running Spark on Hadoop
This presentation was given at a webinar hosted by Data Science Central and co-presented by MapR + Databricks.
To see the webinar, please go to: http://www.datasciencecentral.com/video/let-spark-fly-advantages-and-use-cases-for-spark-on-hadoop
This document provides an overview and introduction to Spark, including:
- Spark is a general purpose computational framework that provides more flexibility than MapReduce while retaining properties like scalability and fault tolerance.
- Spark concepts include resilient distributed datasets (RDDs), transformations that create new RDDs lazily, and actions that run computations and return values to materialize RDDs.
- Spark can run on standalone clusters or as part of Cloudera's Enterprise Data Hub, and examples of its use include machine learning, streaming, and SQL queries.
Impala is a massively parallel processing SQL query engine for Hadoop. It allows users to issue SQL queries directly to their data in Apache Hadoop. Impala uses a distributed architecture where queries are executed in parallel across nodes by Impala daemons. It uses a new execution engine written in C++ with runtime code generation for high performance. Impala also supports commonly used Hadoop file formats and can query data stored in HDFS and HBase.
Lambda Architecture using Google Cloud plus AppsSimon Su
This is a demo that use apps script to demo the lambda dashboard. The apps script publish a endpoint and client using fluentd to post data to apps script and also bigquery. Then, you can see the realtime and batch query in the same view.
This document discusses DeepLeanring4j, a framework for data parallel deep learning on Spark. It provides an overview of the current landscape of deep learning tools, and proposes a solution using Javacpp, libnd4j, and SKIL (Skymind Intelligence Layer) to leverage Spark and the JVM ecosystem while allowing for heterogeneous compute. Key aspects include efficient access to image and audio data, an ND4j library for numerical compute, deployment via Juju, and an ETL pipeline interface in Canova. The goal is to build on the JVM's strengths while addressing its limitations for numerical compute through hardware acceleration.
This document introduces Apache Spark, an open-source cluster computing system that provides fast, general execution engines for large-scale data processing. It summarizes key Spark concepts including resilient distributed datasets (RDDs) that let users spread data across a cluster, transformations that operate on RDDs, and actions that return values to the driver program. Examples demonstrate how to load data from files, filter and transform it using RDDs, and run Spark programs on a local or cluster environment.
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
A Spark Framework For < $100, < 1 Hour, Accurate Personalized DNA Analy...Spark Summit
This document presents a Spark framework for personalized DNA analysis at large scale for under $100 and less than 1 hour. The framework segments input DNA data and runs it through three stages on a Spark cluster: 1) mapping and static load balancing, 2) sorting and dynamic load balancing, and 3) Picard deduplication and GATK variant calling. It achieves high CPU utilization, scales linearly from 1 to 20 nodes, analyzes 400GB of data in under an hour on a 35-node cluster for under $100, and has a 99.1% concordance with serial GATK. Future work involves accelerating it using FPGAs.
Overview of myHadoop 0.30, a framework for deploying Hadoop on existing high-performance computing infrastructure. Discussion of how to install it, spin up a Hadoop cluster, and use the new features.
myHadoop 0.30's project page is now on GitHub (https://github.com/glennklockwood/myhadoop) and the latest release tarball can be downloaded from my website (glennklockwood.com/files/myhadoop-0.30.tar.gz)
Spark Based Distributed Deep Learning Framework For Big Data Applications Humoyun Ahmedov
Deep Learning architectures, such as deep neural networks, are currently the hottest emerging areas of data science, especially in Big Data. Deep Learning could be effectively exploited to address some major issues of Big Data, such as fast information retrieval, data classification, semantic indexing and so on. In this work, we designed and implemented a framework to train deep neural networks using Spark, fast and general data flow engine for large scale data processing, which can utilize cluster computing to train large scale deep networks. Training Deep Learning models requires extensive data and computation. Our proposed framework can accelerate the training time by distributing the model replicas, via stochastic gradient descent, among cluster nodes for data resided on HDFS.
Distributed Deep Learning with Apache Spark and TensorFlow with Jim DowlingDatabricks
Methods that scale with available computation are the future of AI. Distributed deep learning is one such method that enables data scientists to massively increase their productivity by (1) running parallel experiments over many devices (GPUs/TPUs/servers) and (2) massively reducing training time by distributing the training of a single network over many devices. Apache Spark is a key enabling platform for distributed deep learning, as it enables different deep learning frameworks to be embedded in Spark workflows in a secure end-to-end pipeline. In this talk, we examine the different ways in which Tensorflow can be included in Spark workflows to build distributed deep learning applications.
We will analyse the different frameworks for integrating Spark with Tensorflow, from Horovod to TensorflowOnSpark to Databrick’s Deep Learning Pipelines. We will also look at where you will find the bottlenecks when training models (in your frameworks, the network, GPUs, and with your data scientists) and how to get around them. We will look at how to use Spark Estimator model to perform hyper-parameter optimization with Spark/TensorFlow and model-architecture search, where Spark executors perform experiments in parallel to automatically find good model architectures.
The talk will include a live demonstration of training and inference for a Tensorflow application embedded in a Spark pipeline written in a Jupyter notebook on the Hops platform. We will show how to debug the application using both Spark UI and Tensorboard, and how to examine logs and monitor training. The demo will be run on the Hops platform, currently used by over 450 researchers and students in Sweden, as well as at companies such as Scania and Ericsson.
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Databricks
DeepLearning4J (DL4J) is a powerful Open Source distributed framework that brings Deep Learning to the JVM (it can serve as a DIY tool for Java, Scala, Clojure and Kotlin programmers). It can be used on distributed GPUs and CPUs. It is integrated with Hadoop and Apache Spark. ND4J is a Open Source, distributed and GPU-enabled library that brings the intuitive scientific computing tools of the Python community to the JVM. Training neural network models using DL4J, ND4J and Spark is a powerful combination, but it presents some unexpected issues that can compromise performance and nullify the benefits of well written code and good model design. In this talk I will walk through some of those problems and will present some best practices to prevent them, coming from lessons learned when putting things in production.
This document provides an introduction and overview of Apache Spark. It discusses what Spark is, its performance advantages over Hadoop MapReduce, its core abstraction of resilient distributed datasets (RDDs), and how Spark programs are executed. Key features of Spark like its interactive shell, transformations and actions on RDDs, and Spark SQL are explained. Recent new features in Spark like DataFrames, external data sources, and the Tungsten performance optimizer are also covered. The document aims to give attendees an understanding of Spark's capabilities and how it can provide faster performance than Hadoop for certain applications.
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
We will give a detailed introduction to Apache Spark and why and how Spark can change the analytics world. Apache Spark's memory abstraction is RDD (Resilient Distributed DataSet). One of the key reason why Apache Spark is so different is because of the introduction of RDD. You cannot do anything in Apache Spark without knowing about RDDs. We will give a high level introduction to RDD and in the second half we will have a deep dive into RDDs.
Parallel Linear Regression in Interative Reduce and YARNDataWorks Summit
Online learning techniques, such as Stochastic Gradient Descent (SGD), are powerful when applied to risk minimization and convex games on large problems. However, their sequential design prevents them from taking advantage of newer distributed frameworks such as Hadoop/MapReduce. In this session, we will take a look at how we parallelized linear regression parameter optimization on the next-gen YARN framework Iterative Reduce.
Re-Architecting Spark For Performance UnderstandabilityJen Aman
The document describes a new architecture called "monotasks" for Apache Spark that aims to make reasoning about Spark job performance easier. The monotasks architecture decomposes Spark tasks so that each task uses only one resource (e.g. CPU, disk, network). This avoids issues where Spark tasks bottleneck on different resources over time or experience resource contention. With monotasks, dedicated schedulers control resource contention and monotask timing data can be used to model ideal performance. Results show monotasks match Spark's performance and provide clearer insight into bottlenecks.
This slide introduces Hadoop Spark.
Just to help you construct an idea of Spark regarding its architecture, data flow, job scheduling, and programming.
Not all technical details are included.
Spark is a powerful, scalable real-time data analytics engine that is fast becoming the de facto hub for data science and big data. However, in parallel, GPU clusters is fast becoming the default way to quickly develop and train deep learning models. As data science teams and data savvy companies mature, they will need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage.
This talk will discuss and show in action:
* Leveraging Spark and Tensorflow for hyperparameter tuning
* Leveraging Spark and Tensorflow for deploying trained models
* An examination of DeepLearning4J, CaffeOnSpark, IBM's SystemML, and Intel's BigDL
* Sidecar GPU cluster architecture and Spark-GPU data reading patterns
* Pros, cons, and performance characteristics of various approaches
Attendees will leave this session informed on:
* The available architectures for Spark and Deep Learning and Spark with and without GPUs for Deep Learning
* Several deep learning software frameworks, their pros and cons in the Spark context and for various use cases, and their performance characteristics
* A practical, applied methodology and technical examples for tackling big data deep learning
Introduction to Apache Spark Developer TrainingCloudera, Inc.
Apache Spark is a next-generation processing engine optimized for speed, ease of use, and advanced analytics well beyond batch. The Spark framework supports streaming data and complex, iterative algorithms, enabling applications to run 100x faster than traditional MapReduce programs. With Spark, developers can write sophisticated parallel applications for faster business decisions and better user outcomes, applied to a wide variety of architectures and industries.
Learn What Apache Spark is and how it compares to Hadoop MapReduce, How to filter, map, reduce, and save Resilient Distributed Datasets (RDDs), Who is best suited to attend the course and what prior knowledge you should have, and the benefits of building Spark applications as part of an enterprise data hub.
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
The document provides an overview of Spark and its machine learning library MLlib. It discusses how Spark uses resilient distributed datasets (RDDs) to perform distributed computing tasks across clusters in a fault-tolerant manner. It summarizes the key capabilities of MLlib, including its support for common machine learning algorithms and how MLlib can be used together with other Spark components like Spark Streaming, GraphX, and SQL. The document also briefly discusses future directions for MLlib, such as tighter integration with DataFrames and new optimization methods.
Let Spark Fly: Advantages and Use Cases for Spark on HadoopMapR Technologies
http://bit.ly/1BTaXZP – Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and as such, there’s been plenty of hype about it in recent months, but how much of the discussion is marketing spin? And what are the facts? MapR and Databricks, the company that created and led the development of the Spark stack, will cut through the noise to uncover practical advantages for having the full set of Spark technologies at your disposal and reveal the benefits for running Spark on Hadoop
This presentation was given at a webinar hosted by Data Science Central and co-presented by MapR + Databricks.
To see the webinar, please go to: http://www.datasciencecentral.com/video/let-spark-fly-advantages-and-use-cases-for-spark-on-hadoop
This document provides an overview and introduction to Spark, including:
- Spark is a general purpose computational framework that provides more flexibility than MapReduce while retaining properties like scalability and fault tolerance.
- Spark concepts include resilient distributed datasets (RDDs), transformations that create new RDDs lazily, and actions that run computations and return values to materialize RDDs.
- Spark can run on standalone clusters or as part of Cloudera's Enterprise Data Hub, and examples of its use include machine learning, streaming, and SQL queries.
Impala is a massively parallel processing SQL query engine for Hadoop. It allows users to issue SQL queries directly to their data in Apache Hadoop. Impala uses a distributed architecture where queries are executed in parallel across nodes by Impala daemons. It uses a new execution engine written in C++ with runtime code generation for high performance. Impala also supports commonly used Hadoop file formats and can query data stored in HDFS and HBase.
Lambda Architecture using Google Cloud plus AppsSimon Su
This is a demo that use apps script to demo the lambda dashboard. The apps script publish a endpoint and client using fluentd to post data to apps script and also bigquery. Then, you can see the realtime and batch query in the same view.
This document discusses DeepLeanring4j, a framework for data parallel deep learning on Spark. It provides an overview of the current landscape of deep learning tools, and proposes a solution using Javacpp, libnd4j, and SKIL (Skymind Intelligence Layer) to leverage Spark and the JVM ecosystem while allowing for heterogeneous compute. Key aspects include efficient access to image and audio data, an ND4j library for numerical compute, deployment via Juju, and an ETL pipeline interface in Canova. The goal is to build on the JVM's strengths while addressing its limitations for numerical compute through hardware acceleration.
This document introduces Apache Spark, an open-source cluster computing system that provides fast, general execution engines for large-scale data processing. It summarizes key Spark concepts including resilient distributed datasets (RDDs) that let users spread data across a cluster, transformations that operate on RDDs, and actions that return values to the driver program. Examples demonstrate how to load data from files, filter and transform it using RDDs, and run Spark programs on a local or cluster environment.
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
Testing Delphix: easy data virtualizationFranck Pachot
The document summarizes the author's testing of the Delphix data virtualization software. Some key points:
- Delphix allows users to easily provision virtual copies of database sources on demand for tasks like testing, development, and disaster recovery.
- It works by maintaining incremental snapshots of source databases and virtualizing the data access. Copies can be provisioned in minutes and rewound to past points in time.
- The author demonstrated provisioning a copy of an Oracle database using Delphix and found the process very simple. Delphix integrates deeply with databases.
- Use cases include giving databases to each tester/developer, enabling continuous integration testing, creating QA environments with real
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
Après la petite intro sur le stockage distribué et la description de Ceph, Jian Zhang réalise dans cette présentation quelques benchmarks intéressants : tests séquentiels, tests random et surtout comparaison des résultats avant et après optimisations. Les paramètres de configuration touchés et optimisations (Large page numbers, Omap data sur un disque séparé, ...) apportent au minimum 2x de perf en plus.
This document summarizes a presentation about Ceph, an open-source distributed storage system. It discusses Ceph's introduction and components, benchmarks Ceph's block and object storage performance on Intel architecture, and describes optimizations like cache tiering and erasure coding. It also outlines Intel's product portfolio in supporting Ceph through optimized CPUs, flash storage, networking, server boards, software libraries, and contributions to the open source Ceph community.
This document provides an overview of several advanced Hadoop topics, including:
- YARN, the resource manager that allocates resources and manages job scheduling in Hadoop. It uses a global ResourceManager and per-application ApplicationMasters.
- Testing HDFS I/O throughput with TestDFSIO, a tool that measures read and write performance through MapReduce jobs. It reports metrics like throughput and IO rates.
- The mrjob Python library, which provides a framework for writing multi-step MapReduce jobs in Python that can be run locally or on a Hadoop cluster. Sample code demonstrates defining a job class with mapper, reducer, and step methods.
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
The Apache Beam programming model is designed to support several advanced data processing features such as autoscaling and dynamic work rebalancing. In this talk, we will first explain how dynamic work rebalancing not only provides a general and robust solution to the problem of stragglers in traditional data processing pipelines, but also how it allows autoscaling to be truly effective. We will then present how dynamic work rebalancing works as implemented in the Google Cloud Dataflow runner and which path other Apache Beam runners link Apache Flink can follow to benefit from it.
Experience In Building Scalable Web Sites Through Infrastructure's ViewPhuwadon D
The document discusses strategies for building scalable web sites, including using caching technologies like Memcached, database replication and sharding, and load balancing. It provides recommendations for hardware, software architectures, and technologies to use at different stages of a site's growth to scale efficiently. Tips are also given for optimizing performance through caching, splitting content delivery, and other best practices.
The document discusses new features in version 0.9.4 of the DivConq file transfer software, including file tasks that can be triggered by uploads, scheduling, or file system events. It introduces dcScript, the scripting language that allows users to string together various file operations and tasks. Key points include that dcScript scripts can run asynchronously, optimize file operations through in-memory streaming rather than disk reads/writes, and offer features to simplify complex multi-step file tasks. The document provides examples of using dcScript to encrypt, compress, split and transfer files with just a few lines of code.
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
At Databricks, we have a unique view into over a hundred different companies trying out Spark for development and production use-cases, from their support tickets and forum posts. Having seen so many different workflows and applications, some discernible patterns emerge when looking at common performance and scalability issues that our users run into. This talk will discuss some of these common common issues from an engineering and operations perspective, describing solutions and clarifying misconceptions.
Direct Code Execution - LinuxCon Japan 2014Hajime Tazaki
Direct Code Execution (DCE) is a userspace kernel network stack that allows running real network stack code in a single process. DCE provides a testing platform that enables reproducible testing, fine-grained parameter tuning, and a development framework for network protocols. It achieves this through a virtualization core layer that runs multiple network nodes within a single process, a kernel layer that replaces the kernel with a shared library, and a POSIX layer that redirects system calls to the kernel library. This allows full control and observability for testing and debugging the network stack.
Let's Compare: A Benchmark review of InfluxDB and ElasticsearchInfluxData
In this webinar, Ivan K will compare the performance and features of InfluxDB and Elasticsearch for common time-series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. Come hear about how Ivan conducted his tests to determine which time-series db would best fit your needs. We will reserve 15 minutes at the end of the talk for you to ask Ivan directly about his test processes and independent viewpoint.
This document discusses experiments conducted to determine the optimal hardware and software configurations for building a cost-efficient Swift object storage cluster with expected performance. It describes testing different configurations for proxy and storage nodes under small and large object upload workloads. The results show that for small object uploads, high-CPU instances performed best for storage nodes while either high-CPU or high-end instances worked well for proxies. For large object uploads, large instances were most cost-effective for storage nodes and high-end instances remained suitable for proxies. The findings provide guidance on right-sizing hardware based on workload characteristics.
Java File I/O Performance Analysis - Part I - JCConf 2018Michael Fong
The document summarizes various file I/O techniques in Java including stream I/O, decorator pattern, NIO, buffers, memory mapping, sendfile, benchmarks, and more. It provides code examples and discusses performance aspects. Benchmark results are shown for sequential and random file read/write using different Java file I/O APIs and configurations.
The DrupalCampLA 2011 presentation on backend performance. The slides go over optimizations that can be done through the LAMP (or now VAN LAMMP stack for even more performance) to get everything up and running.
Nagios Conference 2012 - Dan Wittenberg - Case Study: Scaling Nagios Core at ...Nagios
Dan Wittenberg's presentation on using Nagios at a Fortune 50 Company
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Node Interactive Debugging Node.js In ProductionYunong Xiao
Learn about the tools and methodologies we use in production at Netflix to diagnose and fix performance issues, bugs and memory leaks -- all without having to restart or change our Node application. Find out about profiling and post mortem tools such as perf events and mdb, visualizations like flame graphs and latency distributions, and how they help us keep our Node stack efficient.
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
##Что такое Storage Replica
##Архитектура и сценарии
##Синхронная и асинхронная репликация
##Междисковая, межсерверная, внутрикластерная и межкластерная репликация
##Дизайн и проектирование Storage Replica
##Нововведения в Windows Server 2016 TP5
##Графический интерфейс управления, и другие возможности - демонстрация и планы развития
##Интеграция Storage Replica с Storage Spaces Direct
Resource-Efficient Deep Learning Model Selection on Apache SparkDatabricks
Deep neural networks (deep nets) are revolutionizing many machine learning (ML) applications. But there is a major bottleneck to broader adoption: the pain of model selection.
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebula Project
In this session we'll explore measuring VM performance and evaluating changes to settings or infrastructure which can affect performance positively. We'll also share the best current practice for architecture for high performance clouds from our experience.
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
Short
The growing amount of data captured by sensors and the real time constraints imply that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in Arm-based platforms provide an unprecedented opportunity for new intelligent devices. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, accelerator solutions, and will describe the efforts underway in the Arm ecosystem.
Abstract
The dramatically growing amount of data captured by sensors and the ever more stringent requirements for latency and real time constraints are paving the way for edge computing, and this implies that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in recent Arm-based platforms provides an unprecedented opportunity for new intelligent devices with ML inference. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, model description formats, accelerator solutions, low cost development boards and will describe the efforts underway to identify the best technologies to improve the consolidation and enable the competitive innovative advantage from all vendors.
Audience
The session will be useful for executives to engineers. Executives will gain a deeper understanding of the issues and opportunities. Engineers at NN acceleration IP design houses will take away ideas for how to collaborate in the open source community on their area of expertise, how to evaluate the performance and accelerate multiple NN frameworks without modifying them for each new IP, whether it be targeting edge computing gateways, smart devices or simple microcontrollers.
Benefits to the Ecosystem
The AI deep learning neural network ecosystem is starting just now and it has similar implications with open source as GPU and video accelerators had in the early days with user space drivers, binary blobs, proprietary APIs and all possible ways to protect their IPs. The session will outline a proposal for a collaborative ecosystem effort to create a common framework to manage multiple NN accelerators while at the same time avoiding to modify deep learning frameworks with multiple forks.
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
The document summarizes an Arm Architecture HPC Workshop held by Linaro. It discusses Linaro's work in open source software development for Arm architecture, including efforts in HPC, tools, libraries, and machine learning. It also mentions Linaro's Developer Cloud which provides access to Arm hardware for developers.
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
Huawei outlines requirements for developing a competitive ARM-based HPC solution. They plan a two-phase strategy using existing Hi1616 platforms followed by more powerful Hi1620 platforms. Requirements include high-performance CPUs, optimized software stack, support for applications and ISVs, and cloud deployment. Huawei aims to demonstrate ARM's value in HPC by 2018-2020 through partnerships and turnkey solutions.
Bud17 113: distribution ci using qemu and open qaLinaro
“Delivering a well working distribution is hard. There are a lot of different hardware platforms that need to be verified and the software stack is in a big flux during development phases. In rolling releases, this gets even worse, as nothing ever stands still. The only sane answer to that problem are working Continuous Integration tests. The SUSE way to check whether any change breaks normal distribution behavior is OpenQA. Using OpenQA we can automatically run tests that hard working QA people did manually in the old days. That way we have fast enough turnaround times to find and reject breaking changes This session shows how OpenQA works, what pitfalls we had to make ARM work with OpenQA and what we’re doing to improve it for ARM specific use cases.”
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018Linaro
Speaker: Renato Golin
Speaker Bio:
He started programming in the late 80's in C for PCs after a few years playing with 8-bit computers, but he only started programming professionally in the late 90's during the .com bubble. After many years working on Internet's back-end, he moved to UK and worked a few years on bioinformatics at EBI before joining ARM, where he worked on the DS-5 debugger and on the EDG-to-LLVM bridge, where he became the LLVM Tech Lead. Recently, he worked with large clusters and big data at HPCC before moving to Linaro.
Talk Title: OpenHPC Automation with Ansible
Talk Abstract: "In order to test OpenHPC packages and components and to use it as a
platform to benchmark HPC applications, Linaro is developing an automated deployment strategy, using Ansible, Mr-Provisioner and Jenkins, to install the
OS, OpenHPC and prepare the environment on varied architectures (Arm, x86). This work is meant to replace the existing ageing Bash-based recipes upstream while still keeping the documents intact. Our aim is to make it easier to vary hardware configuration, allow for different provisioning techniques and mix internal infrastructure logic to different labs, while still using the same recipes. We hope this will help more people use OpenHPC with a better out-of-the-box experience and with more robust results"
HPC network stack on ARM - Linaro HPC Workshop 2018Linaro
Speaker: Pavel Shamis
Company: Arm
Speaker Bio:
"Pavel is a Principal Research Engineer at ARM with over 16 years of experience in development HPC solutions. His work is focused on co-design software and hardware building blocks for high-performance interconnect technologies, development communication middleware and novel programming models. Prior to joining ARM, he spent five years at Oak Ridge National Laboratory (ORNL) as a research scientist at Computer Science and Math Division (CSMD). In this role, Pavel was responsible for research and development multiple projects in high-performance communication domain including: Collective Communication Offload (CORE-Direct & Cheetah), OpenSHMEM, and OpenUCX. Before joining ORNL, Pavel spent ten years at Mellanox Technologies, where he led Mellanox HPC team and was one of the key driver in enablement Mellanox HPC software stack, including OFA software stack, OpenMPI, MVAPICH, OpenSHMEM, and other.
Pavel is a recipient of prestigious R&D100 award for his contribution in development of the CORE-Direct collective offload technology and he published in excess of 20 research papers.
"
Talk Title: HPC network stack on ARM
Talk Abstract:
Applications, programming languages, and libraries that leverage sophisticated network hardware capabilities have a natural advantage when used in today¹s and tomorrow's high-performance and data center computer environments. Modern RDMA based network interconnects provides incredibly rich functionality (RDMA, Atomics, OS-bypass, etc.) that enable low-latency and high-bandwidth communication services. The functionality is supported by a variety of interconnect technologies such as InfiniBand, RoCE, iWARP, Intel OPA, Cray¹s Aries/Gemini, and others. Over the last decade, the HPC community has developed variety user/kernel level protocols and libraries that enable a variety of high-performance applications over RDMA interconnects including MPI, SHMEM, UPC, etc. With the emerging availability HPC solutions based on ARM CPU architecture it is important to understand how ARM integrates with the RDMA hardware and HPC network software stack. In this talk, we will overview ARM architecture and system software stack, including MPI runtimes, OpenSHMEM, and OpenUCX.
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...Linaro
Speaker: Jay Kruemcke
Speaker Company: SUSE
Bio:
"Jay is responsible for the SUSE Linux server products for High Performance Computing, 64-bit ARM systems, and SUSE Linux for IBM Power servers.
Jay has built an extensive career in product management including using social media for client collaboration, product positioning, driving future product directions, and evangelizing the capabilities and future directions for dozens of enterprise products.
"
Talk Title: It just keeps getting better - SUSE enablement for Arm
Talk Abstract:
SUSE has been delivering commercial Linux support for Arm based servers since 2016. Initially the focus was on high end servers for HPC and Ceph based software defined storage. But we have enabled a number of other Arm SoCs and are even supporting the Raspberry Pi. This session will cover the SUSE products that are available for the Arm platform and view to the future.
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
Speakers: Gilad Shainer and Scot Schultz
Company: Mellanox Technologies
Talk Title: Intelligent Interconnect Architecture to Enable Next
Generation HPC
Talk Abstract:
The latest revolution in HPC interconnect architecture is the development of In-Network Computing, a technology that enables handling and accelerating application workloads at the network level. By placing data-related algorithms on an intelligent network, we can overcome the new performance bottlenecks and improve the data center and applications performance. The combination of In-Network Computing and ARM based processors offer a rich set of capabilities and opportunities to build the next generation of HPC platforms.
Gilad Shainer Bio:
Gilad Shainer has served as Mellanox's vice president of marketing since March 2013. Previously, Mr. Shainer was Mellanox's vice president of marketing development from March 2012 to March 2013. Mr. Shainer joined Mellanox in 2001 as a design engineer and later served in senior marketing management roles between July 2005 and February 2012. Mr. Shainer holds several patents in the field of high-speed networking and contributed to the PCI-SIG PCI-X and PCIe specifications. Gilad Shainer holds a MSc degree (2001, Cum Laude) and a BSc degree (1998, Cum Laude) in Electrical Engineering from the Technion Institute of Technology in Israel.
Scot Schultz Bio:
Scot Schultz is a HPC technology specialist with broad knowledge in operating systems, high speed interconnects and processor technologies. Joining the Mellanox team in 2013, Schultz is 30-year veteran of the computing industry. Prior to joining Mellanox, he spent the past 17 years at AMD in various engineering and leadership roles in the area of high performance computing. Scot has also been instrumental with the growth and development of various industry organizations including the Open Fabrics Alliance, and continues to serve as a founding board-member of the OpenPOWER Foundation and Director of Educational Outreach and founding member of the HPC-AI Advisory Council.
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Linaro
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Santa Clara 2018
Bio: "Yutaka Ishikawa is the project leader of developing the post K
supercomputer. From 1987 to 2001, he was a member of AIST (former
Electrotechnical Laboratory), METI. From 1993 to 2001, he was the
chief of Parallel and Distributed System Software Laboratory at Real
World Computing Partnership. He led development of cluster system
software called SCore, which was used in several large PC cluster
systems around 2004. From 2002 to 2014, he was a professor at the
University Tokyo. He led a project to design a commodity-based
supercomputer called T2K open supercomputer. As a result, three
universities, Tsukuba, Tokyo, and Kyoto, obtained each supercomputer
based on the specification in 2008. He was also involved with the
design of the Oakleaf-PACS, the successor of T2K supercomputer in both
Tsukuba and Tokyo, whose peak performance is 25PF."
Session Title: Post-K and Arm HPC Ecosystem
Session Description:
"Post-K, a flagship supercomputer in Japan, is being developed by Riken
and Fujitsu. It will be the first supercomputer with Armv8-A+SVE.
This talk will give an overview of Post-K and how RIKEN and Fujitsu
are currently working on software stack for an Arm architecture."
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Linaro
Event: Arm Architecture HPC Workshop by Linaro and HiSilicon
Location: Santa Clara, CA
Speaker: Andrew J Younge
Talk Title: Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Supercomputing
Talk Desc: The Vanguard program looks to expand the potential technology choices for leadership-class High Performance Computing (HPC) platforms, not only for the National Nuclear Security Administration (NNSA) but for the Department of Energy (DOE) and wider HPC community. Specifically, there is a need to expand the supercomputing ecosystem by investing and developing emerging, yet-to-be-proven technologies and address both hardware and software challenges together, as well as to prove-out the viability of such novel platforms for production HPC workloads.
The first deployment of the Vanguard program will be Astra, a prototype Petascale Arm supercomputer to be sited at Sandia National Laboratories during 2018. This talk will focus on the arthictecural details of Astra and the significant investments being made towards the maturing the Arm software ecosystem. Furthermore, we will share initial performance results based on our pre-general availability testbed system and outline several planned research activities for the machine.
Bio: Andrew Younge is a R&D Computer Scientist at Sandia National Laboratories with the Scalable System Software group. His research interests include Cloud Computing, Virtualization, Distributed Systems, and energy efficient computing. Andrew has a Ph.D in Computer Science from Indiana University, where he was the Persistent Systems fellow and a member of the FutureGrid project, an NSF-funded experimental cyberinfrastructure test-bed. Over the years, Andrew has held visiting positions at the MITRE Corporation, the University of Southern California / Information Sciences Institute, and the University of Maryland, College Park. He received his Bachelors and Masters of Science from the Computer Science Department at Rochester Institute of Technology (RIT) in 2008 and 2010, respectively.
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
Session ID: HKG18-501
Session Name: HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Speaker: Chris Redpath
Track: Mobile, Kernel
★ Session Summary ★
This session will introduce the changes to EAS planned for 4.14 kernel, and how Arm hopes that EAS will develop in future. EAS has already evolved from an Arm/Linaro joint project to involving a much wider community of SoC vendors, Google and interested device manufacturers. We will highlight the product-specific pieces remaining in the Android Common Kernel EAS implementation, and our plans to provide an upstreaming plan for each product feature. In particular, the new 'simplified energy model' is designed to provide mainline-friendliness and comparable performance using a simple DT expression of cpu power/performance.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-501/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-501.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-501.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Mobile, Kernel
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
"Session ID: HKG18-501
Session Name: HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Speaker: Chris Redpath
Track: Mobile, Kernel
★ Session Summary ★
This session will introduce the changes to EAS planned for 4.14 kernel, and how Arm hopes that EAS will develop in future. EAS has already evolved from an Arm/Linaro joint project to involving a much wider community of SoC vendors, Google and interested device manufacturers. We will highlight the product-specific pieces remaining in the Android Common Kernel EAS implementation, and our plans to provide an upstreaming plan for each product feature. In particular, the new 'simplified energy model' is designed to provide mainline-friendliness and comparable performance using a simple DT expression of cpu power/performance.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-501/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-501.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-501.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Mobile, Kernel
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
"Session ID: HKG18-315
Session Name: HKG18-315 - Why the ecosystem is a wonderful thing warts and all
Speaker: Andrew Wafaa
Track: Ecosystem Day
★ Session Summary ★
The Arm ecosystem is a vibrant place, but it's not always smooth sailing. This presentation will go through the highs and lows of getting the ecosystem fully Arm enabled.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-315/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-315.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-315.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Ecosystem Day
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
"Session ID: HKG18-115
Session Name: HKG18-115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Speaker: Jan Kiszka
Track: Security
★ Session Summary ★
The open source hypervisor Jailhouse provides hard partitioning of multicore systems to co-locate multiple Linux or RTOS instances side by side. It aims at low complexity and minimal footprint to achieve deterministic behavior and enable certifications according to safety or security standards. In this session, we would like to look at the ARM-specific status of Jailhouse and discuss applications, to-dos and possible collaborations around it with the ARM community. The session is intended to be half presentation, half Q&A / discussion.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-115/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-115.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-115.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Security
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
"Session ID: HKG18-TR08
Session Name: HKG18-TR08 - Upstreaming SVE in QEMU
Speaker: Alex Bennée,Richard Henderson
Track: Enterprise
★ Session Summary ★
ARM's Scalable Vector Extensions is an innovative solution to processing highly data parallel workloads. While several out-of-tree attempts at implementing SVE support for QEMU existed, we took a fundamentally different approach to solving key challenges and therefore pursued a from-scratch QEMU SVE implementation in Linaro. Our strategic choice was driven by several factors. First as an ""upstream first"" organisation we were focused on a solution that would be readily accepted by the upstream project. This entailed doing our development in the open on the project mailing lists where early feedback and community consensus can be reached.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-tr08/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-tr08.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-tr08.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Enterprise
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
HKG18-113- Secure Data Path work with i.MX8MLinaro
"Session ID: HKG18-113
Session Name: HKG18-113 - Secure Data Path work with i.MX8M
Speaker: Cyrille Fleury
Track: Digital Home
★ Session Summary ★
NXP presentation on Secure Data Path work with i.MX8M Soc. Demonstrate 4K PlayReady playback with Android 8.1 running on i.MX8M. Focus on security (MS SL3000 and Widevine level 1)
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-113/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-113.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-113.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Digital Home
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
"Session ID: HKG18-120
Session Name: HKG18-120 - Structured Documentation and Validation for Device Tree
Speaker: Grant Likely
Track: Kernel
★ Session Summary ★
Devicetree has become the dominant hardware configuration language used when building embedded systems. Projects using Devicetree now include Linux, U-Boot, Android, FreeBSD, and Zephyr. However, it is notoriously difficult to write correct Devicetree data files. The dtc tools perform limited tests for valid data, and there there is not yet a way to add validity test for specific hardware descriptions. Neither is there a good way to document requirements for specific bindings. Work is underway to solve these problems. This session will present a proposal for adding Devicetree schema files to the Devicetree toolchain that can be used to both validate data and produce usable documentation.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-120/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-120.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-120.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Kernel
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
"Session ID: HKG18-223
Session Name: HKG18-223 - Trusted Firmware M : Trusted Boot
Speaker: Tamas Ban
Track: LITE
★ Session Summary ★
An overview of the trusted boot concept and firmware update on the ARMv8-M based platform and how MCUBoot acts as a BL2 bootloader for TF-M.
Trusted Firmware M
In October 2017, Arm announced the vision of Platform Security Architecture (PSA) - a common framework to allow everyone in the IoT ecosystem to move forward with stronger, scalable security and greater confidence. There are three key stages to the Platform Security Architecture: Analysis, Architecture and Implementation which are described at https://developer.arm.com/products/architecture/platform-security-architecture.
_Trusted Firmware M, i.e. TF-M, is the Arm project to provide an open source reference implementation firmware that will conform to the PSA specification for M-Class devices. Early access to TF-M was released in December 2017 and it is being made public during Linaro Connect. The implementation should be considered a prototype until the PSA specifications reach release state and the code aligns._
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-223/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-223.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-223.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: LITE
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961"
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
1. ASIA 2013 (LCA13)
LEG – Hadoop DFS Performance
Steve Capper <steve.capper@linaro.org>
2. ASIA 2013 (LCA13)
www.linaro.org
Hadoop Performance on ARM
I concentrated on reducing the time taken for CPU bound
tasks (the latency).
My work, so far, has focussed on the underlying cluster
filesystem, HDFS, as this underpins a lot of Hadoop
workloads.
The latest release-tagged Hadoop at the time of
experimentation was 2.0.2-alpha. So this was the version
used. (2.0.3-alpha has just come out).
Hadoop installations consist of rather a lot of moving parts.
So please let me know if I should be concentrating on
something else! :-)
3. ASIA 2013 (LCA13)
www.linaro.org
Hadoop Distributed Filesystem – Overview
HDFS is the default Hadoop distributed filesystem.
Consists of multiple “nodes”:
Namenode – holds the metadata for the filesystem and keeps track of
the datanodes.
Only one namenode is active at a time per namespace.
Thus there is a high memory requirement for a namenode.
Datanodes – store the file's blocks.
The default block size is 64MB.
(Optional) Passive namenodes – To maintain an High Availability
configuration, one can set up shared storage (via NFS or by
journalnodes) and have namenodes on standby to failover.
Filesystem blocks are replicated between datanodes.
Datanodes can be “rack aware”; data will be distributed between
racks.
Nodes can run on the same machine or on different
machines.
4. ASIA 2013 (LCA13)
www.linaro.org
Data Integrity in HDFS
Hadoop mitigates against hardware failure in the following
ways:
Metadata can be saved to multiple filesystems and regular snapshots
are usually taken to allow for disaster recovery.
On some HA configurations, multiple journalnodes maintain a quorum of
the metadata.
Data blocks are replicated across multiple datanodes (preferably to
different racks).
Data blocks are regularly transmitted:
All data streams are checksummed.
The default checksum algorithm is CRC32c.
The default number of bytes per checksum is 512 bytes.
5. ASIA 2013 (LCA13)
www.linaro.org
Test Configuration
So far I have been micro-benchmarking single machine
workloads (1 name node, 1 data node, 1 workload).
I have been working on an A9 platform.
TestDFSIO -read and -write have been tested with a
10GB filesize.
Soft-float Oracle JDK 1.7.0_10 was used.
Performance was measured using Linux perf -a, Java
samples were taken with the inbuilt profiler:
-Xrunhprof:cpu=samples,depth=10
8. ASIA 2013 (LCA13)
www.linaro.org
Some analysis of plain TestDFSIO
The namenode doesn't do much when it's in charge of 1
datanode. (this will change when things scale up).
When writing data:
Most of the Java time is spent in PureJavaCrc32C.update
Most of the CPU time is spent in Java land.
When reading data:
Most of the Java time is spent in
NativeCrc32.nativeVerifyChunkedSums.
Most of the CPU time is in the native function: crc32c_sb8
For both reading and writing data:
~10% of the CPU time is spent copying memory.
9. ASIA 2013 (LCA13)
www.linaro.org
Optimising the read and write paths
Trevor Robinson has proposed a patch to improve the write
path:
HDFS-3529 “Use direct buffers for data in write path”
I have been working on speeding up the computation of
CRC32c checksums, by using NEON.
Also, I've performed some very preliminary experiments on
replacing PureJavaCrc32C with a JNI class that references
the NEON CRC code.
10. ASIA 2013 (LCA13)
www.linaro.org
NEON optimisation of the CRC32c algorithm
I worked on an algorithm Q4 last year:
https://wiki.linaro.org/LEG/Engineering/CRC
Given an input buffer, we perform polynomial multiplication and
addition to give a slighter smaller buffer with the same CRC. This is
referred to as “folding”.
The algorithm reduced the buffer by 32 bytes at a time.
The final buffer of 32 bytes was then processed by slice-by-8 as
normal.
I found that the vast majority of CRCs in Hadoop (in both
NativeCrc32.nativeVerifyChunkedSums and
PureJavaCrc32C.update) were computed for 512 byte buffers.
11. ASIA 2013 (LCA13)
www.linaro.org
A single 64 bit fold
32
64
64
M(x)32
A(x)B(x)
64 =A(x) * (x^64 mod P(x))
=B(x) * (x^96 mod P(x))
M'(x)
+
+
64
CRC(M'(x)) = CRC(M(x))
=
12. ASIA 2013 (LCA13)
www.linaro.org
NEON implementations
With ARMv7 NEON we can only perform polynomial
multiplication on 8 bit lanes:
We need to be able to multiply at least 32 bits, thus multiple vmull.p8's
were chained together to achieve this).
There are 16 vmull.p8's per fold, so a little register pressure!
A gcc intrinsic version has been coded up:
This is considerably simpler to look at.
Unfortunately, it runs a little slower than hand-optimised assembler.
A test case has been sent to the Linaro Toolchain Working group (a
couple of weeks ago) and this is being analysed by them.
13. ASIA 2013 (LCA13)
www.linaro.org
Replacing PureJavaCrc32C
PureJavaCrc32C has two noteworthy methods:
public void update(byte[] b, int off, int len) – called mostly for lengths
of 512.
public void update(int b) – seldom called.
I created a new class implementing the Checksum interface
that:
Used the same implementation of: update(int b) as PureJavaCrc32C.
Called straight into JNI for update(byte[] b, int off, int len).
The class name NativeCrc32 was already taken by
something else, so I chose a rather silly temporary name:
HyperCrc32C
14. ASIA 2013 (LCA13)
www.linaro.org
Dealing with byte [] in JNI
We are only reading from the byte[] array, and only for a very short
time.
Thus I pinned the buffer in memory with:
GetPrimitiveArrayCritical
Then subsequently released the buffer with
ReleasePrimitiveArrayCritical(..., JNI_ABORT)
This worked for me, but perhaps a better long term solution would
be to change the backing data type to a ByteBuffer?
We could also be clever and change the alignment of these?
Rather than test every optimisation individually in this talk; I am
going to put them all together.
15. ASIA 2013 (LCA13)
www.linaro.org
TestDFSIO read & write CPU usage
Plain TestDFSIO Write New TestDFSIO Write
Other
Java
memcpy
crc
Plain TestDFSIO Read New TestDFSIO Read
16. ASIA 2013 (LCA13)
www.linaro.org
Some Analysis of the New TestDFSIO runs
The name node samples are unchanged.
For the write path:
Most of the time is now spent running native code rather than Java
code.
There is a noticeable reduction in Hadoop user CPU usage.
For the read path:
Most of the time is still spent running native code.
There is again a reduction in Hadoop user CPU usage.
For both the read and write paths there is a significant
amount of CPU time spent copying memory around.
17. ASIA 2013 (LCA13)
www.linaro.org
Conclusions and Further Work
The CPU usage required for TestDFSIO runs has been
reduced for both read/write paths.
This gives us more CPU cycles to run Hadoop jobs with!
Hadoop is known to be very sensitive to underlying disk IO.
To optimise HDFS IO, it would make sense to optimise disk/filesystem
IO as much as possible and re-run these benchmarks.
As Hadoop runs under Java:
It makes sense to keep track of JVMs.
A beta hard-float JVM has been released.
The CPU usage for memcpy is making me uneasy!
26. More about Linaro Connect: www.linaro.org/connect/
More about Linaro: www.linaro.org/about/
More about Linaro engineering: www.linaro.org/engineering/
ASIA 2013 (LCA13)