Zhang, Zhuo, et al. "Fuxi: a fault-tolerant resource management and job scheduling system at internet scale." Proceedings of the VLDB Endowment 7.13 (2014): 1393-1404.
This was presented at a seminar reading session for a class. It is outside my area of expertise, so there may be mistakes, but I hope it is useful to someone.
Hadoop MapReduce Performance Study on ARM Cluster - airbots
This presentation presents a performance study of Hadoop MapReduce on an ARM cluster, comparing MapReduce application performance and energy consumption between the ARM cluster and a conventional x86_64 cluster.
The document discusses using Hadoop for scientific workloads and summarizes early results from benchmarking Hadoop. It explores using Hadoop and MapReduce for data-intensive scientific applications like BLAST sequence analysis. Performance results show that Hadoop can provide comparable performance to existing parallel file systems. Challenges include lack of turn-key solutions, managing data formats, and performance tuning. The research aims to understand the unique needs of science clouds and how to effectively support data-intensive scientific applications on cloud platforms.
Overview of myHadoop 0.30, a framework for deploying Hadoop on existing high-performance computing infrastructure. Discussion of how to install it, spin up a Hadoop cluster, and use the new features.
myHadoop 0.30's project page is now on GitHub (https://github.com/glennklockwood/myhadoop) and the latest release tarball can be downloaded from my website (glennklockwood.com/files/myhadoop-0.30.tar.gz)
[212] Big Models without Big Data: Using Domain-Specific Deep Networks in Data-... - NAVER D2
The document discusses techniques for using deep learning with limited data. It presents methods for data synthesis, domain adaptation, and data cleaning. For data synthesis, it describes using a game engine to procedurally generate synthetic videos with automatic annotations for action recognition training. For domain adaptation, it applies a model trained on mouse tracking saliency data to eye tracking data. For data cleaning, it introduces a technique to prune noisy images from a landmark dataset to obtain reliable training annotations. The techniques aim to leverage limited data to train deep networks for tasks like saliency mapping, image retrieval, and action recognition.
There have been plenty of “explaining EXPLAIN” type talks over the years, which provide a great introduction to it. They often also cover how to identify a few of the more common issues through it. EXPLAIN is a deep topic though, and to do a good introduction talk, you have to skip over a lot of the tricky bits. As such, this talk will not be a good introduction to EXPLAIN, but instead a deeper dive into some of the things most don’t cover. The idea is to start with some of the more complex and unintuitive calculations needed to work out the relationships between operations, rows, threads, loops, timings, buffers, CTEs and subplans. Most popular tools handle at least several of these well, but there are cases where they don’t that are worth being conscious of and alert to. For example, we’ll have a look at whether certain numbers are averaged per-loop or per-thread, or both. We’ll also cover a resulting rounding issue or two to be on the lookout for. Finally, some per-operation timing quirks are worth looking out for where CTEs and subqueries are concerned, for example CTEs that are referenced more than once. As time allows, we can also look at a few rarer issues that can be spotted via EXPLAIN, as well as a few more gotchas that we’ve picked up along the way. This includes things like spotting when the query is JIT, planning, or trigger time dominated, spotting the signs of table and index bloat, issues like lossy bitmap scans or index-only scans fetching from the heap, as well as some things to be aware of when using auto_explain.
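As a taste of the per-loop arithmetic the talk digs into, here is a hedged sketch: EXPLAIN ANALYZE reports "rows" and "actual time" as per-loop averages, so totals must be multiplied back out, and the per-loop rounding can hide small row counts entirely. The numbers are illustrative, not taken from a real plan.

```python
# EXPLAIN ANALYZE reports "rows" as a per-loop average, so recovering the
# total rows a node produced means multiplying the reported value by loops.

def total_rows(rows_per_loop, loops):
    # An index scan under a nested loop reporting rows=1 over loops=1000
    # actually produced about 1000 rows in total.
    return rows_per_loop * loops

assert total_rows(1, 1000) == 1000

# The rounding gotcha: reported rows are rounded *after* the per-loop
# division, so small per-loop counts can vanish from the output entirely.
actual_total = 400
loops = 1000
reported_rows = round(actual_total / loops)   # shown as rows=0 in the plan
```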
This document describes Onyx, a new flexible and extensible data processing system. Onyx aims to address limitations in existing frameworks when dealing with new resource environments like disaggregated computing and transient resources. The Onyx architecture includes a compiler that transforms dataflow programs into optimized execution plans using various passes. The runtime then executes the plans across cluster resources. Onyx allows dynamic optimization by collecting metrics during execution and generating new plans. It can harness transient resources by placing tasks strategically.
With a long history of open innovation with Hadoop, Yahoo continues to invest in and expand the platform capabilities by pushing the boundaries of what the platform can accomplish for the entire organization. In this talk, Sumeet Singh will present some of the recent innovations, open source contributions, and where things are headed when it comes to Hadoop at Yahoo.
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi... - Rafael Ferreira da Silva
Presentation held at ICCS 2015 Conference - Reykjavik, Iceland
High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on estimates of job characteristics (e.g., job runtime) to characterize the workload behavior, which in practice are hard to obtain. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
More information: www.rafaelsilva.com
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni... - MLconf
Fast, Cheap and Deep – Scaling Machine Learning: Distributed high throughput machine learning is both a challenge and a key enabling technology. Using a Parameter Server template we are able to distribute algorithms efficiently over multiple GPUs and in the cloud. This allows us to design very fast recommender systems, factorization machines, classifiers, and deep networks. This degree of scalability allows us to tackle computationally expensive problems efficiently, yielding excellent results e.g. in visual question answering.
This document describes HFSP, a fair scheduling protocol for Hadoop that aims to improve performance for interactive jobs. It does so by estimating job sizes and simulating a processor-sharing model to determine job completion order. Key aspects include initial job size estimation that is refined over time, treating map and reduce phases separately, and using OS signals to suspend and resume reduce tasks for preemption instead of waiting or killing tasks. Experiments on Facebook workload traces showed HFSP significantly reduced average job completion times compared to Hadoop's default scheduler, especially for smaller clusters.
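The suspend/resume preemption the summary mentions can be illustrated with standard OS signals. This is a hedged, Linux-only sketch (it reads `/proc`), not HFSP's actual code: a stand-in "reduce task" is paused with SIGSTOP, so it keeps its state without consuming CPU, and resumed with SIGCONT.

```python
import signal
import subprocess
import sys
import time
from pathlib import Path

def state(pid):
    # Third field of /proc/<pid>/stat is the process state: T = stopped.
    return Path(f"/proc/{pid}/stat").read_text().split()[2]

# Stand-in for a running reduce task.
task = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
time.sleep(0.3)                      # let the task start

task.send_signal(signal.SIGSTOP)     # preempt: state is kept, no CPU used
time.sleep(0.2)
suspended = state(task.pid) == "T"

task.send_signal(signal.SIGCONT)     # resume exactly where it left off
time.sleep(0.2)
resumed = state(task.pid) in ("S", "R")

task.terminate()
task.wait()
```

Compared with killing and rescheduling, this keeps the task's partial work, which is why the paper preferred it over Hadoop's wait-or-kill options.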
The document describes benchmark results achieved by using NVMe SSDs and GPU acceleration to push PostgreSQL performance beyond its typical limitations. A benchmark of 13 queries on a 1055GB dataset with PostgreSQL v11beta3 + PG-Strom v2.1 achieved a maximum query execution throughput of 13.5GB/s. PG-Strom is an extension module that uses thousands of GPU cores and wide-band GPU memory to accelerate SQL workloads: it generates GPU code from SQL and loads table data directly from NVMe SSDs into the GPU via peer-to-peer DMA, bypassing the CPU and host memory to improve performance.
Enterprise Scale Topological Data Analysis Using Spark - Alpine Data
This document discusses scaling topological data analysis (TDA) using the Mapper algorithm to analyze large datasets. It describes how the authors built the first open-source scalable implementation of Mapper called Betti Mapper using Spark. Betti Mapper uses locality-sensitive hashing to bin data points and compute topological summaries on prototype points to achieve an 8-11x performance improvement over a naive Spark implementation. The key aspects of Betti Mapper that enable scaling to enterprise datasets are locality-sensitive hashing for sampling and using prototype points to reduce the distance matrix computation.
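The locality-sensitive hashing step described above can be sketched with random-hyperplane LSH. This is a hedged illustration of the general technique, not Betti Mapper's actual implementation: each hyperplane contributes one sign bit, nearby points tend to share the resulting key, and each key identifies one bin whose prototype can stand in for its members.

```python
import random

random.seed(0)

def make_hyperplanes(dim, n_planes):
    # Random Gaussian hyperplanes through the origin.
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

def lsh_key(point, planes):
    # One sign bit per hyperplane; nearby points tend to share the same key,
    # so each key identifies a "bin" of mutually close points.
    return tuple(int(sum(p * w for p, w in zip(point, h)) >= 0) for h in planes)

planes = make_hyperplanes(dim=3, n_planes=4)

# Bin a few points; a per-bin prototype would then stand in for all
# members when computing the distance matrix, as the summary describes.
bins = {}
for pt in [[1.0, 0.9, 1.1], [1.01, 0.91, 1.09], [-5.0, 4.0, 0.2]]:
    bins.setdefault(lsh_key(pt, planes), []).append(pt)
```

Reducing the distance matrix to prototype points is where the reported 8-11x speedup over the naive implementation would come from: the pairwise-distance cost drops from all points to bins only.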
An Energy-Efficient Demand-Response Model for High Performance Computing Systems - Jason Liu
This document presents a demand-response model that allows high performance computing systems to participate in demand-response programs. The model uses a job scheduler that prioritizes performance during normal operation but constrains power consumption and minimizes energy use during demand-response events by adjusting CPU frequencies via DVFS. Evaluated in a simulator, the model reduced energy use and turnaround time compared to performance-only scheduling while helping stabilize power systems during demand-response periods.
This document discusses integrating XGBoost machine learning with Spark and DataFrames. It provides examples of using XGBoost in Spark to train models on distributed data and make predictions on streaming data in parallel. It also discusses future work, such as using Rabit for parallel learning, adding support for more platforms like Windows, and integrating with Spark ML pipelines.
1. The document discusses multi-resource packing of tasks with dependencies to improve cluster scheduler performance. It describes problems with current schedulers related to resource fragmentation and over-allocation.
2. A packing heuristic is proposed that assigns tasks to machines based on an alignment score to reduce fragmentation and spread load. A job completion time heuristic is also described.
3. The paper presents results showing improvements in makespan and job completion times from approaches that consider dependent tasks and multiple resource demands compared to current schedulers. It also discusses achieving trade-offs between performance and fairness.
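The alignment-score heuristic from point 2 can be sketched as a dot product between a task's demand vector and a machine's free-resource vector. This is a hedged illustration of the idea, with illustrative names and numbers, not the paper's code: tasks that don't fit are excluded, and among machines that fit, the highest alignment wins, which both reduces fragmentation and spreads load.

```python
def alignment(demands, free):
    # Higher score = the task's demands line up with what the machine has spare.
    if any(d > f for d, f in zip(demands, free)):
        return -1  # task does not fit on this machine at all
    return sum(d * f for d, f in zip(demands, free))

def best_machine(demands, machines):
    scores = {m: alignment(demands, free) for m, free in machines.items()}
    return max(scores, key=scores.get)

machines = {
    "m1": [8, 2],   # lots of spare CPU, little spare memory
    "m2": [4, 6],   # balanced spare capacity
}
task = [2, 4]        # CPU-light, memory-heavy task
# m1 cannot fit the memory demand; m2 fits and scores 2*4 + 4*6 = 32.
assert best_machine(task, machines) == "m2"
```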
This document discusses Kubeflow operators and how they enable Kubeflow to support multiple machine learning frameworks like TensorFlow, PyTorch, MXNet, and Chainer. It explains that operators and custom resource definitions (CRDs) allow ML jobs to be defined and managed for different frameworks. It provides examples of how jobs are defined for TensorFlow using TFJobs and for Chainer using ChainerJobs. It also summarizes how operators work by expanding the custom resources into Kubernetes objects like pods, services, and statefulsets.
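A TFJob custom resource of the kind described above looks roughly like the following. Field names follow the public `kubeflow.org/v1` TFJob API; the job name, image, and command are placeholders.

```yaml
# Hedged sketch of a TFJob custom resource; the operator expands this into
# one pod and one headless service per replica.
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: mnist-train          # placeholder name
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow
              image: example/mnist:latest       # placeholder image
              command: ["python", "train.py"]   # placeholder entrypoint
```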
PostgreSQL uses MVCC which creates multiple versions of rows during updates and deletes. This leads to bloat and fragmentation over time as unused row versions accumulate. The VACUUM command performs garbage collection to recover space from dead rows. HOT updates and pruning help reduce bloat by avoiding index bloat during certain updates. Future improvements include parallel and eager vacuuming as well as pluggable storage engines like zheap to further reduce bloat.
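The version-accumulation behavior described above can be shown with a toy model. This is a hedged sketch of the MVCC idea, not PostgreSQL internals: an UPDATE writes a new row version and marks the old one dead via `xmax`, and VACUUM later reclaims versions no transaction can still see.

```python
class Table:
    def __init__(self):
        self.versions = []          # each: dict(xmin, xmax, data)
        self.next_xid = 1           # toy transaction-id counter

    def insert(self, data):
        self.versions.append({"xmin": self.next_xid, "xmax": None, "data": data})
        self.next_xid += 1

    def update(self, old, new):
        xid = self.next_xid
        for v in self.versions:
            if v["data"] == old and v["xmax"] is None:
                v["xmax"] = xid                      # old version is now dead
        self.versions.append({"xmin": xid, "xmax": None, "data": new})
        self.next_xid += 1

    def vacuum(self, oldest_active_xid):
        # Reclaim versions deleted before the oldest still-running transaction.
        before = len(self.versions)
        self.versions = [v for v in self.versions
                         if v["xmax"] is None or v["xmax"] >= oldest_active_xid]
        return before - len(self.versions)

t = Table()
t.insert("v1")
t.update("v1", "v2")           # table now holds a dead "v1" and a live "v2"
assert len(t.versions) == 2    # bloat: two versions for one logical row
assert t.vacuum(oldest_active_xid=10) == 1
```

HOT updates avoid the index side of this cost by keeping the new version on the same heap page when no indexed column changed, so indexes need not be touched.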
OpenStack is a cloud computing platform that provides Infrastructure as a Service (IaaS), comprising compute, storage, and network resources. Resource allocation in a cloud environment deals with assigning the available resources in a cost-effective manner. In OpenStack, resource allocation is carried out by nova-scheduler, which logically allocates compute, network, and storage resources to the instance requests made by OpenStack users. For efficient and cost-effective use of the scheduler, a resource pool containing the best available hosts can be created and maintained. This paper presents a time-saving way to allocate resources, in the form of virtual machines, for large cloud deployments.
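The filter-and-weigh pattern that nova-scheduler applies per request can be sketched as follows. The data structures and weigher are illustrative, not the real nova code; maintaining the paper's "resource pool" amounts to keeping the filtered, sorted candidate list warm instead of recomputing it for every request.

```python
def filter_hosts(hosts, vcpus, ram_mb):
    # Keep only hosts that can satisfy the instance's resource request.
    return [h for h in hosts
            if h["free_vcpus"] >= vcpus and h["free_ram_mb"] >= ram_mb]

def weigh(host):
    # Prefer hosts with the most free RAM (nova's default RAM weigher
    # behaves similarly).
    return host["free_ram_mb"]

hosts = [
    {"name": "node1", "free_vcpus": 2, "free_ram_mb": 2048},
    {"name": "node2", "free_vcpus": 8, "free_ram_mb": 16384},
    {"name": "node3", "free_vcpus": 4, "free_ram_mb": 8192},
]

# Resource pool: best candidates first, ready to serve VM requests.
pool = sorted(filter_hosts(hosts, vcpus=4, ram_mb=4096), key=weigh, reverse=True)
```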
This document discusses using Jupyter Notebook for machine learning projects with Spark. It describes running Python, Spark, and pandas code in Jupyter notebooks to work with data from various sources and build machine learning models. Key points include using notebooks for an ML pipeline, running Spark jobs, visualizing data, and building word embedding models with Spark. The document emphasizes how Jupyter notebooks allow integrating various tools for an ML workflow.
Resource Aware Scheduling for Hadoop [Final Presentation] - Lu Wei
The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
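The shortest-job-first policy mentioned above can be sketched in a few lines. This is a hedged illustration over made-up runtime estimates, not the scheduler's code; the second half shows why SJF is attractive: it minimizes average waiting time relative to FIFO.

```python
def sjf_order(jobs):
    # jobs: {job_id: estimated_seconds}; run the shortest estimate first.
    return sorted(jobs, key=jobs.get)

estimates = {"wordcount": 120.0, "pi": 30.0, "sort": 300.0}
assert sjf_order(estimates) == ["pi", "wordcount", "sort"]

def avg_wait(order, est):
    # Average time each job spends queued before it starts.
    waits, clock = [], 0.0
    for j in order:
        waits.append(clock)
        clock += est[j]
    return sum(waits) / len(waits)

# SJF beats submission (FIFO) order on average wait: 60s vs 90s here.
assert avg_wait(sjf_order(estimates), estimates) < avg_wait(list(estimates), estimates)
```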
Adventures in Observability: How in-house ClickHouse deployment enabled Inst... - Altinity Ltd
This document discusses Instana's use of ClickHouse as a data store for application monitoring data. It provides an overview of Instana's APM capabilities, how it collects and analyzes data using ClickHouse, and lessons learned from operating ClickHouse clusters at scale. Key points include that Instana uses ClickHouse to store trace data and generate dashboards, ClickHouse helps process millions of records quickly, and Instana monitors ClickHouse using its own APM platform to gain end-to-end visibility of performance.
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit - Carlo C. del Mundo
The document discusses the motivation for developing the Tensor Processing Unit (TPU), which was that DNN-based workloads were consuming a large and growing portion of datacenter compute resources. It describes how the TPU was developed by Norman Jouppi and others at Google to be much more efficient than CPUs and GPUs for DNN workloads, with up to 80x higher performance per watt. It provides details on the TPU architecture and experimental results showing it significantly outperformed GPUs on latency for DNN inference tasks.
Treasure Data on The YARN - Hadoop Conference Japan 2014 - Ryu Kobayashi
Ryu Kobayashi from Treasure Data gave a presentation on using YARN (Yet Another Resource Negotiator) with Hadoop. Some key points:
- YARN was introduced to improve Hadoop resource management by separating processing from scheduling.
- Configuration changes are required when moving from MRv1 to YARN, including properties for memory allocation and scheduler configuration.
- Container execution, directories, and other components were adapted in the transition from JobTracker to the ResourceManager and NodeManager architecture in YARN.
- Proper configuration of YARN is important to avoid bugs, and tools from distributions can help with configuration.
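The memory and scheduler properties the slides refer to live in `yarn-site.xml`. The property names below are real YARN configuration keys; the values are illustrative only.

```xml
<!-- yarn-site.xml: representative MRv1-to-YARN migration settings;
     values are illustrative, tune them to the node's actual hardware. -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>  <!-- RAM the NodeManager may hand out to containers -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>  <!-- smallest container the scheduler will grant -->
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
```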
This document discusses the development of Apache Pig on Tez, an execution engine for Pig jobs. Pig on Tez allows Pig workflows to be executed as directed acyclic graphs (DAGs) using Tez, improving performance over the default MapReduce execution. Key benefits of Tez include eliminating intermediate data writes, reducing job launch overhead, and allowing more flexible data flows. However, challenges remain around automatically determining optimal parallelism and integrating Tez with user interface and monitoring tools. Future work is needed to address these issues.
This document discusses running Spark on the cloud, including the advantages, challenges, and how Qubole addresses them. Some key advantages include using S3 for storage which allows independent scaling of storage and compute, ability to create ephemeral clusters on demand, and autoscaling capabilities. Challenges involve cluster lifecycle management, different interfaces needed, Spark autoscaling, debuggability across clusters, and handling spot instances. Qubole provides tools that automate cluster management, enable autoscaling of Spark, and make experiences seamless across clusters and interfaces.
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl... - Flink Forward
In 2016, we introduced Alibaba's compute engine Blink, which was based on our private branch of Flink. It enabled many large-scale applications in Alibaba's core business, such as search, recommendation, and ads. Through deep and close collaboration with the Flink community, we are finally close to contributing our improvements back to the community. In this talk, we will present our recent key contributions to the Flink runtime, such as the new YARN cluster mode for FLIP-6, fine-grained failover for FLIP-1, async I/O for FLIP-12, and incremental checkpointing, along with Alibaba's further improvement plans for the near future. Moreover, we will show some production use cases to illustrate how Flink works in Alibaba's large-scale online applications, including real-time ETL as well as online machine learning. This talk is presented by Alibaba.
Automating materials science workflows with pymatgen, FireWorks, and atomate (Anubhav Jain)
FireWorks is a workflow management system that allows researchers to define and execute complex computational materials science workflows on local or remote computing resources in an automated manner. It provides features such as error detection and recovery, job scheduling, provenance tracking, and remote file access. The atomate library builds on FireWorks to provide a high-level interface for common materials simulation procedures like structure optimization, band structure calculation, and property prediction using popular codes like VASP. Together, these tools aim to make high-throughput computational materials discovery and design more accessible to researchers.
Clustering can provide high availability and scalability. Shared nothing architectures are best for achieving both high availability and scalability together. Oracle Real Application Cluster (RAC) offers advantages over alternative Oracle clustering configurations, but its scalability is limited. The cost-effectiveness of using RAC in a redundant array of inexpensive servers configuration is small due to its limited scalability. Alternatives may be more suitable depending on specific needs and requirements.
The document discusses Oracle database performance tuning. It covers identifying and resolving performance issues through tools like AWR and ASH reports. Common causes of performance problems include wait events, old statistics, incorrect execution plans, and I/O issues. The document recommends collecting specific data when analyzing problems and provides references and scripts for further tuning tasks.
Reproducible Computational Pipelines with Docker and Nextflow (inside-BigData.com)
This document summarizes a presentation about using Docker and Nextflow to create reproducible computational pipelines. It discusses two major challenges in computational biology being reproducibility and complexity. Containers like Docker help address these challenges by creating portable and standardized environments. Nextflow is introduced as a workflow framework that allows pipelines to run across platforms and isolates dependencies using containers, enabling fast prototyping. Examples are given of using Nextflow with Docker to run pipelines on different systems like HPC clusters in a scalable and reproducible way.
Streaming in Practice - Putting Apache Kafka in Production (confluent)
This presentation focuses on how to integrate all these components into an enterprise environment and what things you need to consider as you move into production.
We will touch on the following topics:
- Patterns for integrating with existing data systems and applications
- Metadata management at enterprise scale
- Tradeoffs in performance, cost, availability and fault tolerance
- Choosing which cross-datacenter replication patterns fit with your application
- Considerations for operating Kafka-based data pipelines in production
Speeding Up Spark Performance using Alluxio at China Unicom (Alluxio, Inc.)
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Speeding Up Spark Performance using Alluxio at China Unicom
Ce Zhang, Big Data Engineer (China Unicom)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013 (James McGalliard)
This document discusses various workload scheduling alternatives for high performance computing environments. It begins by describing typical HPC workloads and challenges in scheduling large parallel jobs. It then covers scheduling techniques like backfill and frameworks like MapReduce and Hadoop. Alternative prioritization methods are proposed, like prioritizing based on estimated run time, wait time, or number of processors requested. The document concludes by showing results comparing different dynamic prioritization approaches.
2014 04-17 Applied SCAP, Red Hat Summit 2014 (Shawn Wells)
The document outlines a 45 minute presentation with 3 goals: 1) detail security automation technology and initiatives including OpenSCAP, configuration compliance using SCAP Security Guides, and evolving remediation capabilities; 2) provide a live demo of configuration compliance scanning, patch and vulnerability scanning, and certification/accreditation paperwork generation; 3) discuss the roadmap for government plans, packaging, and future profiles. It then provides an overview of SCAP, the SCAP Security Guide project and contributors, and remediation capabilities including both bash and puppet approaches.
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea... (Safe Software)
Dive deep into the world of geospatial data management and transformation in our upcoming webinar focusing on the powerful integration of FME and Esri technologies. This insightful session comprises two compelling segments aimed at enhancing your geospatial workflows, while minimizing operational hurdles.
In the first segment, guest speaker Jan Roggisch from Locus unveils how Auckland Council triumphed over the challenges of handling large, frequent data updates on ArcGIS Online using FME. Discover the journey from manual data handling to an automated, streamlined process that reduced server downtime from minutes to seconds: setting a new standard for local government organizations.
The second segment, led by James Botterill from 1Spatial, unveils the magic of incorporating ArcPy into your FME workflows. Delve into real-world scenarios where ArcGIS geoprocessing is harmoniously orchestrated within FME using the PythonCaller. Gain insights into raster-vector data conversion, spatial analysis, and a host of practical tips and tricks that empower you to leverage the combined capabilities of FME and Esri for efficient data manipulation and conversion.
Join us to explore the remarkable possibilities that open up when FME and Esri technologies converge – enhancing your ability to manage and transform geospatial data with unprecedented efficiency.
Yahoo migrated most of its Pig workload from MapReduce to Tez to achieve significant performance improvements and resource utilization gains. Some key challenges in the migration included addressing misconfigurations, bad programming practices, and behavioral changes between the frameworks. Yahoo was able to run very large and complex Pig on Tez jobs involving hundreds of vertices and terabytes of data smoothly at scale. Further optimizations are still needed around speculative execution and container reuse to improve utilization even more. The migration to Tez resulted in up to 30% reduction in runtime, memory, and CPU usage for Yahoo's Pig workload.
Adaptive Query Execution: Speeding Up Spark SQL at Runtime (Databricks)
Over the years, there has been extensive and continuous effort on improving Spark SQL’s query optimizer and planner, in order to generate high quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark make better decisions in picking the most optimal query plan.
Building an intelligent big data application on top of xPatterns using tools that leverage Spark, Shark, Mesos, Tachyon, and Cassandra; Jaws, our open-sourced Spark SQL RESTful service; our contributions to the Spark and Mesos projects; and lessons learned.
The document provides an overview of the Hadoop platform at Yahoo over the past year. It discusses the evolution of the platform infrastructure and metrics including growth in storage from 12PB to 65PB and compute capacity from 23TB to 240TB. It highlights new technologies added to the platform like CaffeOnSpark for distributed deep learning, Apache Storm for streaming analytics, and data sketches algorithms. It also discusses enhancements to existing technologies like HBase for transactions with Omid and improvements to Oozie for data pipelines. The document aims to provide insights on how the Hadoop platform at Yahoo has scaled to support growing analytics needs through consolidation, new services, and ease of use features.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels (Northern Engraving)
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
- Creating a compelling user experience for any software, without the limitations of APIs
- Accelerating the app creation process, saving time and effort
- Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
What is an RPA CoE? Session 1 – CoE Vision (DianaGray10)
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Session 1 - Intro to Robotic Process Automation.pdf (UiPathCommunity)
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an... (Jason Yip)
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill (LizaNolte)
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
The Microsoft 365 Migration Tutorial For Beginner.pptx (operationspcvita)
This presentation will help you understand the power of Microsoft 365. It covers every productivity app included in Office 365, outlines common migration scenarios, and explains how we can help.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
"Scaling RAG Applications to serve millions of users", Kevin Goedecke (Fwdays)
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months, with lessons from the technical challenges of managing high load for LLMs, RAG pipelines, and vector databases.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
47. What is a DAG? (4. Fault Tolerant Job Scheduling)
A DAG (directed acyclic graph) describes the dependencies between tasks:
• A, B: independent tasks that can run in parallel
• C: depends on both A and B, so C can start only after A and B have finished
From: https://www.quora.com/What-are-the-advantages-of-DAG-directed-acyclic-graph-execution-of-big-data-algorithms-over-MapReduce-I-know-that-Apache-Spark-Storm-and-Tez-use-the-DAG-execution-model-over-MapReduce-Why-Are-there-any-disadvantages
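The slide's example DAG (A and B independent, C depending on both) can be sketched in Python; the task bodies here are hypothetical stand-ins, but the futures express the same dependency edges:

```python
import concurrent.futures

# Hypothetical task bodies; the names A, B, C follow the slide's example DAG.
def task_a():
    return "A done"

def task_b():
    return "B done"

def task_c(a_result, b_result):
    # C consumes the outputs of A and B, so it cannot start earlier.
    return f"C after ({a_result}, {b_result})"

with concurrent.futures.ThreadPoolExecutor() as pool:
    # A and B have no dependencies, so they are submitted concurrently.
    fut_a = pool.submit(task_a)
    fut_b = pool.submit(task_b)
    # Blocking on both futures enforces the C -> {A, B} edges of the DAG.
    result = task_c(fut_a.result(), fut_b.result())

print(result)  # C after (A done, B done)
```

A scheduler such as Tez generalizes this pattern: each vertex runs as soon as all of its parent vertices have produced their outputs.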
53. FuxiMaster failover (1) (4. Fault Tolerant Job Scheduling)
• Hard state: the job descriptions and the machine blacklist, which FuxiMaster persists to reliable storage
• Soft state: runtime information held by the FuxiAgents and AppMasters, which is not persisted
• On failover, the new FuxiMaster reloads the hard state and rebuilds the soft state by collecting it from the FuxiAgents and AppMasters
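The hard-state/soft-state split described on this slide can be sketched as follows. This is a minimal illustration of the idea only; the class and method names are invented for the sketch and are not the real Fuxi API:

```python
import json
import os
import tempfile

class Master:
    """Toy master: persists hard state, rebuilds soft state on recovery."""

    def __init__(self, store_path):
        self.store_path = store_path
        self.jobs = {}          # hard state: job descriptions
        self.blacklist = set()  # hard state: blacklisted machines
        self.assignments = {}   # soft state: rebuilt after failover, never persisted

    def submit_job(self, job_id, description):
        self.jobs[job_id] = description
        self._checkpoint()  # hard state is written out on every change

    def _checkpoint(self):
        with open(self.store_path, "w") as f:
            json.dump({"jobs": self.jobs, "blacklist": sorted(self.blacklist)}, f)

    def recover(self, agents):
        # 1) Reload hard state from reliable storage.
        with open(self.store_path) as f:
            saved = json.load(f)
        self.jobs = saved["jobs"]
        self.blacklist = set(saved["blacklist"])
        # 2) Rebuild soft state by asking each agent what it is running.
        self.assignments = {}
        for agent in agents:
            self.assignments.update(agent.report_running_tasks())

class Agent:
    """Toy agent that reports its locally running tasks on request."""

    def __init__(self, running):
        self._running = running

    def report_running_tasks(self):
        return dict(self._running)

path = os.path.join(tempfile.mkdtemp(), "hard_state.json")
m1 = Master(path)
m1.submit_job("job-1", {"cmd": "wordcount"})

m2 = Master(path)                       # new master started after a crash
m2.recover([Agent({"task-1": "job-1"})])
print(m2.jobs["job-1"]["cmd"])          # wordcount
print(m2.assignments["task-1"])         # job-1
```

The design point is that only the small, slowly changing hard state needs durable writes on the critical path; the bulky, fast-changing soft state is cheap to reconstruct from the agents, which keeps normal operation fast while still allowing the master to fail over.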