1. The document discusses multi-resource packing of tasks with dependencies to improve cluster scheduler performance. It describes problems with current schedulers related to resource fragmentation and over-allocation.
2. A packing heuristic is proposed that assigns tasks to machines based on an alignment score to reduce fragmentation and spread load. A job completion time heuristic is also described.
3. The paper presents results showing improvements in makespan and job completion times from approaches that consider dependent tasks and multiple resource demands compared to current schedulers. It also discusses achieving trade-offs between performance and fairness.
Yahoo migrated most of its Pig workload from MapReduce to Tez to achieve significant performance improvements and resource utilization gains. Some key challenges in the migration included addressing misconfigurations, bad programming practices, and behavioral changes between the frameworks. Yahoo was able to run very large and complex Pig on Tez jobs involving hundreds of vertices and terabytes of data smoothly at scale. Further optimizations are still needed around speculative execution and container reuse to improve utilization even more. The migration to Tez resulted in up to 30% reduction in runtime, memory, and CPU usage for Yahoo's Pig workload.
Hive on Tez provides significant performance improvements over Hive on MapReduce by leveraging Apache Tez for query execution. Key features of Hive on Tez include vectorized processing, dynamic partitioned hash joins, and broadcast joins which avoid unnecessary data writes to HDFS. Test results show Hive on Tez queries running up to 100x faster on datasets ranging from terabytes to petabytes in size. Hive on Tez also handles concurrency well, with the ability to run 20 queries concurrently on a 30TB dataset and finish within 27.5 minutes.
This document provides performance optimization tips for Hadoop jobs, including recommendations around compression, speculative execution, number of maps/reducers, block size, sort size, JVM tuning, and more. It suggests how to configure properties like mapred.compress.map.output, mapred.map/reduce.tasks.speculative.execution, and dfs.block.size based on factors like cluster size, job characteristics, and data size. It also identifies antipatterns to avoid like processing thousands of small files or using many maps with very short runtimes.
This document discusses Pig on Tez, which runs Pig jobs on the Tez execution engine rather than MapReduce. The team introduces Pig and Tez, describes the design of Pig on Tez including logical and physical plans, custom vertices and edges, and performance optimizations like broadcast edges and object caching. Performance results show speedups of 1.5x to 6x over MapReduce. Current status is 90% feature parity with Pig on MR and future work includes supporting Tez local mode and improving stability, usability, and performance further.
This document discusses the integration of Apache Pig with Apache Tez. Pig provides a procedural scripting language for data processing workflows, while Tez is a framework for executing directed acyclic graphs (DAGs) of tasks. Migrating Pig to use Tez as its execution engine provides benefits like reduced resource usage, improved performance, and container reuse compared to Pig's default MapReduce execution. The document outlines the design changes needed to compile Pig scripts to Tez DAGs and provides examples and performance results. It also discusses ongoing work to achieve full feature parity with MapReduce and further optimize performance.
This document provides a technical introduction to Hadoop, including:
- Hadoop has been tested on a 4000 node cluster with 32,000 cores and 16 petabytes of storage.
- Key Hadoop concepts are explained, including jobs, tasks, task attempts, mappers, reducers, and the JobTracker and TaskTrackers.
- The process of launching a MapReduce job is described, from the client submitting the job to the JobTracker distributing tasks to TaskTrackers and running the user-defined mapper and reducer classes.
This is slides from our recent HadoopIsrael meetup. It is dedicated to comparison Spark and Tez frameworks.
In the end of the meetup there is small update about our ImpalaToGo project.
Yahoo migrated most of its Pig workload from MapReduce to Tez to achieve significant performance improvements and resource utilization gains. Some key challenges in the migration included addressing misconfigurations, bad programming practices, and behavioral changes between the frameworks. Yahoo was able to run very large and complex Pig on Tez jobs involving hundreds of vertices and terabytes of data smoothly at scale. Further optimizations are still needed around speculative execution and container reuse to improve utilization even more. The migration to Tez resulted in up to 30% reduction in runtime, memory, and CPU usage for Yahoo's Pig workload.
Hive on Tez provides significant performance improvements over Hive on MapReduce by leveraging Apache Tez for query execution. Key features of Hive on Tez include vectorized processing, dynamic partitioned hash joins, and broadcast joins which avoid unnecessary data writes to HDFS. Test results show Hive on Tez queries running up to 100x faster on datasets ranging from terabytes to petabytes in size. Hive on Tez also handles concurrency well, with the ability to run 20 queries concurrently on a 30TB dataset and finish within 27.5 minutes.
This document provides performance optimization tips for Hadoop jobs, including recommendations around compression, speculative execution, number of maps/reducers, block size, sort size, JVM tuning, and more. It suggests how to configure properties like mapred.compress.map.output, mapred.map/reduce.tasks.speculative.execution, and dfs.block.size based on factors like cluster size, job characteristics, and data size. It also identifies antipatterns to avoid like processing thousands of small files or using many maps with very short runtimes.
This document discusses Pig on Tez, which runs Pig jobs on the Tez execution engine rather than MapReduce. The team introduces Pig and Tez, describes the design of Pig on Tez including logical and physical plans, custom vertices and edges, and performance optimizations like broadcast edges and object caching. Performance results show speedups of 1.5x to 6x over MapReduce. Current status is 90% feature parity with Pig on MR and future work includes supporting Tez local mode and improving stability, usability, and performance further.
This document discusses the integration of Apache Pig with Apache Tez. Pig provides a procedural scripting language for data processing workflows, while Tez is a framework for executing directed acyclic graphs (DAGs) of tasks. Migrating Pig to use Tez as its execution engine provides benefits like reduced resource usage, improved performance, and container reuse compared to Pig's default MapReduce execution. The document outlines the design changes needed to compile Pig scripts to Tez DAGs and provides examples and performance results. It also discusses ongoing work to achieve full feature parity with MapReduce and further optimize performance.
This document provides a technical introduction to Hadoop, including:
- Hadoop has been tested on a 4000 node cluster with 32,000 cores and 16 petabytes of storage.
- Key Hadoop concepts are explained, including jobs, tasks, task attempts, mappers, reducers, and the JobTracker and TaskTrackers.
- The process of launching a MapReduce job is described, from the client submitting the job to the JobTracker distributing tasks to TaskTrackers and running the user-defined mapper and reducer classes.
This is slides from our recent HadoopIsrael meetup. It is dedicated to comparison Spark and Tez frameworks.
In the end of the meetup there is small update about our ImpalaToGo project.
The document summarizes new features and improvements in Pig 0.11, including the CUBE and rank operators, support for Groovy UDFs, a new DateTime data type, schema tuple optimization, compatibility with JDK 7 and Windows, faster local mode, and better statistics reporting.
This document provides recommendations for improving performance in a big data environment. It suggests:
1. Increasing the replication factor from 3 to improve data availability.
2. Adjusting YARN scheduler settings like minimum and maximum allocation to improve memory usage.
3. Allocating memory and cores to the application master to improve job performance.
4. Setting the JVM reuse property to reduce JVM overhead for tasks.
5. Increasing the minimum splits size for map output to reduce overhead of multiple files.
6. Increasing the block size from 128MB to 256MB to improve job performance on large data.
Pig is a data flow language that sits on top of Hadoop and allows users to quickly process large volumes of data across many servers simultaneously. It supports relational features like joins, groups, and aggregates, making it well-suited for extract, transform, load (ETL) tasks. Common ETL use cases for Pig include time-sensitive data loads from various sources into databases, and processing multiple data sources to gain insights into customer behavior. While Pig can handle ETL tasks, it is also capable of sampling large datasets for analysis and providing analytical insights beyond basic ETL functions.
apache pig performance optimizations talk at apachecon 2010Thejas Nair
Pig provides a high-level language called Pig Latin for analyzing large datasets. It optimizes Pig Latin scripts by restructuring the logical query plan through techniques like predicate pushdown and operator rewriting, and by generating efficient physical execution plans that leverage features like combiners, different join algorithms, and memory management. Future work aims to improve memory usage and allow joins and groups within a single MapReduce job when keys are the same.
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
We share our slides about Apache Tez delivered as a lightening talk given at Warsaw Hadoop User Group http://www.meetup.com/warsaw-hug/events/218579675
This document describes the MapReduce programming model for processing large datasets in a distributed manner. MapReduce allows users to write map and reduce functions that are automatically parallelized and run across large clusters. The input data is split and the map tasks run in parallel, producing intermediate key-value pairs. These are shuffled and input to the reduce tasks, which produce the final output. The system handles failures, scheduling and parallelization transparently, making it easy for programmers to write distributed applications.
This document summarizes a presentation on using indexes in Hive to accelerate query performance. It describes how indexes provide an alternative view of data to enable faster lookups compared to full data scans. Example queries demonstrating group by and aggregation are rewritten to use an index on the shipdate column. Performance tests on TPC-H data show the indexed queries outperforming the non-indexed versions by an order of magnitude. Future work is needed to expand rewrite rules and integrate indexing fully into Hive's optimizer.
This was the first session about Hadoop and MapReduce. It introduces what Hadoop is and its main components. It also covers the how to program your first MapReduce task and how to run it on pseudo distributed Hadoop installation.
This session was given in Arabic and i may provide a video for the session soon.
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
The document discusses MapReduce, a framework for processing large datasets in parallel. It provides an overview of MapReduce's basic principles, surveys research to improve the conventional MapReduce framework, and describes research projects ongoing at KAIST. The key points are that MapReduce provides automatic parallelization, fault tolerance, and distributed processing of large datasets across commodity computer clusters. It also introduces the map and reduce functions that define MapReduce jobs.
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
Speaking of big data analysis, what comes to mind is possibly using HDFS and MapReduce within Hadoop. But to write a MapReduce program, one must face the problem of learning how to write native java. One might wonder is it possible to use R, the most popular language adapted by data scientist, to implement MapReduce program? And through the integration or R and Hadoop, is it truly one can unleash the power of parallel computing and big data analysis?
This slide introduces how to install RHadoop step by step, and introduces how to write a MapReduce program through R. What is more, this slide will discuss whether RHadoop is really a light for big data analysis, or just another method to write MapReduce Program.
Please mail me if you found any problem toward the slide. EMAIL: tr.ywchiu@gmail.com
談到巨量資料,通常大家腦海中聯想到的就是使用Hadoop 的 MapReduce 和HDFS,但是撰寫MapReduce,則就必須要學會撰寫Java 或透過Thrift 接口才能撰寫。但R是否有辦法運行在Hadoop 上呢 ? 而使用R + Hadoop,是否就真的能結合R強大的分析功能,分析巨量資料呢 ?
本次講題將介紹如何Step by step 在Hadoop 上安裝RHadoop相關套件,並介紹如何撰寫R的MapReduce 程式。更重要的是,此次將探討使用RHadoop 是否為巨量資料分析找到一盞明燈? 或者只是另一套實作方法而已?
Apache Tez is a framework for executing data processing jobs on Hadoop clusters. It allows expressing jobs as directed acyclic graphs (DAGs) which enables optimizations like running jobs as a single logical unit rather than separate MapReduce jobs. The presentation covered Tez features like container reuse, dynamic parallelism, and integration with YARN and ATS for monitoring. It also discussed ongoing work to improve performance through speculation, intermediate file formats, and shuffle optimizations, as well as better debuggability using tools like the Tez UI.
This document provides a high-level overview of MapReduce and Hadoop. It begins with an introduction to MapReduce, describing it as a distributed computing framework that decomposes work into parallelized map and reduce tasks. Key concepts like mappers, reducers, and job tracking are defined. The structure of a MapReduce job is then outlined, showing how input is divided and processed by mappers, then shuffled and sorted before being combined by reducers. Example map and reduce functions for a word counting problem are presented to demonstrate how a full MapReduce job works.
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
Optimal Execution Of MapReduce Jobs In Cloud
Anshul Aggarwal, Software Engineer, Cisco Systems
Session Length: 1 Hour
Tue March 10 21:30 PST
Wed March 11 0:30 EST
Wed March 11 4:30:00 UTC
Wed March 11 10:00 IST
Wed March 11 15:30 Sydney
Voices 2015 www.globaltechwomen.com
We use MapReduce programming paradigm because it lends itself well to most data-intensive analytics jobs run on cloud these days, given its ability to scale-out and leverage several machines to parallel process data. Research has demonstrates that existing approaches to provisioning other applications in the cloud are not immediately relevant to MapReduce -based applications. Provisioning a MapReduce job entails requesting optimum number of resource sets (RS) and configuring MapReduce parameters such that each resource set is maximally utilized.
Each application has a different bottleneck resource (CPU :Disk :Network), and different bottleneck resource utilization, and thus needs to pick a different combination of these parameters based on the job profile such that the bottleneck resource is maximally utilized.
The problem at hand is thus defining a resource provisioning framework for MapReduce jobs running in a cloud keeping in mind performance goals such as Optimal resource utilization with Minimum incurred cost, Lower execution time, Energy Awareness, Automatic handling of node failure and Highly scalable solution.
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
This document provides an overview of large scale data analysis using distributed computing frameworks like MapReduce. It describes MapReduce and related frameworks like Dryad, and open source MapReduce tools including Hadoop, Cloud MapReduce, Elastic MapReduce, and MR.Flow. Example MapReduce algorithms for tasks like graph analysis, text indexing and retrieval are also outlined. The document is the first part of a series on large scale data analysis using distributed frameworks.
This document provides a summary of improvements made to Hive's performance through the use of Apache Tez and other optimizations. Some key points include:
- Hive was improved to use Apache Tez as its execution engine instead of MapReduce, reducing latency for interactive queries and improving throughput for batch queries.
- Statistics collection was optimized to gather column-level statistics from ORC file footers, speeding up statistics gathering.
- The cost-based optimizer Optiq was added to Hive, allowing it to choose better execution plans.
- Vectorized query processing, broadcast joins, dynamic partitioning, and other optimizations improved individual query performance by over 100x in some cases.
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2skCodH
This CloudxLab Understanding MapReduce tutorial helps you to understand MapReduce in detail. Below are the topics covered in this tutorial:
1) Thinking in Map / Reduce
2) Understanding Unix Pipeline
3) Examples to understand MapReduce
4) Merging
5) Mappers & Reducers
6) Mapper Example
7) Input Split
8) mapper() & reducer() Code
9) Example - Count number of words in a file using MapReduce
10) Example - Compute Max Temperature using MapReduce
11) Hands-on - Count number of words in a file using MapReduce on CloudxLab
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
This document discusses the development of Apache Pig on Tez, an execution engine for Pig jobs. Pig on Tez allows Pig workflows to be executed as directed acyclic graphs (DAGs) using Tez, improving performance over the default MapReduce execution. Key benefits of Tez include eliminating intermediate data writes, reducing job launch overhead, and allowing more flexible data flows. However, challenges remain around automatically determining optimal parallelism and integrating Tez with user interface and monitoring tools. Future work is needed to address these issues.
This document discusses scheduling in distributed systems. It covers:
1) Common scheduling techniques like min-min, max-min, and sufferage for scheduling independent tasks on dedicated systems.
2) Scheduling dependent tasks modeled as directed acyclic graphs (DAGs) using techniques like critical path on a processor (CPOP) and heterogeneous earliest finish time (HEFT).
3) The need for scheduling algorithms to adapt to dynamic grid environments where tasks may have dependencies on shared files and network transfer times vary.
The document summarizes new features and improvements in Pig 0.11, including the CUBE and rank operators, support for Groovy UDFs, a new DateTime data type, schema tuple optimization, compatibility with JDK 7 and Windows, faster local mode, and better statistics reporting.
This document provides recommendations for improving performance in a big data environment. It suggests:
1. Increasing the replication factor from 3 to improve data availability.
2. Adjusting YARN scheduler settings like minimum and maximum allocation to improve memory usage.
3. Allocating memory and cores to the application master to improve job performance.
4. Setting the JVM reuse property to reduce JVM overhead for tasks.
5. Increasing the minimum splits size for map output to reduce overhead of multiple files.
6. Increasing the block size from 128MB to 256MB to improve job performance on large data.
Pig is a data flow language that sits on top of Hadoop and allows users to quickly process large volumes of data across many servers simultaneously. It supports relational features like joins, groups, and aggregates, making it well-suited for extract, transform, load (ETL) tasks. Common ETL use cases for Pig include time-sensitive data loads from various sources into databases, and processing multiple data sources to gain insights into customer behavior. While Pig can handle ETL tasks, it is also capable of sampling large datasets for analysis and providing analytical insights beyond basic ETL functions.
apache pig performance optimizations talk at apachecon 2010Thejas Nair
Pig provides a high-level language called Pig Latin for analyzing large datasets. It optimizes Pig Latin scripts by restructuring the logical query plan through techniques like predicate pushdown and operator rewriting, and by generating efficient physical execution plans that leverage features like combiners, different join algorithms, and memory management. Future work aims to improve memory usage and allow joins and groups within a single MapReduce job when keys are the same.
Did you like it? Check out our blog to stay up to date: https://getindata.com/blog
We share our slides about Apache Tez delivered as a lightening talk given at Warsaw Hadoop User Group http://www.meetup.com/warsaw-hug/events/218579675
This document describes the MapReduce programming model for processing large datasets in a distributed manner. MapReduce allows users to write map and reduce functions that are automatically parallelized and run across large clusters. The input data is split and the map tasks run in parallel, producing intermediate key-value pairs. These are shuffled and input to the reduce tasks, which produce the final output. The system handles failures, scheduling and parallelization transparently, making it easy for programmers to write distributed applications.
This document summarizes a presentation on using indexes in Hive to accelerate query performance. It describes how indexes provide an alternative view of data to enable faster lookups compared to full data scans. Example queries demonstrating group by and aggregation are rewritten to use an index on the shipdate column. Performance tests on TPC-H data show the indexed queries outperforming the non-indexed versions by an order of magnitude. Future work is needed to expand rewrite rules and integrate indexing fully into Hive's optimizer.
This was the first session about Hadoop and MapReduce. It introduces what Hadoop is and its main components. It also covers the how to program your first MapReduce task and how to run it on pseudo distributed Hadoop installation.
This session was given in Arabic and i may provide a video for the session soon.
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
The document discusses MapReduce, a framework for processing large datasets in parallel. It provides an overview of MapReduce's basic principles, surveys research to improve the conventional MapReduce framework, and describes research projects ongoing at KAIST. The key points are that MapReduce provides automatic parallelization, fault tolerance, and distributed processing of large datasets across commodity computer clusters. It also introduces the map and reduce functions that define MapReduce jobs.
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
The document describes a resource-aware scheduler for Hadoop that aims to improve task scheduling by considering both job resource demands and node resource availability. It captures job and node profiles, estimates task execution times, and applies scheduling policies like shortest job first. Evaluation on word count and Pi estimation workloads showed the estimated task times closely matched the actual times, demonstrating the accuracy of the scheduler's resource modeling and estimations.
Speaking of big data analysis, what comes to mind is possibly using HDFS and MapReduce within Hadoop. But to write a MapReduce program, one must face the problem of learning how to write native java. One might wonder is it possible to use R, the most popular language adapted by data scientist, to implement MapReduce program? And through the integration or R and Hadoop, is it truly one can unleash the power of parallel computing and big data analysis?
This slide introduces how to install RHadoop step by step, and introduces how to write a MapReduce program through R. What is more, this slide will discuss whether RHadoop is really a light for big data analysis, or just another method to write MapReduce Program.
Please mail me if you found any problem toward the slide. EMAIL: tr.ywchiu@gmail.com
談到巨量資料,通常大家腦海中聯想到的就是使用Hadoop 的 MapReduce 和HDFS,但是撰寫MapReduce,則就必須要學會撰寫Java 或透過Thrift 接口才能撰寫。但R是否有辦法運行在Hadoop 上呢 ? 而使用R + Hadoop,是否就真的能結合R強大的分析功能,分析巨量資料呢 ?
本次講題將介紹如何Step by step 在Hadoop 上安裝RHadoop相關套件,並介紹如何撰寫R的MapReduce 程式。更重要的是,此次將探討使用RHadoop 是否為巨量資料分析找到一盞明燈? 或者只是另一套實作方法而已?
Apache Tez is a framework for executing data processing jobs on Hadoop clusters. It allows expressing jobs as directed acyclic graphs (DAGs) which enables optimizations like running jobs as a single logical unit rather than separate MapReduce jobs. The presentation covered Tez features like container reuse, dynamic parallelism, and integration with YARN and ATS for monitoring. It also discussed ongoing work to improve performance through speculation, intermediate file formats, and shuffle optimizations, as well as better debuggability using tools like the Tez UI.
This document provides a high-level overview of MapReduce and Hadoop. It begins with an introduction to MapReduce, describing it as a distributed computing framework that decomposes work into parallelized map and reduce tasks. Key concepts like mappers, reducers, and job tracking are defined. The structure of a MapReduce job is then outlined, showing how input is divided and processed by mappers, then shuffled and sorted before being combined by reducers. Example map and reduce functions for a word counting problem are presented to demonstrate how a full MapReduce job works.
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
Optimal Execution Of MapReduce Jobs In Cloud
Anshul Aggarwal, Software Engineer, Cisco Systems
Session Length: 1 Hour
Tue March 10 21:30 PST
Wed March 11 0:30 EST
Wed March 11 4:30:00 UTC
Wed March 11 10:00 IST
Wed March 11 15:30 Sydney
Voices 2015 www.globaltechwomen.com
We use MapReduce programming paradigm because it lends itself well to most data-intensive analytics jobs run on cloud these days, given its ability to scale-out and leverage several machines to parallel process data. Research has demonstrates that existing approaches to provisioning other applications in the cloud are not immediately relevant to MapReduce -based applications. Provisioning a MapReduce job entails requesting optimum number of resource sets (RS) and configuring MapReduce parameters such that each resource set is maximally utilized.
Each application has a different bottleneck resource (CPU :Disk :Network), and different bottleneck resource utilization, and thus needs to pick a different combination of these parameters based on the job profile such that the bottleneck resource is maximally utilized.
The problem at hand is thus defining a resource provisioning framework for MapReduce jobs running in a cloud keeping in mind performance goals such as Optimal resource utilization with Minimum incurred cost, Lower execution time, Energy Awareness, Automatic handling of node failure and Highly scalable solution.
Large Scale Data Analysis with Map/Reduce, part IMarin Dimitrov
This document provides an overview of large scale data analysis using distributed computing frameworks like MapReduce. It describes MapReduce and related frameworks like Dryad, and open source MapReduce tools including Hadoop, Cloud MapReduce, Elastic MapReduce, and MR.Flow. Example MapReduce algorithms for tasks like graph analysis, text indexing and retrieval are also outlined. The document is the first part of a series on large scale data analysis using distributed frameworks.
This document provides a summary of improvements made to Hive's performance through the use of Apache Tez and other optimizations. Some key points include:
- Hive was improved to use Apache Tez as its execution engine instead of MapReduce, reducing latency for interactive queries and improving throughput for batch queries.
- Statistics collection was optimized to gather column-level statistics from ORC file footers, speeding up statistics gathering.
- The cost-based optimizer Optiq was added to Hive, allowing it to choose better execution plans.
- Vectorized query processing, broadcast joins, dynamic partitioning, and other optimizations improved individual query performance by over 100x in some cases.
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2skCodH
This CloudxLab Understanding MapReduce tutorial helps you to understand MapReduce in detail. Below are the topics covered in this tutorial:
1) Thinking in Map / Reduce
2) Understanding Unix Pipeline
3) Examples to understand MapReduce
4) Merging
5) Mappers & Reducers
6) Mapper Example
7) Input Split
8) mapper() & reducer() Code
9) Example - Count number of words in a file using MapReduce
10) Example - Compute Max Temperature using MapReduce
11) Hands-on - Count number of words in a file using MapReduce on CloudxLab
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
Flink Forward San Francisco 2022.
In modern data platform architectures, stream processing engines such as Apache Flink are used to ingest continuous streams of data into data lakes such as Apache Iceberg. Streaming ingestion to iceberg tables can suffer by two problems (1) small files problem that can hurt read performance (2) poor data clustering that can make file pruning less effective. To address those two problems, we propose adding a shuffling stage to the Flink Iceberg streaming writer. The shuffling stage can intelligently group data via bin packing or range partition. This can reduce the number of concurrent files that every task writes. It can also improve data clustering. In this talk, we will explain the motivations in details and dive into the design of the shuffling stage. We will also share the evaluation results that demonstrate the effectiveness of smart shuffling.
by
Gang Ye & Steven Wu
This document discusses the development of Apache Pig on Tez, an execution engine for Pig jobs. Pig on Tez allows Pig workflows to be executed as directed acyclic graphs (DAGs) using Tez, improving performance over the default MapReduce execution. Key benefits of Tez include eliminating intermediate data writes, reducing job launch overhead, and allowing more flexible data flows. However, challenges remain around automatically determining optimal parallelism and integrating Tez with user interface and monitoring tools. Future work is needed to address these issues.
This document discusses scheduling in distributed systems. It covers:
1) Common scheduling techniques like min-min, max-min, and sufferage for scheduling independent tasks on dedicated systems.
2) Scheduling dependent tasks modeled as directed acyclic graphs (DAGs) using techniques like critical path on a processor (CPOP) and heterogeneous earliest finish time (HEFT).
3) The need for scheduling algorithms to adapt to dynamic grid environments where tasks may have dependencies on shared files and network transfer times vary.
This document provides an overview and introduction to the concepts taught in a data structures and algorithms course. It discusses the goals of reinforcing that every data structure has costs and benefits, learning commonly used data structures, and understanding how to analyze the efficiency of algorithms. Key topics covered include abstract data types, common data structures, algorithm analysis techniques like best/worst/average cases and asymptotic notation, and examples of analyzing the time complexity of various algorithms. The document emphasizes that problems can have multiple potential algorithms and that problems should be carefully defined in terms of inputs, outputs, and resource constraints.
Earliest Due Date Algorithm for Task scheduling for cloud computingPrakash Poudel
The document discusses the Earliest Due Date algorithm for task scheduling in cloud computing. It begins with an introduction to cloud computing and describes how it provides computing resources over the internet. It then provides an overview of task scheduling algorithms, including their purpose of assigning tasks to resources to optimize performance. The document focuses on the Earliest Due Date algorithm, which executes tasks based on the earliest deadline first. It provides an example of how EDD would schedule a set of tasks and discusses how EDD is optimal for real-time systems as it attempts to minimize maximum lateness. The document concludes by noting some advantages of EDD while also acknowledging it does not consider processing time.
Optimizing Performance - Clojure Remote - Nikola PericNik Peric
When a project approaches production questions about performance always surface. This talk tackles several real-world problems that have occurred while bringing a data-driven project to production, and walks through the problem solving approach to each.
Hadoop was originally designed for running large batch jobs, but users wanted to share clusters for better utilization and lower costs. Sharing requires a scheduler that provides guaranteed capacity for production jobs while also giving interactive jobs good response times. The Fair Scheduler was developed to address this by assigning jobs to pools that each get a minimum share of resources, with excess allocated fairly between pools. However, strictly following queues can hurt data locality. Delay Scheduling improves locality by relaxing the queues for a short time to allow more data-local scheduling opportunities.
This document provides an introduction to big data and MapReduce frameworks. It discusses:
- What big data is and examples of large datasets.
- An overview of MapReduce, including how it allows programmers to break problems into parallelizable map and reduce tasks.
- Details of how MapReduce frameworks like Apache Hadoop work, including distributed processing, fault tolerance, and the roles of mappers, reducers, and other components.
Task allocation and scheduling inmultiprocessorsDon William
This document discusses task allocation and scheduling in a multi-processor environment. It describes generating synthetic tasks and assigning them priorities using static and dynamic scheduling algorithms like Rate Monotonic and Earliest Deadline First. It then covers allocating tasks to processors using algorithms like Next Fit and Bin Packing to optimize processor utilization. The goal is to schedule tasks dynamically in a multi-processor system to improve performance over uniprocessor scheduling.
This document summarizes a proposal to improve fault tolerance in Hadoop clusters. It proposes adding a "Backup" state to store intermediate MapReduce data, so reducers can continue working even if mappers fail. It also proposes a "supernode" protocol where neighboring slave nodes communicate task information. If one node fails, a neighbor can take over its tasks without involving the JobTracker. This would improve fault tolerance by allowing computation to continue locally between nodes after failures.
The document describes Hadoop MapReduce and its key concepts. It discusses how MapReduce allows for parallel processing of large datasets across clusters of computers using a simple programming model. It provides details on the MapReduce architecture, including the JobTracker master and TaskTracker slaves. It also gives examples of common MapReduce algorithms and patterns like counting, sorting, joins and iterative processing.
MapReduce is a programming model for processing large datasets in parallel across clusters of machines. It involves splitting the input data into independent chunks which are processed by the "map" step, and then grouping the outputs of the maps together and inputting them to the "reduce" step to produce the final results. The MapReduce paper presented Google's implementation which ran on a large cluster of commodity machines and used the Google File System for fault tolerance. It demonstrated that MapReduce can efficiently process very large amounts of data for applications like search, sorting and counting word frequencies.
Hanborq Optimizations on Hadoop MapReduceHanborq Inc.
A Hanborq optimized Hadoop Distribution, especially with high performance of MapReduce. It's the core part of HDH (Hanborq Distribution with Hadoop for Big Data Engineering).
This is a deck of slides from a recent meetup of AWS Usergroup Greece, presented by Ioannis Konstantinou from the National Technical University of Athens.
The presentation gives an overview of the Map Reduce framework and a description of its open source implementation (Hadoop). Amazon's own Elastic Map Reduce (EMR) service is also mentioned. With the growing interest on Big Data this is a good introduction to the subject.
This document discusses job scheduling challenges in Hadoop and improvements made over time. It covers:
1) Early Hadoop clusters faced issues like some jobs taking all resources and bad jobs slowing the entire cluster. Fair scheduling and speculation were introduced to improve fairness and fault tolerance.
2) Hadoop 1.x had a single JobTracker which did not scale. It used pull-based and slot-based scheduling which hurt efficiency and scalability.
3) Later versions used independent JobTrackers per job, a central ResourceManager, and push-based scheduling via Corona to improve scalability. However, long-running reducers remained challenging to schedule efficiently.
4) Future improvements may be needed to
Hanborq optimizations on hadoop map reduce 20120221aSchubert Zhang
Hanborq has developed optimizations to improve the performance of Hadoop MapReduce in three key areas:
1. The runtime environment uses a worker pool and improved scheduling to reduce job completion times from tens of seconds to near real-time.
2. The processing engine utilizes techniques like sendfile for zero-copy data transfer and Netty batch fetching to reduce network overhead and CPU usage during shuffling.
3. Sort avoidance algorithms are implemented to minimize expensive sorting operations through techniques such as early reduce and hash aggregation.
In-Memory Computing: How, Why? and common PatternsSrinath Perera
Traditionally, big data is mostly read from disks and processed. However, most big data systems are latency bound, which means often the CPU sits idle waiting for data to arrive. This problem is more prevalent with use cases like graph searches that need to randomly access different parts of datasets. In-memory computing proposes an alternative model where data is loaded or stored in-memory and processed instead of processing them from the disk. Although such designs cost more in terms of memory, sometimes resulting systems can have faster order of magnitudes (e.g. 1000X), which could lead to savings in the long run. With rapidly falling memory prices, this difference is reducing by the day. Furthermore, in-memory computing can enable use cases like ad hoc analysis over a large set of data that was not possible earlier. This talk will provide an overview of in-memory technology and discuss how WSO2 technologies like complex event processing that can be used to build in-memory solutions. It will also provide an overview of upcoming improvements in the WSO2 platform.
multiprocessor real_ time scheduling.pptnaghamallella
This document discusses different scheduling models for multiprocessor real-time systems, including global scheduling, partitioned scheduling, and semi-partitioned scheduling. Global scheduling uses a shared ready queue and allows tasks to migrate between processors, but can cause overhead from migration and scheduling anomalies. Partitioned scheduling assigns each task to a dedicated processor to avoid migration, but may underutilize processors. Semi-partitioned scheduling first partitions tasks then allows some to migrate to improve utilization.
Similar to GoodFit: Multi-Resource Packing of Tasks with Dependencies (20)
This document discusses running Apache Spark and Apache Zeppelin in production. It begins by introducing the author and their background. It then covers security best practices for Spark deployments, including authentication using Kerberos, authorization using Ranger/Sentry, encryption, and audit logging. Different Spark deployment modes like Spark on YARN are explained. The document also discusses optimizing Spark performance by tuning executor size and multi-tenancy. Finally, it covers security features for Apache Zeppelin like authentication, authorization, and credential management.
This document discusses Spark security and provides an overview of authentication, authorization, encryption, and auditing in Spark. It describes how Spark leverages Kerberos for authentication and uses services like Ranger and Sentry for authorization. It also outlines how communication channels in Spark are encrypted and some common issues to watch out for related to Spark security.
The document discusses the Virtual Data Connector project which aims to leverage Apache Atlas and Apache Ranger to provide unified metadata and access governance across data sources. Key points include:
- The project aims to address challenges of understanding, governing, and controlling access to distributed data through a centralized metadata catalog and policies.
- Apache Atlas provides a scalable metadata repository while Apache Ranger enables centralized access governance. The project will integrate these using a virtualization layer.
- Enhancements to Atlas and Ranger are proposed to better support the project's goals around a unified open metadata platform and metadata-driven governance.
- An initial minimum viable product will be built this year with the goal of an open, collaborative ecosystem around shared
This document discusses using a data science platform to enable digital diagnostics in healthcare. It provides an overview of healthcare data sources and Yale/YNHH's data science platform. It then describes the data science journey process using a clinical laboratory use case as an example. The goal is to use big data and machine learning to improve diagnostic reproducibility, throughput, turnaround time, and accuracy for laboratory testing by developing a machine learning algorithm and real-time data processing pipeline.
This document discusses using Apache Spark and MLlib for text mining on big data. It outlines common text mining applications, describes how Spark and MLlib enable scalable machine learning on large datasets, and provides examples of text mining workflows and pipelines that can be built with Spark MLlib algorithms and components like tokenization, feature extraction, and modeling. It also discusses customizing ML pipelines and the Zeppelin notebook platform for collaborative data science work.
This document compares the performance of Hive and Spark when running the BigBench benchmark. It outlines the structure and use cases of the BigBench benchmark, which aims to cover common Big Data analytical properties. It then describes sequential performance tests of Hive+Tez and Spark on queries from the benchmark using a HDInsight PaaS cluster, finding variations in performance between the systems. Concurrency tests are also run by executing multiple query streams in parallel to analyze throughput.
The document discusses modern data applications and architectures. It introduces Apache Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of commodity hardware. Hadoop provides massive scalability and easy data access for applications. The document outlines the key components of Hadoop, including its distributed storage, processing framework, and ecosystem of tools for data access, management, analytics and more. It argues that Hadoop enables organizations to innovate with all types and sources of data at lower costs.
This document provides an overview of data science and machine learning. It discusses what data science and machine learning are, including extracting insights from data and computers learning without being explicitly programmed. It also covers Apache Spark, which is an open source framework for large-scale data processing. Finally, it discusses common machine learning algorithms like regression, classification, clustering, and dimensionality reduction.
This document provides an overview of Apache Spark, including its capabilities and components. Spark is an open-source cluster computing framework that allows distributed processing of large datasets across clusters of machines. It supports various data processing workloads including streaming, SQL, machine learning and graph analytics. The document discusses Spark's APIs like DataFrames and its libraries like Spark SQL, Spark Streaming, MLlib and GraphX. It also provides examples of using Spark for tasks like linear regression modeling.
This document provides an overview of Apache NiFi and dataflow. It begins with an introduction to the challenges of moving data effectively within and between systems. It then discusses Apache NiFi's key features for addressing these challenges, including guaranteed delivery, data buffering, prioritized queuing, and data provenance. The document outlines NiFi's architecture and components like repositories and extension points. It also previews a live demo and invites attendees to further discuss Apache NiFi at a Birds of a Feather session.
Many Organizations are currently processing various types of data and in different formats. Most often this data will be in free form, As the consumers of this data growing it’s imperative that this free-flowing data needs to adhere to a schema. It will help data consumers to have an expectation of about the type of data they are getting and also they will be able to avoid immediate impact if the upstream source changes its format. Having a uniform schema representation also gives the Data Pipeline a really easy way to integrate and support various systems that use different data formats.
SchemaRegistry is a central repository for storing, evolving schemas. It provides an API & tooling to help developers and users to register a schema and consume that schema without having any impact if the schema changed. Users can tag different schemas and versions, register for notifications of schema changes with versions etc.
In this talk, we will go through the need for a schema registry and schema evolution and showcase the integration with Apache NiFi, Apache Kafka, Apache Storm.
There is increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model could take hours. This is a problem when the model needs to be more up-to-date. For example, when recommending TV programs while they are being transmitted the model should take into consideration users who watch a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams is commonly believed to be more restricted and hence less accurate than batch trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
DeepLearning is not just a hype - it outperforms state-of-the-art ML algorithms. One by one. In this talk we will show how DeepLearning can be used for detecting anomalies on IoT sensor data streams at high speed using DeepLearning4J on top of different BigData engines like ApacheSpark and ApacheFlink. Key in this talk is the absence of any large training corpus since we are using unsupervised machine learning - a domain current DL research threats step-motherly. As we can see in this demo LSTM networks can learn very complex system behavior - in this case data coming from a physical model simulating bearing vibration data. Once draw back of DeepLearning is that normally a very large labaled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with DeepLearning - no labeled data set is necessary. We are able to detect anomalies and predict braking bearings with 10 fold confidence. All examples and all code will be made publicly available and open sources. Only open source components are used.
QE automation for large systems is a great step forward in increasing system reliability. In the big-data world, multiple components have to come together to provide end-users with business outcomes. This means, that QE Automations scenarios need to be detailed around actual use cases, cross-cutting components. The system tests potentially generate large amounts of data on a recurring basis, verifying which is a tedious job. Given the multiple levels of indirection, the false positives of actual defects are higher, and are generally wasteful.
At Hortonworks, we’ve designed and implemented Automated Log Analysis System - Mool, using Statistical Data Science and ML. Currently the work in progress has a batch data pipeline with a following ensemble ML pipeline which feeds into the recommendation engine. The system identifies the root cause of test failures, by correlating the failing test cases, with current and historical error records, to identify root cause of errors across multiple components. The system works in unsupervised mode with no perfect model/stable builds/source-code version to refer to. In addition the system provides limited recommendations to file/open past tickets and compares run-profiles with past runs.
Improving business performance is never easy! The Natixis Pack is like Rugby. Working together is key to scrum success. Our data journey would undoubtedly have been so much more difficult if we had not made the move together.
This session is the story of how ‘The Natixis Pack’ has driven change in its current IT architecture so that legacy systems can leverage some of the many components in Hortonworks Data Platform in order to improve the performance of business applications. During this session, you will hear:
• How and why the business and IT requirements originated
• How we leverage the platform to fulfill security and production requirements
• How we organize a community to:
o Guard all the players, no one gets left on the ground!
o Us the platform appropriately (Not every problem is eligible for Big Data and standard databases are not dead)
• What are the most usable, the most interesting and the most promising technologies in the Apache Hadoop community
We will finish the story of a successful rugby team with insight into the special skills needed from each player to win the match!
DETAILS
This session is part business, part technical. We will talk about infrastructure, security and project management as well as the industrial usage of Hive, HBase, Kafka, and Spark within an industrial Corporate and Investment Bank environment, framed by regulatory constraints.
HBase is a distributed, column-oriented database that stores data in tables divided into rows and columns. It is optimized for random, real-time read/write access to big data. The document discusses HBase's key concepts like tables, regions, and column families. It also covers performance tuning aspects like cluster configuration, compaction strategies, and intelligent key design to spread load evenly. Different use cases are suitable for HBase depending on access patterns, such as time series data, messages, or serving random lookups and short scans from large datasets. Proper data modeling and tuning are necessary to maximize HBase's performance.
There has been an explosion of data digitising our physical world – from cameras, environmental sensors and embedded devices, right down to the phones in our pockets. Which means that, now, companies have new ways to transform their businesses – both operationally, and through their products and services – by leveraging this data and applying fresh analytical techniques to make sense of it. But are they ready? The answer is “no” in most cases.
In this session, we’ll be discussing the challenges facing companies trying to embrace the Analytics of Things, and how Teradata has helped customers work through and turn those challenges to their advantage.
In this talk, we will present a new distribution of Hadoop, Hops, that can scale the Hadoop Filesystem (HDFS) by 16X, from 70K ops/s to 1.2 million ops/s on Spotiy's industrial Hadoop workload. Hops is an open-source distribution of Apache Hadoop that supports distributed metadata for HSFS (HopsFS) and the ResourceManager in Apache YARN. HopsFS is the first production-grade distributed hierarchical filesystem to store its metadata normalized in an in-memory, shared nothing database. For YARN, we will discuss optimizations that enable 2X throughput increases for the Capacity scheduler, enabling scalability to clusters with >20K nodes. We will discuss the journey of how we reached this milestone, discussing some of the challenges involved in efficiently and safely mapping hierarchical filesystem metadata state and operations onto a shared-nothing, in-memory database. We will also discuss the key database features needed for extreme scaling, such as multi-partition transactions, partition-pruned index scans, distribution-aware transactions, and the streaming changelog API. Hops (www.hops.io) is Apache-licensed open-source and supports a pluggable database backend for distributed metadata, although it currently only support MySQL Cluster as a backend. Hops opens up the potential for new directions for Hadoop when metadata is available for tinkering in a mature relational database.
In high-risk manufacturing industries, regulatory bodies stipulate continuous monitoring and documentation of critical product attributes and process parameters. On the other hand, sensor data coming from production processes can be used to gain deeper insights into optimization potentials. By establishing a central production data lake based on Hadoop and using Talend Data Fabric as a basis for a unified architecture, the German pharmaceutical company HERMES Arzneimittel was able to cater to compliance requirements as well as unlock new business opportunities, enabling use cases like predictive maintenance, predictive quality assurance or open world analytics. Learn how the Talend Data Fabric enabled HERMES Arzneimittel to become data-driven and transform Big Data projects from challenging, hard to maintain hand-coding jobs to repeatable, future-proof integration designs.
Talend Data Fabric combines Talend products into a common set of powerful, easy-to-use tools for any integration style: real-time or batch, big data or master data management, on-premises or in the cloud.
While you could be tempted assuming data is already safe in a single Hadoop cluster, in practice you have to plan for more. Questions like: "What happens if the entire datacenter fails?, or "How do I recover into a consistent state of data, so that applications can continue to run?" are not a all trivial to answer for Hadoop. Did you know that HDFS snapshots are handling open files not as immutable? Or that HBase snapshots are executed asynchronously across servers and therefore cannot guarantee atomicity for cross region updates (which includes tables)? There is no unified and coherent data backup strategy, nor is there tooling available for many of the included components to build such a strategy. The Hadoop distributions largely avoid this topic as most customers are still in the "single use-case" or PoC phase, where data governance as far as backup and disaster recovery (BDR) is concerned are not (yet) important. This talk first is introducing you to the overarching issue and difficulties of backup and data safety, looking at each of the many components in Hadoop, including HDFS, HBase, YARN, Oozie, the management components and so on, to finally show you a viable approach using built-in tools. You will also learn not to take this topic lightheartedly and what is needed to implement and guarantee a continuous operation of Hadoop cluster based solutions.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
2. Cluster Scheduling for Jobs
Jobs
Machines, file-system, network
Cluster Scheduler
matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./ fairness
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Tasks
Dependencies
• Need not keep resource “buffers”
• More dynamic than VM placement (tasks last seconds)
• Aggregate properties are important (eg, all tasks in a job should finish)
3. Need careful multi-resource planning
Problem
Fragmentation
Current Schedulers Packer Scheduler
Over-allocation of net/disk
Current Schedulers Packer Scheduler
2 tasks/T 3 tasks/T (+50%) 2 tasks/ 2T 2 tasks/T (+100%)
4. … worse with dependencies
Problem 2
Tt,
𝟏
𝒏
r t, 1- r
t, r
t, 1- r t, 1- r
(T- 2)t,
𝟏
𝒏
r (T- 4)t,
𝟏
𝒏
r ~Tt,
𝟏
𝒏
r
…
…
DAG label= {duration, resource demand}
resource
time
~nT t
…
resource
time
~T t
…
…
Crit. Path Best
Critical path scheduling is n times off since it ignores resource demands
Packers can be d times off since they ignore future work [d resources]
5. Typical job scheduler infrastructure
+ packing
+ bounded unfairness
+ merge schedules
+ overbook
DAG
AM
DAG
AM
… Node
heartbeat
Task
assignment
Schedule
Constructor
Schedule
Constructor
RM
NM
NM
NM
NM
6. Main ideas in multi-resource packing
Task packing ~ Multi-dimensional bin packing, but
* Very hard problem (“APX-hard”)
* Available heuristics do not directly apply [task demands change with placement]
Alignment score (A) = D R
A packing heuristic
Task’s resources demand vector: D Machine resource vector: R<
Fit
A job completion time heuristic shortest remaining work, P tasks avg. duration
tasks avg. resource demand
*
*
=
remaining # tasks
Packing
Efficiency
?
delays job completion
loses packing efficiencyJob Completion
Time
Fairness
Trade-offs:
We show that:
{best “perf” |bounded unfairness} ~ best “perf”
loses both
7. Main ideas in packing dependent tasks
1. Identify troublesome tasks (meat) and place
them first
2. Systematically place other tasks without
deadlocks
3. At runtime, use a precedence order from the
computed schedule + heuristics to (a)
overbook, (b) previous slide.
4. Better lower bounds for DAG completion time
M
P
C
O
time
resource
meat
begin
meat
end
parents
meat
children
12. Map
(disk)
Reduce
(netw.)
Fair share among two identical jobs
50%
50%
50%
50%
2T 4T
Instantaneous fairness
100
%
100
%
100
%
100
%
2T 3TT
1) Temporal relaxation of fairness
a job will finish within (1 + 𝑓)x the time it takes given strict share
2) Optimal trade-off with performance
(1 + 𝑓)x fairness costs (2 + 2𝑓 − 2 𝑓 + 𝑓2)x on make-span
3) A simple (offline) algorithm that achieves the above trade-off
Problem:
Instantaneous fairness can be up to dx worse on makespan (d resources)
Best
Fairness slack 𝒇 Perf loss
0 (perfectly fair) 2x
1 (<2x longer) 1.1x
2 (<3x longer) 1.07x
13. Bare metal
VM Allocation
Data-parallel Jobs
Job: Tasks
Dependencies
E.g., HDInsight, AzureBatch
E.g., BigData (Yarn, Cosmos, Spark)
E.g., CloudBuild
3500 servers
3500 users
>20M targets/day
~100K servers (40K at Yahoo)
>50K servers
>2EB stored
>6K devs
14. • Tasks are short-lived (10s of seconds)
• Have peculiar shaped demands
• Composites are important (job needs all tasks to finish)
• OK to kill and restart tasks
• Locality
1) Job scheduling has specific aspects
2) will speed-up the average job (and reduce resource cost)
3) research + practice
16. Cluster Scheduling for Jobs
Jobs
Machines, file-system, network
Cluster Scheduler
matches tasks to resources
Goals
• High cluster utilization
• Fast job completion time
• Predictable perf./ fairness
• Efficient (milliseconds…)
E.g., HDInsight, AzureBatch
E.g., BigData (Hive, SCOPE, Spark)
E.g., CloudBuild
Tasks
Dependencies
17. Need careful multi-resource planning
Problem
Fragmentation
Current Schedulers Packer Scheduler
Over-allocation of net/disk
Current Schedulers Packer Scheduler
2 tasks/T 3 tasks/T (+50%) 2 tasks/ 2T 2 tasks/T (+100%)
18. … worse with dependencies
Problem 2
Tt,
𝟏
𝒏
r t, 1- r
t, r
t, 1- r t, 1- r
(T- 2)t,
𝟏
𝒏
r (T- 4)t,
𝟏
𝒏
r ~Tt,
𝟏
𝒏
r
…
…
DAG label= {duration, resource demand}
resource
time
~nT t
…
resource
time
~T t
…
…
Crit. Path Best
Critical path scheduling is n times off since it ignores resource demands
Packers can be d times off since they ignore future work [d resources]
19. Typical job scheduler infrastructure
+ packing
+ bounded unfairness
+ merge schedules
+ overbook
DAG
AM
DAG
AM
… Node
heartbeat
Task
assignment
Schedule
Constructor
Schedule
Constructor
RM
NM
NM
NM
NM
20. Main ideas in packing dependent tasks
1. Identify troublesome tasks (T) and place them
first
2. Systematically place other tasks without dead-
ends
3. At runtime, enforce computed schedule +
heuristics to (a) overbook, (b) previous slide.
4. Better lower bounds for DAG completion time
T
P
C
O
time
resource
Trouble
begin
Trouble
end
parents
trouble
children
24. Performance of cluster schedulers
We observe that:
1Time to finish a set of jobs
Resources are fragmented i.e. machines are running below capacity
Even at 100% usage, goodput is much smaller due to over-allocation
Even pareto-efficient multi-resource fair schemes result in much lower performance
Tetris
up to 40% improvement in makespan1 and job
completion time with near-perfect fairness
25. Findings from Bing and Facebook traces analysis
Tasks need varying amounts of each resource
Demands for resources are weakly correlated
Diversity in multi-resource requirements:
Multiple resources become tight
This matters because no single bottleneck resource:
Enough cross-rack network bandwidth to use all CPU cores
25
Upper bounding potential gains
reduce makespan1 by up to 49%
reduce avg. job compl. time by up to 46%
26. 26
Why so bad #1
Production schedulers neither pack
tasks nor consider all their relevant
resource demands
#1 Resource Fragmentation
#2 Over-allocation
27. Current Schedulers “Packer” Scheduler
Machine A
4 GB Memory
Machine B
4 GB Memory
T1: 2 GB
T3: 4 GB
T2: 2 GB
Time
Resource Fragmentation (RF)
STOP
Machine A
4 GB Memory
Machine B
4 GB Memory
T1: 2 GB
T3: 4 GB
T2: 2 GB
Time
Avg. task compl. time = 1 t
27
Current Schedulers
RF increase with the
number of resources
being allocated !
Avg. task compl.time = 1.33 t
Resources allocated
in terms of Slots
Free resources unable
to be assigned to tasks
28. Current Schedulers “Packer” Scheduler
Machine A
4 GB Memory; 20 MB/s Nw.
Time
T1: 2 GB
Memory
20 MB/s
Nw.
T2: 2 GB
Memory
20 MB/s
Nw.
T3: 2 GB
Memory
Machine A
4 GB Memory; 20 MB/s Nw.Time
T1: 2 GB
Memory
20 MB/s
Nw.
T2: 2 GB
Memory
20 MB/s
Nw.
T3: 2 GB
Memory
STOP
20 MB/s
Nw.
20 MB/s
Nw.
28
Over-Allocation
Not all tasks resource
demands are
explicitly allocated
Disk and network
are over-allocated
Avg. task compl.time= 2.33 t Avg. task compl. time = 1.33 t
Current Schedulers
29. Work Conserving != no fragmentation, over-allocation
Treat cluster as a big bag of resources
Hides the impact of resource fragmentation
Assume job has a fixed resource profile
Different tasks in the same job have different demands
Multi-resource Fairness Schemes do not help either
Why so bad #2
The schedule impacts job’s current resource profiles
Can schedule to create complementarity profiles
Packer Scheduler vs. DRF
Avg. Job Compl.Time: 50%
Makespan: 33%
Pareto1 efficient != performant
1no job can increase share without decreasing the share of another
29
30. Competing objectives
Job completion time
Fairness
vs.
Cluster efficiency
vs.
Current Schedulers
1. Resource Fragmentation
3. Fair allocations sacrifice performance
2. Over-Allocation
30
31. # 1
Pack tasks along multiple resources to improve
cluster efficiency and reduce makespan
31
32. Theory Practice
Multi-Resource Packing of Tasks
similar to
Multi-Dimensional Bin Packing
Balls could be tasks
Bin could be machine, time
1APX-Hard is a strict subset of NP-hard
APX-Hard1
Existing heuristics do not directly apply here:
Assume balls of a fixed size
Assume balls are known apriori
32
vary with time / machine placed
elastic
cope with online arrival of jobs,
dependencies, cluster activity
Avoiding fragmentation looks like:
Tight bin packing
Reduces # of bins used -> reduce makespan
33. # 1
Packing heuristic
1. Check for fit ensure no over-allocation Over-Allocation
Alignment score (A)
33
A packing heuristic
Tasks resources demand vector Machine resource vector<
Fit
“A” works because:
2. Bigger balls get bigger scores
3. Abundant resources used first
Resource Fragmentation
4. Can spread load across machines
35. 35
CHALLENGE
# 2
Shortest Remaining Time First1 (SRTF)
1SRTF – M. Harchol-Balter et al. Connection Scheduling in Web Servers [USITS’99]
schedules jobs in ascending order of their remaining time
Job Completion
Time Heuristic
Q: What is the shortest “remaining time” ?
“remaining work”
remaining # tasks
tasks durations
tasks resource demands
&
&
=
A job completion time heuristic
Gives a score P to every job
Extended SRTF to incorporate multiple resources
36. 36
CHALLENGE
# 2
Job Completion
Time Heuristic
Combine A and P scores !
Packing
Efficiency
Completion
Time
?
1: among J runnable jobs
2: score (j) = A(t, R)+ P(j)
3: max task t in j, demand(t) ≤ R (resources free)
4: pick j*, t* = argmax score(j)
A: delays job completion time
P: loss in packing efficiency
38. # 3
38
A says: “task i should go here to improve packing efficiency”
Feasible solution which typically can satisfy all of them
P says: “schedule job j next to improve job completion time”
Fairness says: “this set of jobs should be scheduled next”
Fairness
Heuristic
Performance and fairness do not mix well in general
But ….
We can get “perfect fairness” and much better performance
39. # 3
39
Fairness Knob, F [0, 1)
F = 0 most efficient scheduling
F → 1 close to perfect fairness
Pick the best-for-perf. task from among
1-F fraction of jobs furthest from fair share
Fairness
Heuristic
Fairness is not a tight constraint
Long term fairness not short term fairness
Lose a bit of fairness for a lot of gains in performance
Heuristic
40. 40
Putting it all together
We saw:
Other things in the paper:
Packing efficiency
Prefer small remaining work
Fairness knob
Estimate task demands
Deal with inaccuracies, barriers
Ingestion / evacuation
Job Manager1
Node Manager1
Cluster-wide Resource Manager
Multi-resource asks;
barrier hint
Track resource usage;
enforce allocations
New logic to match tasks to machines
(+packing, +SRTF, +fairness)
Allocations
Asks
Offers
Resource
availability reports
Yarn architecture
Changes to add Tetris(shown in orange)
42. 42
Efficiency
Makespan
DRF 28 %
Avg. Job Compl. Time
35%
0
50
100
150
200
0 5000 10000 15000
Utilization(%)
Time (s)
CPU Mem In St
Tetris
Gains from
avoiding fragmentation
avoid over-allocation
0
50
100
150
200
0 4500 9000 13500 18000 22500
Utilization(%)
Time (s)
CPU Mem In St
Tetris vs.
Capacity
Scheduler
29 % 30 %
Over-allocation
Lower value => higher resource fragmentation
Utilization(%)
200
150
100
50
0
0 5000 10000 15000
Time (s)
Over-allocation
Lower value => higher resource fragmentation
Capacity Scheduler
43. 43
Fairness
Fairness Knob
quantifies the extent to which Tetris adheres to fair allocation
No Fairness
F = 0
Makespan
50 %
10 %
25 %
Job Compl.
Time
40 %
23 %
35 %
Avg. Slowdown
[over impacted jobs]
25 %
2 %
5 %
Full Fairness
F → 1
F = 0.25
44. Pack efficiently
along multiple
resources
Prefer jobs
with less
“remaining
work”
Incorporate
Fairness
combine heuristics that improve packing efficiency with those that
lower average job completion time
achieving desired amounts of fairness can coexist with improving
cluster performance
implemented inside YARN; trace-driven simulations and deployment
show encouraging initial results
We are working towards a Yarn check-in
http://research.microsoft.com/en-us/UM/redmond/projects/tetris/
44
46. Estimating resource requirements
Estimating Resource Demands
Under-utilization
from:
o finished tasks in the same phase
peak usage demands estimates
Machine1 - In Network
850
1024
0
512
MBytes/sec
Time (sec)
In Network Used
In Network Free
Resource Tracker
o report unused resources
o aware of other cluster activities: ingestion and evacuation
Resource Tracker
o collecting statistics from recurring jobs
Peak Demand
o inputs size/location of tasks
46
Placement
Impacts network/disk requirements
47. Packer Scheduler vs. DRF
DRF Scheduler Packer Schedulers
2 tasks
Job Schedule
Resources used
2 tasks 2 tasks
2 tasks 2 tasks 2 tasks
6 tasks 6 tasks 6 tasksA
B
C
18 cores
16 GB
18 cores
16 GB
18 cores
16 GB
t 2t 3t
0 tasks
Job Schedule
Resources used
0 tasks 6 tasks
0 tasks 6 tasks
18 tasksA
B
C
18 cores 18 cores
6 GB
18 cores
6 GB
t 2t 3t
36 GB
Durations:
A: 3t
B: 3t
C: 3t
Durations:
A: t
B: 2t
C: 3t
33%
improvement
Dominant Resource Fairness (DRF)
computes the dominant share (DS) of every user and
seeks to maximize the minimum DS across all users
Cluster [18 Cores, 36 GB Memory]
Job: [Task Prof.], # tasks
A [1 Core, 2 GB], 18
B [3 Cores, 1 GB], 6
C [3 Cores, 1 GB], 6
DS =
𝟏
𝟑
max (qA, qB, qC) (Maximize allocations)
qA + 3qB + 3qC ≤ 18 (CPU constraint)
2qA + 1qB + 1qC ≤ 36 (Memory constraint)
qA
18
=
qB
6
=
qC
6
(Equalize DS) 47
48. 1Time to finish a set of jobs
Machine 1,2: [2 Cores, 4 GB]
Job: [Task Prof.], # tasks
A [2 Cores, 3 GB], 6
B [1 Core, 2 GB], 2
Resources used
4
cores
6 GB
2
tasks
2
tasks
2
tasks
2
tasks
t 2t 3t 4t
Job Schedule
4
cores
6 GB
4
cores
6 GB
2
cores
4 GB
Resources used
2
cores
4 GB
2
tasks
2
tasks
2
tasks
2
tasks
t 2t 3t 4t
Job Schedule
4
cores
6 GB
4
cores
6 GB
4
cores
6 GB
Pack No Pack
Durations:
A: 3t
B: 4t
Durations:
A: 4t
B: t
29% improvement
48
Packing efficiency does not achieve everything
Achieving packing efficiency does not
necessarily improve job completion time
49. 49
Ingestion / evacuation
ingestion = storing incoming data for later analytics
evacuation = data evacuated and re-replicated before
maintenance operations
e.g. some clusters reports volumes of up to 10 TB per hour
Other cluster activities which produce background traffic
e.g. rack decommission for machines re-imaging
Resource Tracker reports, used by Tetris to avoid
contention between its tasks and these activities
54. 54
Virtual Machine Packing != Tetris
Virtual Machine Packing
But focus on different challenges and not task packing:
balance load across servers
ensure VM availability inspite of failures
allow for quick software and hardware updates
NO corresponding entity to a job and hence job completion time is inexpressible
Explicit resource requirements (e.g. small VM) makes VM packing simpler
Consolidating VMs, with multi-dimensional resource
requirements, on to the fewest number of servers
55. 55
Barrier knob, b [0, 1)
Tetris gives preference for last tasks in a stage
Offer resources to tasks in a stage preceding a
barrier, where b fraction of tasks have finished
b = 1 no tasks preferentially treated
56. 56
Starvation Prevention
It could take a long time to accommodate large tasks ?
But …
1. most tasks have demands within one order of magnitude of one another
2. machines report resource availability to the scheduler periodically
scheduler learn about all the resources freed up by tasks that finish in the
preceding period together => can to reservation for large tasks
59. Performance of cluster schedulers
We observe that:
1Time to finish a set of jobs
Typically cluster schedulers do dependency-aware scheduling
OR multi-resource packing
None of the existing solutions are close to optimal for more than 50% of the
production jobs
Graphene
> 30% improvements in makespan1 and job
completion time for more than 50% of the jobs
2
60. Findings from Bing traces analysis
Jobs structure have evolved into complex DAGs of tasks
depth 7
103 tasks
Median job DAG’s has:
A good cluster scheduler should be
aware of dependencies
1Time to finish a set of jobs
3
61. Findings from Bing traces analysis
High coefficient of variation (~1) for many resources
Demands for resources are weakly correlated
Applications have (very) diverse resource needs:
Multiple resources become tight
This matters because no single bottleneck resource:
Enough cross-rack network bandwidth to use all CPU cores
61
CPU, Memory, Network and Disk
A good cluster scheduler should
pack resources
63. Dependency-aware Packing
Breadth First Search (BFS)
63
Do not account for tasks resource demands
If so, they assume tasks have homogeneous
demands
OR Consider the DAG structure during the
schedule
Tetris
Ignore dependencies
Takes local greedy choices
Handle tasks with multiple resource
requirements
Any scheduler that is not packing,
is up to n x OPTIMAL (n – number tasks)
Any scheduler that ignores dependencies is
d x OPTIMAL (d – number resource dimensions)
Critical Path Scheduling
(CPSched)
64. Where does the “work” lie in a DAG?
“Work” – stages in a DAG where most amount of resources X time is spent
Large DAGs that are neither a bunch of unrelated stages
nor a chain of stages
> 40% of the DAGs have most of the “work” on the Critical Path CPSched performs well
> 30% of the DAGs have most of the “work” such that Packers performs well
For ~50% of the DAGs neither
packers nor critically-based
schedulers may perform well 7
65. Pack tasks along multiple resources
while consider tasks dependencies
65
State-of-the art techniques are suboptimal
Key ideas in Graphene
Conclusion
66. State-of-the art scheduling techniques are suboptimal
CPSched / Tetris
3 X Optimal
66
t0: t1:
t2:
t3:
1
{.7, .31}
.01
{.95, .01}
.01
{.1, .7}
.96
{. 2, .68}
.98
{. 1, .01}
.01
{. 01, .01}
t4:
t5:
duration
{rsrc.1, rsrc.2}
task:
CPSched t0 t4 t5
t
t1 t3t2
2t 3t
Time: ~3T
Tetris t0 t1 t2
t
t4 t3t5
2t 3t
Time: ~3T
Optimal t1 t0
t
t4 t3
t2
3t
Time: ~T
t5
Key insights:
t0, t2, t5 are troublesome tasks
schedule them as soon as possible
Total capacity in any dimens. = 1
68. T
P
C
O
…
time
resources
T
…
time
resources
P
O
C
T
Schedule Construction
Identify tasks that can lead to a poor schedule (troublesome tasks) - T
more likely to be on the critical path
more difficult to pack
Break the others tasks into P, C, O sets based on their relationship with tasks from T
Place tasks in T on a virtual time space; overlay the others to fill any resultant holes in
this space
Nearly optimal for over three quarters of our
analyzed production DAGs
11
70. DAG
Schedule Construction
Schedule Construction
Preference order
Preference order
- merging schedulesDAG
Runtime component
Node
heartbeat
Task
assignment
Resource Manager
Prefer jobs with less
remaining work
Enforces priority ordering
Local placement
Multi-resource packing
Judicious overbooking of
malleable resources
Deficit counters to bound
unfairness
Enables implementation
of different fairness
schemes
Job completion time
Online Scheduling
Makespan Being Fair
- bound unfairness
- packing + overbooking
13
71. Evaluation
Implemented in Yarn and Tez
250 machine cluster deployment
Replay Bing traces and
TPC-DS / TPC-H workloads
71
72. Makespan
Tetris
29 %
Avg. Job Compl. Time
27%
Graphene vs.
Critical Path
31 % 33 %BFS
23 % 24%
Gains from
view of the entire DAG
place the troublesome
tasks first
Efficiency
more compact schedule
better packing
overbooking
15
73. combine various mechanisms to improve packing efficiency and
consider tasks dependencies
constructs a good schedule by placing tasks on a virtual resource time space
implemented inside YARN and Tez; trace-driven simulations and
deployment show encouraging initial results
73
online heuristics that softly enforces the desired schedules
74. Makespan
Tetris
29 %
Avg. Job Compl. Time
27%
Graphene vs.
Critical Path
31 % 33 %BFS
23 % 24%
Gains from
view of the entire DAG
place the troublesome
tasks first
Graphene BFSRunning tasks
Efficiency
more compact schedule
better packing
overbooking
15