This document summarizes Yurie Yamane's presentation on using mruby to make robots. It discusses using mruby and TOPPERS on the LEGO EV3 to create an inverted pendulum self-balancing robot. It then covers creating a DIY self-balancing robot using a Raspberry Pi, gyro sensor, DC motors, and a motor driver. Code examples are provided for reading sensor values, controlling motors with PWM, and implementing a balancing algorithm using a balancer class. Source code links are included for further reference.
Experiments in Sharing Java VM Technology with CRuby - Matthew Gaudet
IBM is developing a just-in-time (JIT) compiler called Testarossa based on its Open Runtime (OMR) toolkit. The goal is to integrate the JIT compiler into MRI Ruby to improve performance without changing how MRI works. So far, the JIT supports most opcodes and can run Rails applications, but performance gains are modest. IBM hopes to collaborate with the Ruby community to further optimize Ruby and help make it faster.
Data Analytics Service Company and Its Ruby Usage - SATOSHI TAGOMORI
Treasure Data is a data analytics service company that makes heavy use of Ruby in its platform and services. It uses Ruby for components like Fluentd (log collection), Embulk (data loading), scheduling, and its Rails-based API and console. Java and JRuby are also used for components involving Hadoop and Presto processing. The company's architecture includes collectors that ingest data, a PlazmaDB for storage, workers that process jobs on Hadoop and Presto clusters, and schedulers that queue and schedule those jobs using technologies like PerfectSched and PerfectQueue which are written in Ruby. Hive jobs are built programmatically using Ruby to generate configurations and submit the jobs to underlying Hadoop clusters.
This document summarizes a presentation about the future of the Rake gem and domain-specific languages (DSLs) in Ruby.
The presentation discusses:
1. How Rake works as a Make-like program implemented in Ruby syntax with tasks and dependencies. Rake files use standard Ruby syntax.
2. Examples of common patterns for building internal DSLs in Ruby using class/module methods, method definition, implicit/explicit code blocks, and instance evaluation.
3. How popular Ruby gems like Rake, Bundler, and Thor use DSL techniques and inherit from each other to provide domain-specific interfaces.
We start with an introduction to what Apache Camel is and how you can use Camel to make integration much easier, allowing you to focus on your business logic rather than on low-level messaging protocols and transports. You will also hear what other features Camel provides out of the box that can make integration much easier for you.
We look into web console tooling that gives you insight into your running Apache Camel applications, including visual route diagrams with tracing/debugging and profiling capabilities. In addition to the web tooling, we will also show you other tools in the making.
Fluentd is a log collection tool that is well-suited for container environments. It allows for flexible log collection from containers through its variety of input plugins. Logs can be aggregated and buffered by Fluentd before being sent to output destinations like Elasticsearch. This addresses problems with traditional log collection in container environments by decoupling log collection from applications and making the infrastructure more scalable and reliable.
Debugging PySpark: Spark Summit East talk by Holden Karau - Spark Summit
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. This talk will examine how to debug Apache Spark applications, the different options for logging in Spark’s variety of supported languages, as well as some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose, and this talk will examine how to effectively search logs from Apache Spark to spot common problems. In addition to the internal logging, this talk will look at options for logging from within our program itself.
Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but this talk will look at how to effectively use Spark’s current accumulators for debugging, as well as take a look to the future at data property type accumulators, which may be coming to Spark in a future version.
In addition to reading logs, and instrumenting our program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems.
Apache Camel Introduction & What's in the box - Claus Ibsen
Slides from a JavaBin talk in Grimstad, Norway, presented by Claus Ibsen in February 2016.
This slide deck is fully up to date with the latest Apache Camel 2.16.2 release and includes additional slides to present many of the features that Apache Camel provides out of the box.
Fighting Against Chaotically Separated Values with Embulk - Sadayuki Furuhashi
We created a plugin-based data collection tool that can read any chaotically formatted file called "CSV" by guessing its schema automatically.
Talked at csv,conf,v2 in Berlin
http://csvconf.com/
Apache Camel is a powerful open source integration framework that allows developers to focus on business logic by hiding complexity. It supports over 80 components and 19 data formats, and provides a domain-specific language for integration patterns in Java, XML, and Scala. Camel routes can be run in standalone applications or deployed to various containers.
Camel K allows building and deploying Apache Camel integration applications on Kubernetes in about 1 second. It provides a lightweight runtime for Camel on Kubernetes that enables low-code/no-code integration using Camel's Java DSL. Camel K applications can take advantage of serverless capabilities provided by Knative like autoscaling and scaling to zero. Quarkus is a Kubernetes-native Java stack that provides a minimal footprint and container-first experience for building microservices. It works well with Camel/Camel K by enabling native compilation of Camel routes for very fast startup times and low memory usage.
Rhebok, High Performance Rack Handler / RubyKaigi 2015 - Masahiro Nagano
This document discusses Rhebok, a high performance Rack handler written in Ruby. Rhebok uses a prefork architecture for concurrency and achieves 1.5-2x better performance than Unicorn. It implements efficient network I/O using techniques like IO timeouts, TCP_NODELAY, and writev(). Rhebok also uses the ultra-fast PicoHTTPParser for HTTP request parsing. The document provides an overview of Rhebok, benchmarks showing its performance, and details on its internals and architecture.
Internship final report @ Treasure Data Inc. - Ryuichi ITO
Ryuichi Ito completed an internship at Treasure Data Inc., where he worked on the open source machine learning library Hivemall. He conducted benchmarks of Hivemall's performance on logistic regression and random forest algorithms. He also added new features to Hivemall, including a system testing framework, feature binning, feature selection, and Spark integrations. The benchmarks showed that Hivemall was relatively slow for logistic regression compared to other tools but had good scalability. For random forests, Hivemall performed well on small to medium datasets but struggled on very large datasets.
1) The document proposes making a key-value storage system (CDP KVS) 10 times more scalable to support real-time data delivery.
2) Three ideas are presented: using an alternative distributed KVS, implementing a storage hierarchy on the existing KVS, and shipping edit logs to indexed archives.
3) The storage hierarchy approach of partitioning, compressing, and writing data to DynamoDB in batches is selected as it improves write performance and reduces storage costs while remaining stateless.
A 2-hour session where I cover what Apache Camel is and the latest news on the upcoming Camel v3; the main topic of the talk is the new Camel K sub-project for running integrations natively on the cloud with Kubernetes. The last part of the talk is about running Camel with GraalVM / Quarkus to achieve natively compiled binaries with impressive startup times and footprint.
This document provides an overview of Apache Camel and its capabilities for system integration and implementing enterprise integration patterns. It discusses how Camel can be used for message routing and transformations using various components and languages. It also provides examples of how common integration patterns like content-based routing, filtering, splitting, and aggregating can be implemented using Camel's fluent builder style.
The document discusses how to contribute code to the Ruby programming language. It provides instructions for obtaining the Ruby source code, running tests on the Ruby codebase, and submitting patches to the Ruby bug tracking system. The tests include language tests, framework tests, and extension tests. The goal is to help developers get started testing and contributing to the Ruby core.
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark’s variety of supported languages, and some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose. Holden and Joey demonstrate how to effectively search logs from Apache Spark to spot common problems and discuss options for logging from within your program itself. Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but Holden and Joey look at how to effectively use Spark’s current accumulators for debugging before gazing into the future to see the data property type accumulators that may be coming to Spark in future versions. And in addition to reading logs and instrumenting your program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems. Holden and Joey cover how to quickly use the UI to figure out if certain types of issues are occurring in your job.
The talk will wrap up with Holden trying to get everyone to buy several copies of her new book, High Performance Spark.
The document provides performance best practices for Ruby on Rails applications. It discusses avoiding premature optimization, measuring performance bottlenecks, caching, SQL optimizations, and alternative storage options like NoSQL. It also recommends profiling tools like New Relic, Scout, Rack::Bug and ruby-prof to analyze logs and identify slow requests and actions. Benchmarking and integrating performance tests are also suggested for measuring and testing performance.
A short introduction (with many examples) to the Scala programming language, and also an introduction to using the Play! Framework for modern, safe, efficient and reactive web applications.
Apache Camel is an open source integration framework that allows for routing and mediation using enterprise integration patterns. It supports message routing between various transports and protocols and includes components for common systems as well as language support for writing routing rules in various scripting languages. The history and use of Camel contexts are also discussed.
Presentation from JVMLS 2015
One bottleneck in the Nashorn JavaScript engine is startup time. Nashorn, as it works currently in Java 8, JITs everything to Java bytecode, accruing overhead in code generation and class installation. Nashorn in Java 9 can, in unfortunate cases, increase this compilation workload significantly, as the new optimistic type system, which has greatly increased steady-state performance, requires more code invalidation on warmup. Based on our optimistic type compilation framework, which contains all the mechanisms for quick code replacement and on-stack replacement at the bytecode level, I will present the new execution architecture we are developing. It minimizes compile time intelligently, while maintaining or possibly even increasing code performance, due to extra profiling and execution-frequency information being passed to the JIT. I will also talk about what the future will bring in terms of other dynamic languages on the Nashorn engine, partial method compilation of hot paths, and other intriguing possibilities that our new execution model opens up.
Spark Streaming can be used to process streaming data from Kafka in real-time. There are two main approaches - the receiver-based approach where Spark receives data from Kafka receivers, and the direct approach where Spark directly reads data from Kafka. The document discusses using Spark Streaming to process tens of millions of transactions per minute from Kafka for an ad exchange system. It describes architectures where Spark Streaming is used to perform real-time aggregations and update databases, as well as save raw data to object storage for analytics and recovery. Stateful processing with mapWithState transformations is also demonstrated to update Cassandra in real-time.
Presto is a distributed SQL query engine that allows for interactive analysis of large datasets across various data sources. It was created at Facebook to enable interactive querying of data in HDFS and Hive, which were too slow for interactive use. Presto addresses problems with existing solutions like Hive being too slow, the need to copy data for analysis, and high costs of commercial databases. It uses a distributed architecture with coordinators planning queries and workers executing tasks quickly in parallel.
Future of Ruby standard libraries will focus on gemification. Standard libraries will be extracted out of the Ruby core repository and maintained as default gems or bundled gems in GitHub repositories. This allows libraries to be updated independently of Ruby releases and more easily accept contributions. While this approach has benefits, it also has challenges around maintaining compatibility and complex dependencies. The process of gemification will be gradual to reduce the size of changes.
This document summarizes garbage collection in Ruby. It discusses the mark-and-sweep algorithm used in Ruby 1.8 and the introduction of lazy sweeping in Ruby 1.9.3 to improve performance. Ruby 2.0 switched to a bitmap marking GC and rewrote the mark phase to be non-recursive. Ruby 2.1 introduced new tuning variables, RGenGC with generational collection, and GC events. The document also briefly discusses memory management approaches in other languages like Python.
The document provides an overview of how Ruby programs are compiled and executed. It discusses how Ruby source code is tokenized and turned into an abstract syntax tree (AST) before being compiled into bytecode. It then describes how the Ruby interpreter implements a virtual machine that maps bytecode instructions to native operations. Key aspects covered include Ruby using a stack-based execution model, the interaction between the C stack, virtual machine stack, and Ruby call stack, and how garbage collection works through mark and sweep to reclaim unused memory.
Eclipse OMR: a modern toolkit for building language runtimes - Mark Stoodley
Eclipse OMR is a modern toolkit for building language runtimes that provides high quality runtime ingredients like a porting library, threading library, garbage collection framework, and JIT compiler tools. It has no language semantics of its own and is designed to be integrated into various language runtimes. The presentation demonstrates how OMR has already been used successfully in Ruby, Python, and Smalltalk runtimes and provides performance benefits. It invites others to get involved in the open source project.
Video presentation: https://www.youtube.com/watch?v=jLAFXQ1Av50
Most applications written in Ruby are great, but evil code applying WOP techniques also exists. There are many workarounds in several programming languages, but in Ruby, when they happen, the proportion is bigger. It's very easy to write Ruby code with collateral damage.
You will see a collection of bad Ruby code, with descriptions of how that code affected its applications negatively and the solutions to fix and avoid it. Long classes, coupling, misapplication of OO, illegible code, tangled flows, naming issues, and other things you can imagine are examples of what you'll get.
This document summarizes a presentation about Meteor and React. It introduces Meteor as a full-stack platform for building web and mobile apps with JavaScript. It demonstrates storing messages in a Meteor collection and rendering them with React components. It also covers latency compensation, user accounts, and the future of Meteor integrating React, Redux, GraphQL and Socket.io on the backend.
The document discusses testing PHP applications using SimpleTest, Selenium IDE, and CakePHP. It provides an overview of these testing tools and frameworks and recommends them for testing PHP applications.
The document discusses testing practices for the Ruby programming language. It provides details on how to run various test suites that are part of the Ruby source code repository, including:
1. Running the "make test" command which runs sample tests, known bug tests, and tests defined in the test/ directory.
2. Running "make test-all" which runs core library and standard library tests under the test/ directory.
3. Running "make check" which builds encodings and extensions, runs all test tasks including test frameworks like Test::Unit and Minitest.
4. It also discusses strategies for merging test changes from external repositories like RubyGems and RDoc back into the Ruby source code
Logging for Production Systems in The Container Era discusses how to effectively collect and analyze logs and metrics in microservices-based container environments. It introduces Fluentd as a centralized log collection service that supports pluggable input/output, buffering, and aggregation. Fluentd allows collecting logs from containers and routing them to storage systems like Kafka, HDFS and Elasticsearch. It also supports parsing, filtering and enriching log data through plugins.
Talk at RubyKaigi 2015.
Plugin architecture is known as a technique that brings extensibility to a program. Ruby has good language features for plugins, and RubyGems.org is an excellent platform for plugin distribution. However, creating a plugin architecture is not as easy as writing code without one: plugin loading, packaging, loosely coupled APIs, and performance all need attention. Loading two versions of a gem is an unsolved challenge in Ruby that, on the other hand, has been solved in Java.
I have designed some open-source software such as Fluentd and Embulk. They provide most of their functionality through plugins. I will talk about their plugin-based architecture.
grifork - fast propagative task runner - IKEDA Kiyoshi
Grifork runs defined tasks on systems in a way that resembles a tree's branching.
Give grifork a list of hosts; it then builds a tree graph internally and runs tasks top-down.
Introduction to poloxy - proxy for alerting - IKEDA Kiyoshi
- "poloxy" is a proxy system that pools and proxies alerts to prevent monitoring systems from bursting alerts to recipients. It works by having monitoring systems send alerts to "poloxy" instead of recipients. "poloxy" then enqueues alerts into a queue and has a worker dequeue alerts every minute and deliver them to original recipients, merging duplicate alerts and preventing repeated alerts from being delivered too frequently. This allows customizing how alerts are merged and controlling the frequency of alerts received by each recipient.
Experiments were conducted utilizing OMR technologies in Ruby MRI. OMR is an open source toolkit that implements language-agnostic parts of a managed runtime. It allows incremental development of new runtimes and consumption of advanced functionality. A preview of Ruby integrated with OMR included garbage collection, just-in-time compilation, and diagnostic tooling improvements. Further work was suggested to improve performance and remove limitations of the Ruby interpreter.
In this talk I give an overview of IBM's efforts to create a VM-agnostic toolkit of runtime components from the mature J9 Java Virtual Machine (JVM). I provide a summary of the motivations behind this project, talk about some important proof points with CPython and Ruby MRI, describe the motivations behind an open community for this technology, and discuss the many challenges with creating a runtime agnostic Just In Time compiler from the Testarossa Java JIT.
VMM2016 - Eclipse OMR JITBuilder for better performance - Charlie Gracie
Quickly describes the OMR project and JITBuilder, and gives an example of using JITBuilder in the SOMpp project. Shows the performance results and talks about future improvements.
This document discusses performance tuning and monitoring in different deployment environments. It begins by describing common performance problems seen in applications, like lower-than-expected throughput. It then covers identifying bottlenecks like CPU usage, I/O, memory, and resource contention. Next, it contrasts monitoring in a "classic" on-premise deployment versus a cloud deployment. For the cloud, it recommends collecting log and metrics data and sending it to Elasticsearch for analysis using tools like Logstash and Kibana. It demonstrates sending application monitoring data from Health Center to Elasticsearch. Finally, it discusses how the same data can be visualized in different ways, and covers managed monitoring solutions.
Ingesting Data at Blazing Speed Using Apache ORC - DataWorks Summit
Big SQL is a SQL engine for Hadoop that excels at performance and scalability at high concurrency. Big SQL complements and integrates with Apache Hive for both data and metadata. An architecture that separates compute from storage allows Big SQL to support multiple open data formats natively. Until recently, Parquet provided a significant performance advantage over other data formats for SQL on Hadoop. The landscape changed when ORC became a top level Apache project independent from Hive. Gone were the days of reading ORC files using slow, single-row-at-a-time Hive Serdes. The new vectorized APIs in the Apache ORC libraries make it possible to ingest ORC data at blazing speed. This talk is about the journey leading to ORC taking the crown of best performing data format for Big SQL away from Parquet. We'll have a look under the hood at the architecture of Big SQL ORC readers, and how to tune them. We'll share lessons learned in walking the fine line between maximizing performance at scale and avoiding dreaded Java OOMs. You'll learn the techniques that SQL engines use for fast data ingestion, so that you can leverage the full potential of Apache ORC in any application.
Speaker:
Gustavo Arocena, Big Data Architect, IBM
Compilers have been improving programmer productivity ever since IBM produced the first FORTRAN compiler in 1957. Today, we mostly take them for granted, but even after more than 60 years, compiler researchers and practitioners continue to push the boundaries of what compilers can achieve, as well as how easy it is to leverage the sophisticated code bases that encapsulate those six decades of learning in this field. In this talk, I want to highlight how industry trends like the migration to cloud infrastructures and data centers, as well as the rise of flexibly licensed open source projects like LLVM and Eclipse OMR, are paving the way towards more effective and powerful compilation infrastructures than have ever existed: compilers with the opportunity to contribute to programmer productivity in more ways than simply better hardware instruction sequences, and with simpler APIs so they can be readily used in scenarios where even today's most amazing Just In Time compilers are not really practical.
Describes ongoing work at the Eclipse OMR and Eclipse OpenJ9 open source projects to develop Just In Time compiler technology that can be deployed independently of a runtime (like a JVM, in OpenJ9's case). The actual presentation included two demos, not both of which appear in the slides; the demos are available in the open, so contact me if you want the details.
The document discusses porting OpenJ9 JDK to RISC-V architecture. It involves preparing the software toolchain for cross-compilation to RISC-V, preparing hardware like the HiFive Unleashed development board, and developing OpenJ9 JDK through a mix of local and cross compilation. The status shows OpenJ9 JDK can execute in interpreter mode on the RISC-V emulator and HiFive board running Debian, with future work planned on JIT support, different GC strategies, and supporting other Java versions.
Enabling a hardware accelerated deep learning data science experience for Apa... - DataWorks Summit
Deep learning techniques are finding significant commercial success in a wide variety of industries. Large unstructured data sets such as images, videos, speech and text are great for deep learning, but impose a lot of demands on computing resources. New types of hardware architectures such as GPUs and faster interconnects (e.g. NVLink), RDMA capable networking interface from Mellanox available on OpenPOWER and IBM POWER systems are enabling practical speedups for deep learning. Data Scientists can intuitively incorporate deep learning capabilities on accelerated hardware using open source components such as Jupyter and Zeppelin notebooks, RStudio, Spark, Python, Docker, and Kubernetes with IBM PowerAI. Jupyter and Apache Zeppelin integrate well with Apache Spark and Hadoop using the Apache Livy project. This session will show some deep learning build and deploy steps using Tensorflow and Caffe in Docker containers running in a hardware accelerated private cloud container service. This session will also show system architectures and best practices for deployments on accelerated hardware. INDRAJIT PODDAR, Senior Technical Staff Member, IBM
Accelerating Machine Learning Applications on Spark Using GPUs - IBM
Matrix factorization (MF) is widely used in recommendation systems. We present cuMF, a highly-optimized matrix factorization tool with supreme performance on graphics processing units (GPUs) by fully utilizing the GPU compute power and minimizing the overhead of data movement. Firstly, we introduce a memory-optimized alternating least square (ALS) method by reducing discontiguous memory access and aggressively using registers to reduce memory latency. Secondly, we combine data parallelism with model parallelism to scale to multiple GPUs.
Results show that with up to four GPUs on one machine, cuMF can be up to ten times as fast as those on sizable clusters on large scale problems, and has impressively good performance when solving the largest matrix factorization problem ever reported.
FOSDEM 2017 - Open J9, The Next Free Java VM - Charlie Gracie
I will discuss the J9 VM technology and our plans for open sourcing it. My team has already open sourced a lot of the underlying technology as part of the Eclipse OMR project, and now we are working on open sourcing the rest of the technology.
Getting to the Next Level with Eclipse Concierge - Jan Rellermeyer + Tim Verb... - mfrancis
OSGi Community Event 2016 Presentation by Jan Rellermeyer (IBM), Tim Verbelen (imec) & Jochen Hiller (Deutsche Telekom AG)
Eclipse Concierge provides a clean, small and lightweight implementation of the OSGi core framework specification, specifically tailored to embedded systems and IoT. In this talk, we will cover how to use and deploy the Concierge OSGi framework (e.g. using OSGi enRoute), and discuss many of the new and upcoming features in the Concierge project such as the OSGi REST interface and Cloud Ecosystems reference implementations. We will also present our work in progress on implementing the OSGi R6 core specification level and novel demonstrations that illustrate the advantages of having a lean and streamlined OSGi implementation to deal with deployment and dynamism in IoT applications.
Manual application deployment processes tend to be error prone and inefficient and can make achieving consistent deployments seem impossible.
There is good news. You don’t need to choose between a careful, rigorous approach and a speedy but haphazard one. It’s possible to implement an automated deployment solution that provides consistency and audit trails while improving productivity for your release engineers, operations personnel, and testers. See how!
Learn more about UrbanCode: http://ibm.biz/learnurbancode
Java and the GPU - Everything You Need To Know - Adam Roberts
Here are the main types of GPUs and some key differences:
- Consumer/gaming GPUs: These are graphics cards primarily designed for gaming and consumer applications. Examples include Nvidia GeForce and AMD Radeon cards. They have good price/performance but may lack some features of professional GPUs.
- Professional/workstation GPUs: Higher-end cards designed for professional applications like CAD, content creation, etc. Examples include Nvidia Quadro and AMD FirePro. Tend to be more expensive than gaming GPUs but have stronger drivers, support, and certifications for professional software.
- Cloud/data center GPUs: GPUs designed for high performance computing and machine learning workloads. Have much more
IBM's product provides virtualization capabilities to help address testing challenges in complex enterprise environments involving both mainframe and distributed systems. It can virtualize key mainframe components like CICS, IMS, MQ/z, and DB2/z to allow for testing without relying on limited mainframe resources. This helps reduce costs, decouple development and testing from production systems, and speed up test cycles. Typical customer cases demonstrated how virtualization could help by providing isolated test environments, automating tests, and comparing results across platforms during migration projects. Benefits included lower costs, faster cycles, and the ability to test more scenarios.
Academic Discussion Group Workshop 2018, November 10th 2018, Nimbix CAPI SNAP... - Ganesan Narayanasamy
This document provides notices and disclaimers for a presentation on CAPI SNAP on Nimbix given on November 10, 2018. It states that the information presented is subject to change and may contain errors. It also limits IBM's liability and notes that any performance comparisons made may not accurately reflect all environments. References to non-IBM products are based on their published information and IBM makes no claims about their capabilities. The document also notes that workshops and materials do not necessarily reflect IBM's views and that IBM does not provide legal advice regarding compliance with laws.
WebSphere Technical University: Introduction to the Java Diagnostic Tools - Chris Bailey
IBM provides a number of free tools to assist in monitoring and diagnosing issues when running any Java application - from Hello World to IBM or third-party, middleware-based applications. This session introduces attendees to those tools, highlights how they have been extended with IBM middleware product knowledge, how they have been integrated into IBM's development tools, and how to use them to investigate and resolve real-world problem scenarios.
Presented at the WebSphere Technical University 2014, Dusseldorf
18. • Fantastic worldwide community
• Ruby has changed a lot over 20 years
• MRI is a complex virtual machine with a lot of history behind its design decisions.
25. Ruby has an incremental GC
[diagram: mark and sweep phases interleaved with Ruby execution]
• GC work is interleaved with execution
• Minimizes mutator pause times
• Single threaded
• Serialized with the mutator
30. Starting out with Mark Sweep

static int
gc_start(rb_objspace_t *objspace …)
{
    …
    /* hand the collection request off to the OMR collector */
    OMR_GC_SystemCollect();
    ...
}

static int
newobj_of(rb_objspace_t *objspace …)
{
    ...
    /* allocate fresh RVALUEs from the OMR-managed heap */
    obj = rb_omr_get_freeobj(th, sizeof(RVALUE));
    ...
}
31. Starting out with Mark Sweep
Need to support conservative collection
[diagram: conservative collection backed by an object map (bit map)]
• Object map allows us to support conservative collection
• Bit map used to keep track of objects
• Results in ~1.6% memory overhead
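As a concrete picture of how such an object map can work, here is a minimal sketch in C. It assumes a contiguous heap tracked at one map bit per 8 bytes of heap (1/64, which is where an overhead figure around 1.6% comes from); the names objmap_set and objmap_is_object are illustrative, not the OMR API.

/* Minimal object-map sketch: one bit per 8 bytes of heap, so the map
 * costs 1/64 (~1.6%) of the heap size. Names are illustrative only. */
#include <stdint.h>
#include <stddef.h>

#define GRAIN      8                        /* bytes of heap per map bit */
#define HEAP_BYTES (256u * 1024 * 1024)

static uintptr_t heap_base;                          /* set when the heap is mapped */
static uint8_t object_map[HEAP_BYTES / GRAIN / 8];   /* ~1.6% of the heap */

/* Record "a live object starts here"; called on every allocation. */
static void objmap_set(uintptr_t obj)
{
    size_t bit = (obj - heap_base) / GRAIN;
    object_map[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Conservative scanning: a word found on the C stack is only treated as a
 * reference if the map confirms an object really starts at that address. */
static int objmap_is_object(uintptr_t candidate)
{
    if (candidate < heap_base || candidate >= heap_base + HEAP_BYTES) return 0;
    if ((candidate - heap_base) % GRAIN != 0) return 0;
    size_t bit = (candidate - heap_base) / GRAIN;
    return (object_map[bit / 8] >> (bit % 8)) & 1;
}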
33. Multithreaded GC (Stop the world)
[diagram: parallel mark and sweep phases]
• Introducing Parallel Global GC
• Aggressively parallelized
• New APIs for parallelism
• Support for thread pooling and task synchronization
34. Marking in parallel
[diagram: mark phase]
Marking:
• Scan roots in parallel
• Break VM roots into subsets (sketched below)
• Complete marking in parallel
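To make "break VM roots into subsets" concrete, the sketch below fans disjoint root ranges out over a fixed pool of workers. Raw pthreads and the gc_mark_object stub are stand-ins for illustration; per the slides, the real work builds on OMR's thread-pooling and task-synchronization APIs rather than bare threads.

/* Sketch: split the VM's roots into disjoint ranges and mark them in
 * parallel. pthreads and gc_mark_object stand in for OMR's task APIs;
 * marking must be thread-safe (e.g. atomically set mark bits). */
#include <pthread.h>
#include <stdint.h>

typedef uintptr_t VALUE;
void gc_mark_object(VALUE obj);            /* assumed thread-safe mark entry */

#define NWORKERS 4

typedef struct { VALUE *begin, *end; } root_range_t;

static void *mark_worker(void *arg)
{
    root_range_t *r = (root_range_t *)arg;
    for (VALUE *p = r->begin; p < r->end; p++)
        gc_mark_object(*p);                /* trace the subgraph under each root */
    return NULL;
}

static void mark_roots_in_parallel(root_range_t subset[NWORKERS])
{
    pthread_t tid[NWORKERS];
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, mark_worker, &subset[i]);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);        /* barrier: all marking finishes
                                              before sweeping may begin */
}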
35. Sweeping in parallel
[diagram: sweep phase]
Sweeping:
• Free malloc space in parallel
• Clean up objects in parallel
• Move work out of finalization
49. How do we replace malloc/free?
• malloc/free callouts are expensive
• Rely on system for memory management concerns
• Still susceptible to fragmentation and concurrency issues
• Black box implementation
• Replace malloc and free with a new allocator
51. New built-in type OMRBuffers
• Create a new, variable-sized object type
• Allocate all buffers on the heap as objects

typedef struct OMRBuffer {
    VALUE flags;   /* ordinary Ruby object header flags */
    long size;     /* length of the payload that follows the header */
} OMRBuffer;
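Since an OMRBuffer is just this header followed immediately by its payload, allocating one from the managed heap might look like the sketch below. omr_heap_alloc and T_OMRBUFFER are hypothetical names standing in for whatever the experiment actually used, and the GC (not free()) reclaims the buffer.

/* Sketch: allocate a variable-sized OMRBuffer on the managed heap instead
 * of calling malloc. Uses the OMRBuffer struct from the slide above;
 * omr_heap_alloc and T_OMRBUFFER are hypothetical stand-ins. */
#include <stddef.h>

#define T_OMRBUFFER 0x2c                   /* hypothetical type tag */

void *omr_heap_alloc(size_t bytes);        /* assumed GC-heap allocator */

static OMRBuffer *omrbuffer_alloc(long capacity)
{
    OMRBuffer *buf = (OMRBuffer *)omr_heap_alloc(sizeof(OMRBuffer) + capacity);
    buf->flags = T_OMRBUFFER;              /* tag it as the new built-in type */
    buf->size  = capacity;
    return buf;                            /* the GC frees it; no explicit free() */
}

static char *omrbuffer_data(OMRBuffer *buf)
{
    return (char *)(buf + 1);              /* payload, e.g. a String's bytes */
}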
52. OMRBuffers on the heap!
[diagram: string, array, hash, and data objects together with their buffers all living in managed memory; malloc memory no longer holds the buffers]
54. Getting user defined types on heap
• RDatas are used to create C-extension types.
• Typed RDatas have a new flag: RDATA_HEAP_ALLOCATED
• Automatically heap allocates the data buffer at allocation
• No free method results in no object finalization
• Allows for heap allocation in extensions
• Extremely easy to use (see the sketch below)
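Here is a hedged sketch of what opting in might look like from a C extension, built on CRuby's typed-data API. RDATA_HEAP_ALLOCATED is the experimental flag named on this slide (it does not exist in upstream CRuby, so its value is assumed), and the matrix type is invented for illustration.

#include "ruby.h"

#ifndef RDATA_HEAP_ALLOCATED
#define RDATA_HEAP_ALLOCATED FL_USER1      /* experimental flag; value assumed */
#endif

struct matrix { long rows, cols; double cells[1]; };

static const rb_data_type_t matrix_type = {
    "Matrix",
    /* no dfree: nothing to finalize, the GC owns the whole buffer */
    { 0 /* dmark */, 0 /* dfree */, 0 /* dsize */ },
    0, 0,
    RUBY_TYPED_FREE_IMMEDIATELY | RDATA_HEAP_ALLOCATED,
};

static VALUE matrix_alloc(VALUE klass)
{
    struct matrix *m;
    /* with the flag set, the data buffer would be heap-allocated
     * automatically at allocation time */
    return TypedData_Make_Struct(klass, struct matrix, &matrix_type, m);
}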
56. Multithreaded allocations
• MRI does some work in background threads
• Use OMRBuffers in background threads
• Introduce finer-grained locking than the GVL
• Allows for:
  • multithreaded allocations
  • GCing from a background thread
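One plausible shape for finer-grained locking than the GVL is a per-thread allocation cache: each allocation is a lock-free pointer bump, and a shared heap lock is taken only to refill a thread's private chunk. The names heap_carve_out and CACHE_BYTES below are hypothetical; this is a sketch of the idea, not the experiment's code.

/* Sketch: per-thread allocation caches. Each thread bump-allocates from a
 * private chunk and only takes the shared heap lock to refill it, so
 * background threads can allocate without holding the GVL. */
#include <pthread.h>
#include <stddef.h>

#define CACHE_BYTES (32 * 1024)

char *heap_carve_out(size_t bytes);        /* assumed: grabs a chunk of heap */

typedef struct { char *cursor, *limit; } alloc_cache_t;
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cache_alloc(alloc_cache_t *cache, size_t bytes)
{
    /* requests larger than CACHE_BYTES are omitted for brevity */
    if (cache->cursor + bytes > cache->limit) {
        pthread_mutex_lock(&heap_lock);    /* slow path: refill the cache */
        cache->cursor = heap_carve_out(CACHE_BYTES);
        cache->limit  = cache->cursor + CACHE_BYTES;
        pthread_mutex_unlock(&heap_lock);
    }
    void *obj = cache->cursor;             /* fast path: plain bump, no lock */
    cache->cursor += bytes;
    return obj;
}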
59. Generational GC in MRI
• Experimenting with non-copying generational GC
• Marking is already fast
• Heap fragmentation issues
• High memory overhead
60. Segregated heap in MRI
• Introduced a segregated heap into Ruby
• Heap divided into regions of fixed-size objects
• Bounds maximum heap fragmentation (see the sketch below)
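The fragmentation bound falls out of the design: every region holds cells of exactly one size class, so the worst-case waste per object is the gap to the next class. A minimal sketch of the size-class rounding follows; the class boundaries here are invented for illustration.

/* Sketch: size-class rounding for a segregated heap. Each region holds
 * cells of one class, so per-object fragmentation is bounded by the gap
 * to the next class. Class boundaries are invented. */
#include <stddef.h>

static const size_t size_classes[] = { 16, 32, 48, 64, 96, 128, 256, 512 };
#define NCLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

static size_t size_class_for(size_t bytes)
{
    for (size_t i = 0; i < NCLASSES; i++)
        if (bytes <= size_classes[i])
            return size_classes[i];        /* allocate from that class's region */
    return bytes;                          /* oversized: gets a dedicated region */
}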
61. Concurrent GC in MRI
[diagram: concurrent mark followed by stop-the-world sweep]
• Adding concurrent GC to Ruby
• Ruby threads incrementally mark
• Background thread scans
• Parallelized sweep (STW)
63. OMR GC in Ruby – what’s next
• Ongoing experimentation:
  • Balanced
  • Semispace copying generational collection
  • Compaction
  • Real-time garbage collection
64. What’s next for OMR?
• Open source OMR
• Make our Ruby experiments available
• We want to hear from the experts (you)
• Let’s make OMR and Ruby the best they can be
69. Ask us Anything!

John Duimovich, CTO IBM Runtimes
duimovic@ca.ibm.com / @jduimovich

Mark Stoodley, OMR Project Lead
mstoodle@ca.ibm.com / @mstoodle

Robert Young, OMR Developer
rwyoung@ca.ibm.com / @rwy0717

Charlie Gracie, OMR GC Architect
crgracie@ca.ibm.com / @crgracie

Craig Lehmann, OMR Developer
craigl@ca.ibm.com / @craiglehmann
73. Additional Important Disclaimers
• THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
• WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
• ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
• ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
• IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
• IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
• NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• - CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS