The document describes using a MultiOutputFormat in MapReduce to generate separate output files for each stock price based on input that contains stock price data. It includes code for a mapper that extracts the stock name and price from each input record and a reducer that writes these values to individual files for each stock name. Unit tests are also described to test the reducer by mocking the MultipleOutputs class and validating that the output files contain the expected stock price values.
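The mapper and reducer roles described above can be sketched in plain Python. This is only an analogue of the idea, not the Hadoop API: the `name,price` record layout, the function names, and the one-file-per-stock naming scheme are all assumptions made for illustration.

```python
import csv
import os
from collections import defaultdict


def map_record(line):
    """Mapper step: pull the stock name and price out of one input record.

    Assumes a hypothetical "name,price" CSV layout; the summary does not
    state the real record format.
    """
    name, price = next(csv.reader([line]))[:2]
    return name.strip(), float(price)


def reduce_to_files(pairs, out_dir):
    """Reducer step: write each stock's prices to its own file.

    Returns a {stock_name: file_path} mapping so callers (or tests) can
    check what was written.
    """
    grouped = defaultdict(list)
    for name, price in pairs:
        grouped[name].append(price)

    paths = {}
    for name, prices in grouped.items():
        path = os.path.join(out_dir, f"{name}.txt")  # hypothetical naming scheme
        with open(path, "w") as handle:
            for price in prices:
                handle.write(f"{price}\n")
        paths[name] = path
    return paths
```

Because the reducer returns the paths it wrote, a unit test can read each file back and compare it with the expected prices, which is roughly what mocking the output class achieves in the real job.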
Cascading provides a simpler way to write MapReduce programs through data flows. It uses a pipe and tap metaphor where data flows through pipes and is read from or written to taps. This allows assembling MapReduce jobs as data flow graphs in a more logical way compared to the traditional MapReduce API.
InfluxQL is a powerful query language for InfluxDB, and TICKScript is a domain-specific language used by Kapacitor to define tasks that extract, transform and load data, track arbitrary changes, and detect events within data. The combination of these two can make your monitoring apps powerful. During this session, InfluxData Engineer Michael DeSa will share best practices for using these powerful tools. Prerequisite: Intro To Kapacitor.
Programming with ZooKeeper - A basic tutorial | Jeff Smith
This document provides a tutorial on using ZooKeeper to implement basic distributed synchronization primitives like barriers and producer-consumer queues. It includes code examples for Barrier and Queue classes that extend a base SyncPrimitive class. The Barrier class allows processes to synchronize barrier entry and exit. The Queue class implements a distributed queue where producers can add elements and consumers can remove the oldest element. Both use ZooKeeper to coordinate access through ephemeral nodes and watches.
This document introduces Test Driven Development (TDD) for MapReduce jobs using the MRUnit testing framework. It discusses how TDD is difficult for Hadoop due to its distributed nature but can be achieved by abstracting business logic. It provides examples of using MRUnit to test mappers, reducers and full MapReduce jobs. It also discusses testing with real data by loading samples into the local filesystem or using a WindowsLocalFileSystem class to enable permission testing on Windows.
While compute becomes faster and cheaper, we are tempted to abandon sanity and shield ourselves from reality and the laws of physics. The resulting mess of monstrous Slack instances rampaging across our RAM should make us stop (because our computers already did) and wonder where we went wrong. Rising developer salaries and time-to-market pressure are tempting us to abandon all hope of optimising our code and understanding our systems.
Contrary to what a casual reader might think, this is a deeply technical presentation. We will gaze into hardware counters, NUMA nodes and vector registers, and that darkness will stare back at us.
All this to get a taste of what is possible on current hardware, to learn the COST of scalability, and to forever change how you feel when you open the invoice list in your local utility provider's UI and wait 20s for all 12 elements to be displayed (surely Cthulhu must be eating their compute, because it is NOT possible that Tauron hosts its billing services on a FIRST GEN IPHONE).
Versatile In-Memory Computing with Apache Ignite and Kubernetes | QAware GmbH
IT-Tage 2017, Frankfurt am Main: talk by Mario-Leander Reimer (@LeanderReimer, Chief Technologist at QAware)
Abstract:
Apache Ignite is a high-performance, integrated, distributed in-memory platform that really comes into its own in combination with Kubernetes. Together they make it possible to build flexibly scalable in-memory computing systems elegantly.
In this talk we present the key features and architecture of Apache Ignite. Using illustrative examples, we show possible use cases, such as serving as the communication backbone of a microservice architecture or as a platform for processing continuous event data. To demonstrate the resilience and scalability of the in-memory data grid, the examples are run on a Kubernetes cluster.
Taking Jenkins Pipeline to the Extreme | yinonavraham
Slide deck from Jenkins User Conference Tel Aviv 2018.
Talking about suggested (best?) practices, tips and tricks, using Jenkins pipeline scripts with shared libraries, managing shared libraries, using docker compose, and more.
This document summarizes MySQL's init_connect feature which allows SQL statements to be executed for each client connection. It provides examples of setting init_connect to log client connections to a table and discusses how to address issues like preventing the logs from being written to binary logs. The document also estimates storage needed for connection logs and provides an example of periodically deleting old log entries.
Innovative Specifications for Better Performance Logging and Monitoring | Cary Millsap
Imagine a car with no speedometer. There are speed limit signs and policemen all around with radar guns waiting to catch you speeding, but you have no way of knowing how fast you're going. Of course, a car like this has no openable hood (no bonnet), so to change the air filter, you have to hire a specialist to saw into the body of your car. A car like this would be preposterous. Yet people write software like this all the time.
The Oracle Database has some of the best performance logging features built into it of any software in the world. You can use it with any application—even applications that were built without logging and monitoring in mind. But you can go SO much further if you bother to include some performance logging features in your application. In this session, I explain Oracle's extended SQL tracing feature and describe how to enable and disable it. Then I show some innovative ideas that will help you design and build database applications that are easier to monitor, manage, and maintain throughout the software development life cycle.
An update on what has been going on with CFEngine between January 2017 and February 2018.
Slide Source: https://github.com/nickanderson/State-of-the-CFEngine/tree/cfgmgmt-ghent-2018
Why Kotlin - Apalon Kotlin Sprint Part 1 | Kirill Rozov
Kotlin is a modern programming language that was created by JetBrains as a replacement for Java, with some key advantages:
- It simplifies development tasks like creating data classes and working with collections. Kotlin also reduces the amount of code needed for common operations.
- Kotlin works seamlessly with Java code and is fully interoperable. It is also compatible with existing Java tools and platforms.
- The language has seen growing adoption since its 1.0 release in 2016 and is now officially supported by Google for Android development. Many large companies and open source projects use Kotlin due to its improvements over Java.
Non-Relational Postgres / Bruce Momjian (EnterpriseDB) | Ontico
Postgres has always had strong support for relational storage. However, there are many cases where relational storage is either inefficient or overly restrictive. This talk shows the many ways that Postgres has expanded to support non-relational storage, specifically the ability to store and index multiple values, even unrelated ones, in a single database field. Such storage allows for greater efficiency and access simplicity, and can also avoid the negatives of entity-attribute-value (eav) storage. The talk will cover many examples of multiple-value-per-field storage, including arrays, range types, geometry, full text search, xml, json, and records.
The document discusses configuring and monitoring buffer pools and memory settings for a DB2 database instance and partitions. It includes commands to:
- Show buffer pool information and alter a buffer pool size
- View tablespace to buffer pool mappings
- Check database and instance configuration parameters for memory settings
- List instances and reset the monitoring
- View buffer pool snapshots
This document provides an overview of Cassandra, including:
- Why Cassandra is used for big data applications handling large volumes of data.
- How Cassandra's distributed architecture provides high availability and horizontal scalability.
- Details of Cassandra's write path, including how writes are replicated across nodes and how consistency is ensured.
- Examples of modeling data in Cassandra, including choices for primary keys, clustering columns, and other techniques.
- Common use cases where Cassandra is applicable, such as sensor data, fraud detection, and personalization engines.
The document discusses various techniques for profiling CPU and memory performance in Rust programs, including:
- Using the flamegraph tool to profile CPU usage by sampling a running process and generating flame graphs.
- Integrating pprof profiling into Rust programs to expose profiles over HTTP similar to how it works in Go.
- Profiling heap usage by integrating jemalloc profiling and generating heap profiles on program exit.
- Some challenges with profiling asynchronous Rust programs due to the lack of backtraces.
The key takeaways are that there are crates like pprof-rs and techniques like jemalloc integration that allow collecting CPU and memory profiles from Rust programs, but profiling asynchronous programs
[214] AI Serving Platform: The Struggle to Handle Hundreds of Millions of Inferences a Day | NAVER D2
The document discusses running a TensorFlow Serving (TFS) container using Docker. It shows commands to:
1. Pull the TFS Docker image from a repository
2. Define a script to configure and run the TFS container, specifying the model path, name, and port mapping
3. Run the script to start the TFS container exposing port 13377
This document discusses common scenarios for using Varnish Configuration Language (VCL) scripts to configure the caching behavior of the Varnish caching server. It covers topics like normalizing requests, caching static assets, whitelisting and blacklisting URLs, handling cookies, using Edge Side Includes, controlling the time to live for cached responses, debugging techniques, and purging cached content. Example VCL code is provided for many of these common use cases.
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas | Jim Mlodgenski
One of the most powerful features of PostgreSQL is its diversity of procedural languages, but with that diversity comes a lot of options.
Did you ever wonder:
- What all of those options are on the CREATE FUNCTION statement?
- How do they affect my application?
- Does my choice of procedural language affect the performance of my statements?
- Should I create a single trigger with IF statements or several simple triggers?
- How do I debug my code?
- Can I tell which line in my function is taking all of the time?
Large scale machine learning projects with r suite | Wit Jakuczun
Agenda for the workshop I conducted at the ML@Enterprise conference that took place on 14th of December 2017 in Warsaw.
Machine Learning is not only about algorithms. Machine learning is about value, and this can be achieved only after proper deployment of Machine Learning solutions. I will present best practices for managing R-based ML projects. I will use our open-source tool R Suite (http://rsuite.io/). During the workshop I will talk about:
– project structure
– development cycle
– deployment
– test
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit | Sylvain Hellegouarch
A report generated by the Chaos Toolkit after a Chaos Engineering experiment against OpenFaaS on Kubernetes.
View the run of the experiment at https://asciinema.org/a/dv4cNOMC5k1oWDhWDe97d5eij
Apache MXNet Distributed Training Explained In Depth by Viacheslav Kovalevsky... | Big Data Spain
Distributed training is a complex process that does more harm than good if it is not set up correctly.
https://www.bigdataspain.org/2017/talk/apache-mxnet-distributed-training-explained-in-depth
Big Data Spain 2017
November 16th - 17th Kinépolis Madrid
The document discusses strategic autovacuum configuration and monitoring in PostgreSQL. It begins by explaining the ACID properties and how MVCC and transactions work. It then discusses how to monitor workloads for heavily updated tables, adjust per-table autovacuum thresholds to prioritize those tables, monitor autovacuum behavior over time using logs and queries, and tune the autovacuum throttle settings based on that monitoring to optimize autovacuum performance. The key steps are to start with defaults, monitor workload changes, adjust settings for busy tables, continue monitoring, and refine settings as needed.
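To make the per-table threshold adjustment concrete, here is a small sketch of the rule PostgreSQL uses to decide when autovacuum should vacuum a table: the number of dead rows must exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (row count)`. The defaults of 50 and 0.2 are PostgreSQL's stock values; the function names and the example table sizes are illustrative assumptions.

```python
def autovacuum_trigger_point(live_rows, threshold=50, scale_factor=0.2):
    """Dead-row count above which autovacuum will vacuum a table.

    threshold    -> autovacuum_vacuum_threshold (default 50)
    scale_factor -> autovacuum_vacuum_scale_factor (default 0.2)
    """
    return threshold + scale_factor * live_rows


def needs_vacuum(dead_rows, live_rows, threshold=50, scale_factor=0.2):
    """True when the table has accumulated enough dead rows to be vacuumed."""
    return dead_rows > autovacuum_trigger_point(live_rows, threshold, scale_factor)


# With defaults, a 1,000,000-row table waits for roughly 200,050 dead rows.
default_point = autovacuum_trigger_point(1_000_000)

# A hypothetical per-table override for a heavily updated table:
# a fixed threshold of 1000 plus 1% of the rows.
tuned_point = autovacuum_trigger_point(1_000_000, threshold=1000, scale_factor=0.01)
```

Lowering the scale factor for a busy table means it gets vacuumed far earlier than a large table would under the defaults, which is the prioritization the summary refers to.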
The document discusses sessionization with Spark streaming to analyze user sessions from a constant stream of page visit data. Key points include:
- Streaming page visit data presents challenges like joining new visits to ongoing sessions and handling variable data volumes and long user sessions.
- The proposed solution uses Spark streaming to join a checkpoint of incomplete sessions with new visit data to calculate session metrics in real-time.
- Important aspects are controlling data ingress size and partitioning to optimize performance of operations like joins and using custom formats to handle output to multiple sinks.
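The join between carried-over sessions and a new batch of visits can be illustrated without Spark. The sketch below is a plain-Python analogue under stated assumptions: the 30-minute inactivity timeout, the session state layout, and the function name are hypothetical and are not taken from the talk.

```python
SESSION_TIMEOUT = 30 * 60  # seconds; a hypothetical inactivity cutoff


def update_sessions(open_sessions, new_visits, now, timeout=SESSION_TIMEOUT):
    """Join carried-over (checkpointed) sessions with a new batch of visits.

    open_sessions: {user_id: {"start": ts, "last": ts, "pages": n}}
    new_visits:    iterable of (user_id, ts) pairs
    Returns (still_open, closed); still_open becomes the next checkpoint.
    """
    # Copy the checkpoint so the caller's state is not mutated.
    sessions = {user: dict(state) for user, state in open_sessions.items()}
    closed = []

    for user, ts in sorted(new_visits, key=lambda visit: visit[1]):
        state = sessions.get(user)
        if state is not None and ts - state["last"] > timeout:
            # Gap too long: the old session ends and a new one starts.
            closed.append((user, state))
            state = None
        if state is None:
            sessions[user] = {"start": ts, "last": ts, "pages": 1}
        else:
            state["last"] = ts
            state["pages"] += 1

    # Sessions that have been idle past the timeout are finished.
    still_open = {}
    for user, state in sessions.items():
        if now - state["last"] > timeout:
            closed.append((user, state))
        else:
            still_open[user] = state
    return still_open, closed
```

Each call handles one batch, so the size of `open_sessions` is the state that has to be carried between batches; long-lived sessions keep it large, which is one reason the summary stresses controlling ingress size and partitioning.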
The Ring programming language version 1.5.3 book - Part 78 of 184 | Mahmoud Samir Fayed
This document contains code for a notepad application user interface created with the Ring programming language. It defines functions for creating the main window, menu bar, toolbars, dock widgets, and text editor. It also includes functions for common editing actions like opening, saving, printing files and performing search/replace.
This document discusses different types of ships based on their usage and support type. It describes merchant ships like general cargo vessels, tankers, bulk carriers, and container ships. It also covers naval and coast guard vessels, recreational vessels, utility tugs, research ships, ferries, and more. The document further categorizes ships based on their support type such as aerostatic, hydrodynamic, hydrostatic, and submarines.
This document discusses the classification and types of ships. It covers merchant ships such as general cargo vessels, tankers, bulk carriers, and container ships. It also discusses naval and coast guard vessels, recreational vessels, utility tugs, research and environmental ships, and ferries. Ships can be classified by their usage and by their support type, including aerostatic, hydrodynamic, hydrostatic, and submarine classifications. Common merchant ship types include general cargo, tankers, bulk carriers, and container ships, which vary in their cargo capacities and handling equipment.
This document discusses numerical integration methods for calculating ship geometrical properties. It introduces the Trapezoidal rule, Simpson's 1st rule, and Simpson's 2nd rule for numerical integration when the ship's shape cannot be represented by a mathematical equation. It then provides examples of applying Simpson's 1st rule to calculate properties like waterplane area, sectional area, submerged volume, and the longitudinal center of floatation (LCF). The document explains the calculation steps and provides generalized Simpson's equations for these examples.
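As a worked illustration of Simpson's 1st rule, the sketch below integrates equally spaced ordinates with the 1-4-2-4-...-4-1 multipliers, which requires an odd number of ordinates. Treating the ordinates as half-breadths and doubling the result to get the waterplane area is the usual convention but is an assumption here, as are the function names.

```python
def simpson_first_rule(ordinates, spacing):
    """Integrate equally spaced ordinates with Simpson's 1st rule.

    Area = (spacing / 3) * (y0 + 4*y1 + 2*y2 + ... + 4*y(n-1) + yn)
    """
    count = len(ordinates)
    if count < 3 or count % 2 == 0:
        raise ValueError("Simpson's 1st rule needs an odd number of ordinates (>= 3)")

    total = ordinates[0] + ordinates[-1]
    for index in range(1, count - 1):
        multiplier = 4 if index % 2 == 1 else 2
        total += multiplier * ordinates[index]
    return spacing / 3 * total


def waterplane_area(half_breadths, station_spacing):
    """Waterplane area from half-breadths, assuming port/starboard symmetry."""
    return 2 * simpson_first_rule(half_breadths, station_spacing)
```

For a box-shaped waterplane with a constant half-breadth of 5 m measured at five stations 10 m apart, `waterplane_area([5, 5, 5, 5, 5], 10)` gives the 40 m by 10 m rectangle's area of 400 m², which is a quick sanity check on the multipliers.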
The document discusses the long-term and short-term incentive plans of British Petroleum. The long-term incentive plan requires sustained performance over more than one year and may provide stock options or be based on financial metrics paid out in cash. The short-term incentive plan rewards performance over 12 months or less and includes annual bonuses, profit-sharing, and gain-sharing plans. Both plans aim to incentivize employees and link compensation to the long and short-term success of the company.
This document discusses the cost of capital and how it is calculated. It defines cost of capital as the minimum required rate of return that suppliers of capital require as compensation for time and risk. The cost of capital is used to evaluate investment decisions, design debt policy, and appraise financial performance. It is calculated as a weighted average of the cost of debt and equity. The cost of debt is based on interest rates, while the cost of equity is based on expected returns required by shareholders to compensate for risk. The weighted average cost of capital (WACC) is then determined by weighting the costs of each component by the proportion of debt and equity in the firm's capital structure.
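The weighting described above can be written as a short function. This is a minimal sketch: the parameter names are illustrative, and the optional `tax_rate` adjustment to the cost of debt is a common convention that the summary does not mention, so it defaults to zero.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate=0.0):
    """Weighted average cost of capital.

    equity, debt   -> amounts of each in the capital structure
    cost_of_equity -> return required by shareholders (e.g. 0.12 for 12%)
    cost_of_debt   -> interest rate on debt (e.g. 0.08 for 8%)
    tax_rate       -> assumption: applies a tax shield to debt; 0 disables it
    """
    total = equity + debt
    if total <= 0:
        raise ValueError("total capital must be positive")

    after_tax_cost_of_debt = cost_of_debt * (1 - tax_rate)
    return (equity / total) * cost_of_equity + (debt / total) * after_tax_cost_of_debt
```

For a firm financed 60% by equity at 12% and 40% by debt at 8%, `wacc(600, 400, 0.12, 0.08)` works out to 0.6 × 0.12 + 0.4 × 0.08 = 0.104, or 10.4%.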
HBase based map reduce job unit testing | Ashok Agarwal
This document discusses unit testing MapReduce jobs that use HBase as an input source. It provides an example of a MapReduce job that counts first names from an HBase table. It also shows the JUnit test case code using mrunit that tests the MapReduce job by providing sample input and validating the output. The test case code sets up the mapper, configures input and expected output, and runs the test to validate the mapper works as expected.
This document discusses the classification and types of ships. It covers merchant ships, naval and coast guard vessels, recreational vessels, utility tugs, research and environmental ships, and ferries. Merchant ships are classified based on their cargo and include general cargo vessels, tankers, bulk carriers, container ships, and passenger ships. Naval and coast guard vessels tend to be expensive due to performance requirements. Recreational vessels include pleasure craft and cruise liners. Ships are also classified based on their support type, which includes aerostatic, hydrodynamic, hydrostatic, and submarines.
Price Elasticity of Demand, Degrees of Elasticity, Factors determining Elasticity of Demand, Measurement of Price Elasticity, Importance of Elasticity of Demand
Price discrimination exists when different prices are charged for the same product to different buyers. There are three main types of price discrimination: personal, geographical, and according to usage. Price discrimination is possible when there are differences in elasticity of demand, market imperfections, differentiated products, and legal sanctions or monopoly power. It is further classified into three degrees based on how prices are set for individual units or groups.
Accounting Principles, Concepts and Accounting EquationJithin Thomas
This document discusses accounting principles, concepts, and the accounting equation. It outlines key accounting concepts like the separate entity concept and going concern concept. It also describes common accounting conventions like conservatism and full disclosure. Additionally, it provides an example of how the accounting equation balances when assets, liabilities, and capital amounts change as a result of business transactions.
Production Function, Law of Variable Proportions, Return to Scale, Comparison between laws of Returns and Returns to Scale, Economies of Scale of Production.
Amazon Elastic MapReduce (EMR) allows users to run Hadoop MapReduce jobs on the AWS cloud infrastructure. It provides elasticity, ease of use, reliability, integration with other AWS services, and security. EMR is ideal for prototyping and creating repeatable environments without having to configure or deploy clusters manually. MRUnit makes it easy to write and read unit tests for mappers and reducers. Logging in Hadoop uses Log4j. The Java heap size can be configured using a bootstrap action. The DistributedCache can be used to access files from the mapper or reducer. EMR can interact with other AWS services and MongoDB.
Elastic search integration with hadoop leveragebigdataPooja Gupta
Elasticsearch can be integrated with Hadoop and Hive to enable searching structured data stored in these frameworks. Elasticsearch indexes can be populated from Hadoop using MapReduce jobs where Elasticsearch is the output format, or data can be extracted from Elasticsearch to Hadoop. Similarly for Hive, external tables can be defined pointing to Elasticsearch indexes as the data source, or data can be loaded from Hive tables to Elasticsearch indexes. The document provides code examples for performing these types of Extract, Transform, Load operations between Elasticsearch, Hadoop and Hive.
Javascript Continues Integration in Jenkins with AngularJSLadislav Prskavec
The document describes a ToDo application built with AngularJS that uses MongoDB hosted on MongoHQ. It retrieves and saves ToDo items to the MongoDB database via a PHP proxy. It also discusses testing the application using tools like PhantomJS, Jasmine, JSCoverage, JSDoc, and continuous integration with Jenkins.
BATTLESTAR GALACTICA : Saison 5 - Les Cylons passent dans le cloud avec Vert....La Cuisine du Web
1. The document describes a microservices application with BaseStar and Raider microservices.
2. The BaseStar service registers itself with the Redis service discovery and defines REST API routes including one to return a list of registered Raider services.
3. When a Raider service registers with the BaseStar via a POST call, BaseStar looks it up, starts a worker task to communicate with it, and returns a success message.
Groovy is a dynamic language for the Java Virtual Machine that simplifies programming through features like closures, properties, and built-in support for lists, maps, ranges, and regular expressions. The latest version 1.5 adds support for Java 5 features like annotations and generics to leverage frameworks that use them. Groovy can be integrated into applications through mechanisms like JSR-223, Spring, and Groovy's own GroovyClassLoader to externalize business rules, provide extension points, and customize applications.
Node.js is a JavaScript runtime built on Chrome's V8 engine. It allows JavaScript to run on the server-side and is used for building network applications. Some key points about Node.js include:
- It uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.
- Node package manager (npm) allows installation of external packages and libraries.
- Modules are used to organize code into reusable pieces and can be local or installed via npm.
- Testing frameworks like Mocha allow writing unit tests for modules and APIs.
This document provides an overview of the Play! web framework for Java, including how it differs from traditional Java web development approaches by avoiding servlets, portlets, XML, EJBs, JSPs, and other technologies. It demonstrates creating a simple PDF generation application using Play!, including defining a model, controller, and view. The framework uses conventions over configuration and allows rapid development through features like automatic reloading of code changes and helpful error pages.
Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. It is written in Java and uses a pluggable backend. Presto is fast due to code generation and runtime compilation techniques. It provides a library and framework for building distributed services and fast Java collections. Plugins allow Presto to connect to different data sources like Hive, Cassandra, MongoDB and more.
The document introduces dynamic languages and provides examples comparing Java and Groovy implementations of a filtering task. It discusses the benefits of Groovy, including its Java-like syntax, dynamic typing, built-in support for lists/maps/arrays, closures, and additional libraries that simplify APIs. Groovy aims to integrate well with Java while adding meta-programming capabilities. The document provides examples of common uses of Groovy and its features.
This document summarizes best practices for JavaScript unit testing presented by Lars Thorup. It discusses how to test asynchronous code using callbacks and promises, fake timers to test delayed behavior, mock the DOM and AJAX requests, enable cross-browser testing using Karma, and detect memory leaks between tests. Lars Thorup is a software developer who focuses on JavaScript, TDD, and continuous integration.
Presto generates Java bytecode at runtime to optimize query execution. Key query operations like filtering, projections, joins and aggregations are compiled into efficient Java methods using libraries like ASM and Fastutil. This bytecode generation improves performance by 30% through techniques like compiling row hashing for join lookups directly into machine instructions.
Quick and Easy Development with Node.js and Couchbase ServerNic Raboy
Build an API driven Node.js application that uses Couchbase for its NoSQL database and AngularJS for its front-end. Presented by Nic Raboy, Developer Advocate at Couchbase.
MRUnit is a testing library that makes it easier to test Hadoop jobs. It allows programmatically specifying test input and output, reducing the need for external test files. Tests can focus on individual map and reduce functions. MRUnit abstracts away much of the boilerplate test setup code, though it has some limitations like a lack of distributed testing. Overall though, the benefits of using MRUnit to test Hadoop jobs outweigh the problems.
- Scripting languages like PHP, Python, and Ruby are becoming increasingly popular for web application development and administrative tasks due to their simplicity.
- Java is embracing dynamic scripting languages through standards like JSR 223 which allows scripts like JavaScript, Groovy, and BeanShell to be integrated with Java applications and the Java platform.
- Groovy is a popular Java-based scripting language that can be used to simplify and accelerate enterprise development by reducing code length and improving productivity.
The document provides an introduction to JUnit testing in Java. It discusses how to set up a JUnit test with the AEM testing framework using the AemContextExtension. Key aspects covered include adding Sling models to the test context, loading mock JSON resources, and adapting requests to test Sling models. The anatomy of a JUnit test is explained with examples of setting up mocks, verifying expectations, and asserting results. Mocking and the Mockito framework are also introduced for simulating dependencies in tests.
The document discusses the vision for a proposed UNO-based ODF Toolkit API that would allow creation and manipulation of OpenDocument Format (ODF) documents independently of OpenOffice.org. It describes the existing UNO API and its limitations. The proposed new API would provide a pure ODF document model through Java interfaces, allowing standalone creation and editing of ODF files without needing the full OpenOffice installation. Example code snippets are provided to illustrate how the API could work for common tasks like document creation, content editing, and spreadsheet manipulation. However, it is noted that this proposed API has not yet been realized.
Similar to Testing multi outputformat based mapreduce (20)
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Unveiling the Advantages of Agile Software Development.pdfbrainerhub1
Learn about Agile Software Development's advantages. Simplify your workflow to spur quicker innovation. Jump right in! We have also discussed the advantages.
Atelier - Innover avec l’IA Générative et les graphes de connaissancesNeo4j
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Allez au-delà du battage médiatique autour de l’IA et découvrez des techniques pratiques pour utiliser l’IA de manière responsable à travers les données de votre organisation. Explorez comment utiliser les graphes de connaissances pour augmenter la précision, la transparence et la capacité d’explication dans les systèmes d’IA générative. Vous partirez avec une expérience pratique combinant les relations entre les données et les LLM pour apporter du contexte spécifique à votre domaine et améliorer votre raisonnement.
Amenez votre ordinateur portable et nous vous guiderons sur la mise en place de votre propre pile d’IA générative, en vous fournissant des exemples pratiques et codés pour démarrer en quelques minutes.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the Requirements is the first Phase in the SSLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long runnings systems adding new cryptographic algorithms, certificate revocation, and hardness against DoS attacks.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
Transform Your Communication with Cloud-Based IVR Solutions
Testing multi outputformat based mapreduce
1. 12/10/2014 Testing MultiOutputFormat based MapReduce | Ashok Agarwal
Testing MultiOutputFormat based MapReduce
Posted by Ashok Agarwal in Big Data on Thursday, 11 Sep 2014
Tags: Big Data, Hadoop, MapReduce
In one of our projects, we were required to generate one file per client as the output of a MapReduce job, so
that each client could see and analyze their own data.
Suppose you receive daily stock price files.
For 9/8/2014: 9_8_2014.csv
9/8/14,MSFT,47
9/8/14,ORCL,40
9/8/14,GOOG,577
9/8/14,AAPL,100.4
For 9/9/2014: 9_9_2014.csv
9/9/14,MSFT,46
9/9/14,ORCL,41
9/9/14,GOOG,578
9/9/14,AAPL,101
And so on…
9/10/14,MSFT,48
9/10/14,ORCL,39.5
9/10/14,GOOG,577
9/10/14,AAPL,100
9/11/14,MSFT,47.5
9/11/14,ORCL,41
9/11/14,GOOG,588
9/11/14,AAPL,99.8
9/12/14,MSFT,46.69
9/12/14,ORCL,40.5
9/12/14,GOOG,576
9/12/14,AAPL,102.5
https://erashokagarwal.wordpress.com/2014/09/11/testing-multioutputformat-based-mapreduce/
We want to analyze the weekly trend of each stock. To do that, we need to split the data by stock,
producing one data set per stock.
The mapper below receives each CSV record as a line of text (via TextInputFormat) and splits it on
commas. The mapper's output key is the stock name and the value is the price.
package com.jbksoft;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MyMultiOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line has the form: date,stock,price
        String line = value.toString();
        String[] tokens = line.split(",");
        // Emit (stock, price), for example ("MSFT", "47")
        context.write(new Text(tokens[1]), new Text(tokens[2]));
    }
}
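To make the splitting step easy to try without a Hadoop cluster, here is a minimal plain-Java sketch of the same parsing logic. The class name StockLineParser and the length check are our own assumptions for illustration; the mapper above has no such guard and would fail with an ArrayIndexOutOfBoundsException on a line with fewer than three fields.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of the original job: it mirrors the
// mapper's split logic so it can be exercised without Hadoop.
public class StockLineParser {

    /** Splits a "date,stock,price" line into a (stock, price) pair. */
    static Map.Entry<String, String> parse(String line) {
        String[] tokens = line.split(",");
        // Assumption: reject malformed lines explicitly instead of
        // letting tokens[2] throw ArrayIndexOutOfBoundsException.
        if (tokens.length < 3) {
            throw new IllegalArgumentException(
                    "Expected date,stock,price but got: " + line);
        }
        return new SimpleEntry<>(tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair = parse("9/8/14,MSFT,47");
        // prints "MSFT -> 47"
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

Whether a real job should skip, count, or fail on malformed lines depends on the data; the sketch only shows one possible choice.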
The reducer below creates one file for each stock.
package com.jbksoft;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import java.io.IOException;

public class MyMultiOutputReducer extends Reducer<Text, Text, NullWritable, Text> {
    MultipleOutputs<NullWritable, Text> mos;

    @Override
    public void setup(Context context) {
        mos = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The stock name is the base output path, so each stock gets its own file
        for (Text value : values) {
            mos.write(NullWritable.get(), value, key.toString());
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Close MultipleOutputs so that all per-stock files are flushed
        mos.close();
    }
}
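The combined effect of the mapper and reducer can be pictured with a small plain-Java simulation that needs no Hadoop at all. This is only a sketch under stated assumptions: the class name PerStockOutputSketch is hypothetical, the "files" are in-memory lists, and the "-r-00000" suffix is hard-coded to match the single-reducer output listing shown later in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the job: groups prices by stock, the way
// the mapper keys them and the reducer writes them to per-stock files.
public class PerStockOutputSketch {

    /** Returns output "file" name -> prices written to that file. */
    static Map<String, List<String>> run(List<String> lines) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.split(",");   // date,stock,price
            // Assumption: a single reducer task, hence the fixed suffix.
            String fileName = tokens[1] + "-r-00000";
            files.computeIfAbsent(fileName, k -> new ArrayList<>())
                 .add(tokens[2]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "9/8/14,MSFT,47", "9/8/14,ORCL,40",
                "9/9/14,MSFT,46", "9/9/14,ORCL,41");
        // prints "MSFT-r-00000: [47, 46]" then "ORCL-r-00000: [40, 41]"
        run(lines).forEach((name, prices) ->
                System.out.println(name + ": " + prices));
    }
}
```

The simulation keeps prices in input order, whereas a real MapReduce job does not guarantee the order in which values for a key reach the reducer.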
The driver for the code:
package com.jbksoft;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;

public class MyMultiOutputTest {
    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        // args[0] is the input directory, args[1] the output directory
        Path inputDir = new Path(args[0]);
        Path outputDir = new Path(args[1]);

        Configuration conf = new Configuration();
        Job job = new Job(conf);
        job.setJarByClass(MyMultiOutputTest.class);
        job.setJobName("My MultipleOutputs Demo");

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapperClass(MyMultiOutputMapper.class);
        job.setReducerClass(MyMultiOutputReducer.class);

        FileInputFormat.setInputPaths(job, inputDir);
        FileOutputFormat.setOutputPath(job, outputDir);
        // LazyOutputFormat creates an output file only when a record is written,
        // which avoids empty default part-r-* files next to the per-stock files
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        job.waitForCompletion(true);
    }
}
The command for executing the above code (compiled and packaged as a jar), followed by a listing of the output directory:
aagarwal-mbpro:~ ashok.agarwal$ hadoop jar test.jar com.jbksoft.MyMultiOutputTest …
aagarwal-mbpro:~ ashok.agarwal$ ls -l /Users/ashok.agarwal/dev/HBaseDemo/output
total 32
-rwxr-xr-x 1 ashok.agarwal 1816361533 25 Sep 11 11:32 AAPL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 GOOG-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 20 Sep 11 11:32 MSFT-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 19 Sep 11 11:32 ORCL-r-00000
-rwxr-xr-x 1 ashok.agarwal 1816361533 0 Sep 11 11:32 _SUCCESS
aagarwal-mbpro:~ ashok.agarwal$
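The file sizes in the listing can be cross-checked against the sample data with a little arithmetic. The sketch below assumes the job was run over the five daily samples shown earlier, and that each price is written on its own line followed by a single newline byte (the key is NullWritable, so only the value is written). The class name OutputSizeCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical check: recomputes the expected size of each per-stock
// file from the sample prices in this post.
public class OutputSizeCheck {

    /** Bytes needed when every price sits on its own line. */
    static int sizeInBytes(List<String> prices) {
        int total = 0;
        for (String price : prices) {
            total += price.length() + 1; // ASCII characters plus one newline
        }
        return total;
    }

    public static void main(String[] args) {
        // Prices for 9/8 through 9/12, taken from the sample files above
        System.out.println("AAPL " + sizeInBytes(
                Arrays.asList("100.4", "101", "100", "99.8", "102.5"))); // AAPL 25
        System.out.println("GOOG " + sizeInBytes(
                Arrays.asList("577", "578", "577", "588", "576")));      // GOOG 20
        System.out.println("MSFT " + sizeInBytes(
                Arrays.asList("47", "46", "48", "47.5", "46.69")));      // MSFT 20
        System.out.println("ORCL " + sizeInBytes(
                Arrays.asList("40", "41", "39.5", "41", "40.5")));       // ORCL 19
    }
}
```

The computed sizes (25, 20, 20 and 19 bytes) agree with the ls output, which is consistent with each file holding only the prices of its own stock.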
The test case for the above code can be created using MRUnit.
The reducer needs to be mocked, as shown below: