The document details data preprocessing and feature engineering steps for a machine learning model to predict West Nile virus presence. It reads in training, test, weather and spraying data, cleans variables, derives new features like distance to locations and week numbers, and splits weather data by station. New weather features like accumulated degree days are created. Moving averages and sums are also calculated for temperature, precipitation, and degree days over 1 and 2 week periods.
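The rolling features mentioned above can be sketched in a few lines of plain Python. This is an illustration rather than the summarized document's code: the degree-day formula (mean daily temperature above a base) is one common definition, and the base of 50 degrees Fahrenheit, the sample readings, and all function names are assumptions.

```python
from collections import deque

def degree_days(t_min, t_max, base=50.0):
    """Daily degree days: mean temperature above an assumed base, floored at zero."""
    return max(0.0, (t_min + t_max) / 2.0 - base)

def rolling(values, window, agg):
    """Apply agg to a trailing window holding at most `window` values."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(agg(buf))
    return out

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical daily (t_min, t_max) readings in degrees Fahrenheit.
daily = [(48, 60), (52, 70), (55, 75), (50, 64),
         (58, 80), (60, 82), (57, 77), (54, 72)]

dd = [degree_days(lo, hi) for lo, hi in daily]
tavg = [(lo + hi) / 2.0 for lo, hi in daily]

dd_week_sum = rolling(dd, 7, sum)        # 1-week accumulated degree days
tavg_week_mean = rolling(tavg, 7, mean)  # 1-week moving average temperature
```

A 2-week variant only changes the window to 14. In a real pipeline the same idea is usually expressed with a data-frame library's rolling-window functions.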
The Ring programming language version 1.10 book - Part 80 of 212 - Mahmoud Samir Fayed
The document describes a Ring code sample for a notepad application. It includes definitions for window elements like buttons, menus, toolbars and dock widgets. Functions are defined to handle events like opening/saving files, searching/replacing text, and changing the active file in the text editor. The application window and user interface elements are initialized and the application is executed.
This document provides an overview of basic usage of the Apache Spark framework for data analysis. It describes what Spark is, how to install it, and how to use it from Scala, Python, and R. It also explains the key concepts of RDDs (Resilient Distributed Datasets), transformations, and actions. Transformations like filter, map, and join return new RDDs, while actions like reduce, collect, count, and first return results to the driver program. The document provides examples of common transformations and actions in Spark.
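The split between transformations and actions can be illustrated without a Spark installation. The sketch below is a toy model in plain Python, not the Spark API: `MiniRDD` and its methods are hypothetical names that only mimic the idea that transformations are recorded lazily and run when an action is called.

```python
class MiniRDD:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new MiniRDD and compute nothing.
    def map(self, fn):
        return MiniRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger evaluation and return plain values to the caller.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def first(self):
        return next(self._run())

calls = []

def square(x):
    calls.append(x)  # record when the function is actually invoked
    return x * x

rdd = MiniRDD(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
pending = list(calls)   # still empty: no transformation has run yet
result = rdd.collect()  # the action forces evaluation
```

Because each action re-runs the recorded steps, calling `count()` after `collect()` evaluates the pipeline again. Spark behaves similarly unless a dataset is cached.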
This document discusses six Python packages that are useful to know:
1. First - A utility for selecting the first successful result from a sequence of functions.
2. Parse - A library for parsing Python format strings and extracting values.
3. Filecmp - A module for comparing files and directories.
4. Bitrot - A tool for detecting silent data corruption in files.
5. Docopt - A tool for generating command-line interfaces from a docstring.
6. Six - A library for writing code that is compatible with both Python 2 and Python 3.
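Of the packages listed, `filecmp` ships with the Python standard library, so it can be tried without installing anything. A minimal sketch, assuming throwaway files created in a temporary directory (the file names and the `write` helper are illustrative):

```python
import filecmp
import os
import tempfile

def write(path, text):
    """Helper for this example: create a small text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    c = os.path.join(tmp, "c.txt")
    write(a, "same contents\n")
    write(b, "same contents\n")
    write(c, "different contents\n")

    # shallow=False compares file contents rather than only os.stat() signatures.
    same_ab = filecmp.cmp(a, b, shallow=False)
    same_ac = filecmp.cmp(a, c, shallow=False)
```

Here `same_ab` is `True` and `same_ac` is `False`.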
The Ring programming language version 1.7 book - Part 16 of 196 - Mahmoud Samir Fayed
This document summarizes the new features and changes in Ring 1.2, including new functions like PtrCmp() and PrevFileName(), improved functions like find() supporting C pointers, an improved Ring Notepad that saves line numbers, better RingQt classes with event handling, a new Objects library for RingQt, an enhanced RingLibCurl library, improved call command support, using NULL instead of NULLPointer(), a new display warnings option, and general quality improvements.
The Ring programming language version 1.7 book - Part 73 of 196 - Mahmoud Samir Fayed
This document describes the code for a basic notepad application created using the Ring programming language and Qt GUI library. It defines functions for opening, saving, and creating new files. It also implements search/replace, font selection, and color settings. The main window contains dockable panels for files, source code, and a web browser. The application loads previous settings and allows opening, editing, and saving text files.
The Ring programming language version 1.6 book - Part 15 of 189 - Mahmoud Samir Fayed
The document summarizes the new features and changes in Ring 1.2, including:
1. New functions like PtrCmp(), PrevFileName(), and functions to retrieve lists of functions, classes, packages, etc.
2. Improved existing functions like find() and type() to support C pointers.
3. Enhancements to Ring Notepad and RingQt like line number restoration and event handling.
4. The introduction of an Objects library for RingQt and better support for the MVC pattern.
5. The RingLibCurl library providing an API for the libcurl functions.
6. Updates to call handling and allowing NULL instead of NULLPointer().
The Ring programming language version 1.8 book - Part 75 of 202 - Mahmoud Samir Fayed
This document describes the code for a basic notepad application created using the Ring programming language. It defines functions for opening, saving, and editing text files. The application features a menu bar, toolbars, dockable panels for a file tree and text editor, and basic text editing functionality like font selection, find/replace, and print.
Evidence is a new, framework-agnostic unit testing library which I developed out of necessity and frustration with the existing offering. Although it's heavily inspired by its Ruby, Python and Java counterparts, Evidence is packed with niceness targeted at the specificities of the JavaScript language and its different environments. Hopefully this introduction to Evidence will give you the motivation, tools and knowledge to start unit testing your JavaScript code if you are not doing so already.
Many people ask about how to develop a functional mindset. It’s difficult if you’ve learned another paradigm and don’t know where to start. Functional thinking is a set of habits that you can train that will serve you well while programming in any language.
The Ring programming language version 1.3 book - Part 63 of 88 - Mahmoud Samir Fayed
This document provides documentation for Ring's Qt integration and includes code for initializing the Qt framework in Ring and defining classes and methods for various Qt widgets and objects. It includes code for including necessary Qt header files, initializing QApplication, and defining Ring classes, methods, and functions that wrap common Qt classes and functionality like QObject, QWidget, qApp and others.
This document contains the code for a Java application with a graphical user interface (GUI) that calculates employee payroll. It defines classes and methods to:
1. Create panels to input employee data like name, ID, salary and select deduction options.
2. Calculate deductions for judicial, loans, alimony based on selected checkboxes and salary amount.
3. Calculate total deductions, net salary and update text fields on button click.
4. Clear all fields and reset selections on "New" button click and exit application on "Exit" button click.
The Ring programming language version 1.7 book - Part 12 of 196 - Mahmoud Samir Fayed
The Ring documentation release notes summarize new features and improvements in Ring version 1.7. Key updates include better documentation generation for extensions, new Ring VM tracing functions for debugging, and more syntax flexibility options when defining packages, classes and functions. The release also introduces a type hints library to support static analysis and improved editor features.
Knowledge is Power: Getting out of trouble by understanding Git - Steve Smith...Codemotion
Git is rapidly taking over the development workplace. One of the downsides of high-level tools is that they can hide the details of what is happening under the hood; when things go wrong it can be hard to understand why git behaves the way it does. But at its core Git consists of a few simple concepts that, when understood, make it a much more intuitive tool. This talk introduces these core Git concepts and uses them to clarify some examples of seemingly counterintuitive behaviour. It also introduces some of Git's less-known features and tricks that are useful to have in your arsenal.
An update on what has been going on with CFEngine between January 2017 and February 2018.
Slide Source: https://github.com/nickanderson/State-of-the-CFEngine/tree/cfgmgmt-ghent-2018
The document provides an overview of Groovy and Java code examples for performing common tasks like printing "Hello World", reading files, making web requests, using strings, importing packages, and using Swing/SwingBuilder for GUIs. It also shows examples of using Groovy with Java libraries for Excel files, Ant, and JSON. Additional sections cover parallel processing with GPars, contract programming with GContracts, method chaining, Grails basics, and Gaelyk controllers and views.
The document describes a Java class called frmregistroventa that contains a GUI for registering employee data. It initializes components like text fields, radio buttons, and buttons for actions like adding, showing, searching, and modifying employee records stored in a ListaRegistro list. The class contains methods that will be called when the different buttons are clicked to perform the corresponding actions on the employee data.
This document contains AutoLISP functions for analyzing and extracting data from 2D and 3D objects in AutoCAD. It includes functions to calculate distances between 2D points, extract coordinate data and write it to a file, analyze spline curves on polylines and write curve data like radii, angles and lengths to a file. The functions prompt the user for input like selecting objects or points and allow controlling snapping options.
Message-based communication patterns in distributed Akka applications - Andrii Lashchenko
The document discusses various message-based communication patterns in Akka distributed applications, including tell, ask, pipeTo, and composing futures. It provides code examples of actor implementations demonstrating these patterns and how to handle responses, failures, timeouts, and combining multiple futures. The tell pattern is fire-and-forget messaging. The ask pattern uses a future to represent a possible response. PipeTo pipes a future to the original sender. Examples show how to handle successful, failed, and delayed futures through composing and combining them.
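The difference between tell and ask can be sketched outside Akka. The example below uses Python's `asyncio` as an analogy only: the mailbox queue, the `None` stop signal, and the `tell`/`ask` helpers are assumptions made for illustration, not Akka's API.

```python
import asyncio

async def actor(mailbox, log):
    """Process messages one at a time, like an actor's receive loop."""
    while True:
        msg, reply = await mailbox.get()
        if msg is None:            # hypothetical stop signal
            break
        log.append(msg)
        if reply is not None:      # ask: complete the caller's future
            reply.set_result(msg.upper())

def tell(mailbox, msg):
    """Fire-and-forget: enqueue the message and return immediately."""
    mailbox.put_nowait((msg, None))

async def ask(mailbox, msg, timeout=1.0):
    """Send a message and await a reply, failing after `timeout` seconds."""
    reply = asyncio.get_running_loop().create_future()
    mailbox.put_nowait((msg, reply))
    return await asyncio.wait_for(reply, timeout)

async def main():
    mailbox, log = asyncio.Queue(), []
    task = asyncio.create_task(actor(mailbox, log))
    tell(mailbox, "ping")                    # no response expected
    answer = await ask(mailbox, "hello")     # future resolved by the actor
    tell(mailbox, None)
    await task
    return log, answer

log, answer = asyncio.run(main())
```

If the actor never replies, `asyncio.wait_for` raises a timeout error, which plays the role of the ask timeout described above.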
The document contains code for a C# program that merges employee wage and tip data from multiple text files into a single output file. It reads configuration settings from an XML file to determine the input and output file paths. It then reads the input files, merges the data based on employee identifiers, handles null values, and writes the results to the output file. It also writes log information to another output file. The key steps are: 1) reading the configuration, 2) reading and parsing the input files, 3) merging and formatting the data, and 4) writing the merged data and log to output files.
Dmxedit is a command line tool for manipulating DMX model files in Source games. It focuses on flex animation and allows editing wrinkle maps, delta states, and vertex positions through Lua scripts. Functions include loading/saving DMX files, adding/removing delta states, translating/rotating vertices, and more.
Benchy, Python framework for performance benchmarking of Python scripts - Marcel Caraciolo
Benchy is a lightweight Python framework for performing benchmarks on code. It allows generating performance and memory usage graphs to compare different code implementations. Benchmarks can be written as objects and executed via a BenchmarkRunner to obtain results. Results are stored in a SQLite database and full reports can be generated in reStructuredText format. The framework aims to provide an easy way to integrate benchmarks into the development workflow.
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse - Samuel Lampa
This document summarizes Samuel Lampa's 2010 degree project on integrating SWI-Prolog for semantic reasoning in Bioclipse. It compares SWI-Prolog to other semantic tools like Jena and Pellet in terms of speed and expressiveness when querying biochemical data. Prolog code is presented for querying NMR spectrum data that finds molecules with peak values near a search value. SPARQL queries for the same use case are also shown. Observations indicate Prolog is fastest while SPARQL is easier to understand but Prolog allows easier parameter changes and logic reuse. A final presentation was planned for April 28, 2010.
(Presented by Antonio Piccolboni to Strata 2012 Conference, Feb 29 2012).
RHadoop is an open source project spearheaded by Revolution Analytics to grant data scientists access to Hadoop’s scalability from their favorite language, R. RHadoop comprises three packages:
- rhdfs provides file-level manipulation for HDFS, the Hadoop file system
- rhbase provides access to HBase, the Hadoop database
- rmr lets you write mapreduce programs in R
rmr allows R developers to program in the mapreduce framework, and provides all developers with an alternative way to implement mapreduce programs that strikes a delicate compromise between power and usability. It lets you write general mapreduce programs, offering the full power and ecosystem of an existing, established programming language. It doesn’t force you to replace the R interpreter with a special run-time; it is just a library. You can write logistic regression in half a page and even understand it. It feels and behaves almost like the usual R iteration and aggregation primitives. It consists of a handful of functions with a modest number of arguments and sensible defaults that combine in many useful ways. But there is no way to prove that an API works: one can only show examples of what it makes possible, and we will do that by covering a few from machine learning and statistics. Finally, we will discuss how to get involved.
Mobl is a programming language for building mobile web applications. It aims to provide portability across different mobile platforms and browsers by compiling to JavaScript and HTML5. Mobl supports common mobile features like location services, camera, contacts and more through a simple object-oriented syntax. It also includes tools for building user interfaces, accessing data through entities and queries, and making web service requests. The goal is to enable complete coverage of mobile development needs while avoiding platform-specific code.
This document defines options and sets up a simulation to test carrier sense in NS-2. It defines wireless channel, radio propagation, and MAC layer options. It creates 4 nodes with an 802.11 MAC and positions two nodes to have a conversation and the other two nodes some distance away to have another conversation. It generates CBR traffic between the node pairs and runs the simulation for 10 seconds.
R is an open source statistical computing platform that is rapidly growing in popularity within academia. It allows for statistical analysis and data visualization. The document provides an introduction to basic R functions and syntax for assigning values, working with data frames, filtering data, plotting, and connecting to databases. More advanced techniques demonstrated include decision trees, random forests, and other data mining algorithms.
This document describes how to use R packages weatherData and cropData to analyze weather data and crop trial data. It shows how to get weather station data, interpolate weather variables to trial locations, derive ecophysiological variables like thermal stress, and use redundancy analysis (RDA) to relate yield residuals to environmental factors while accounting for variety and location effects. The goal is to link weather and trial data to understand genotype-by-environment interactions.
This document provides an overview of phylogenetic analysis tools and techniques available in R. It discusses how to get sequence data from GenBank, align sequences, perform phylogenetic inference using various methods like neighbor joining and maximum likelihood, visualize and analyze trees, model trait evolution, reconstruct ancestral states, simulate trees, and access phylogenetic data from online repositories. Examples are given for many of the tasks using popular R packages like ape, phangorn, picante, and phytools.
This document discusses analyzing Twitter data from the user @a_bicky using R. It extracts over 3,200 tweets from the user's timeline using the twitteR package. The tweets are transformed into a data frame with variables like text, date, and source. The data is then summarized using the reshape2 and ggplot2 packages to calculate metrics like average text length by day of week, month, and source. Frequency tables and heat maps are generated to explore patterns in the Twitter data over time.
Presentation by Jacob van Etten.
CCAFS workshop titled "Using Climate Scenarios and Analogues for Designing Adaptation Strategies in Agriculture," 19-23 September in Kathmandu, Nepal.
This document loads various libraries and reads in multiple csv files containing transportation data. It then performs some data cleaning and preprocessing steps. Various outputs are defined to render tables and plots of subsets of the data. Plots are created to visualize relationships between weighted time, cost, and safety metrics. Interactive elements are added to output text describing user input from the plots. Maps and motion charts are also defined as outputs to visualize additional data aspects.
Cloud Native Night, December 2020, talk by Jörg Viechtbauer (Senior Software Architect, QAware)
Abstract:
Neural networks like BERT have revolutionized the processing of natural language and achieve state-of-the-art performance in many NLP tasks. One of them is semantic search where documents are found by query intent and not only by exact match.
This talk takes us through the history of information retrieval and shows how keyword search has evolved into the term vector model. The desire for a better search led to the development of the first semantic models like LSI or PLSA. We will see how this culminates today in the use of sophisticated deep neural networks that perform nonlinear dimensional reductions and master long-range dependencies.
Semantic search has never been as good and easy to implement as it is today.
About Jörg:
Jörg is a search expert at QAware and uses neural networks for semantic search and text comprehension. He has spent almost 20 years developing search engines based on both proprietary and open source software for enterprise search, eDiscovery and local search - always hunting for the perfect ranking formula.
This document provides an overview of common string, data structure, file, operating system, security, XML, SQL, and web service operations in PowerShell. It discusses how to work with strings, arrays, dictionaries, hashtables, files, environment variables, events, services, WMI, encryption, XML processing, SQL queries and transactions, sending emails, downloading files from URLs, and using proxies. The document is a helpful reference for many PowerShell tasks.
Dask is a task scheduler that seamlessly parallelizes Python functions across threads, processes, or cluster nodes. It also offers a DataFrame class (similar to Pandas) that can handle data sets larger than the available memory.
This is a quick introduction to Scalding and Monoids. Scalding is a Scala library that makes writing MapReduce jobs very easy. Monoids, on the other hand, promise parallelism and quality, and they make some more challenging algorithms look very easy.
The talk was held at the Helsinki Data Science meetup on January 9th 2014.
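To make the link between monoids and parallelism concrete, here is a minimal sketch in Python rather than Scala or Scalding. The `Monoid` class and the `(count, total)` encoding of an average are assumptions chosen for illustration, not code from the talk. Because the combine operation is associative and has an identity element, each chunk can be reduced independently and the partial results merged afterwards.

```python
from functools import reduce

class Monoid:
    """An identity element plus an associative binary operation."""
    def __init__(self, zero, plus):
        self.zero = zero
        self.plus = plus

    def sum(self, values):
        return reduce(self.plus, values, self.zero)

# An average is not associative by itself, but (count, total) pairs are.
avg_monoid = Monoid((0, 0.0), lambda a, b: (a[0] + b[0], a[1] + b[1]))

values = [3.0, 5.0, 10.0, 2.0]

# Pretend each chunk is reduced on a different worker.
chunks = [values[:2], values[2:]]
partials = [avg_monoid.sum((1, v) for v in chunk) for chunk in chunks]

# Merging the partial results gives the same answer as a single pass.
count, total = avg_monoid.sum(partials)
mean = total / count
```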
Meet Up - Spark Stream Processing + Kafka (Knoldus Inc.)
This document provides an overview of Spark Streaming concepts including:
- Streams are sequences of data elements made available over time that can be accessed sequentially
- Stream processing involves continuously and concurrently processing live data streams in micro-batches
- Spark Streaming provides scalable and fault-tolerant stream processing using a micro-batch architecture where streams are divided into batches that are processed through transformations on resilient distributed datasets (RDDs)
- Transformations on DStreams apply operations like map, filter, reduce to the underlying RDDs of each batch
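The micro-batch idea in the list above can be sketched in plain Python without Spark. This is a simplified model under stated assumptions: batches are cut by element count rather than by a time interval, a plain list stands in for an RDD, and the names `micro_batches` and `process` are hypothetical.

```python
from functools import reduce

def micro_batches(stream, batch_size):
    """Cut a sequential stream into small batches (by count, for simplicity)."""
    batch = []
    for element in stream:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch

def process(batch):
    """Apply the same filter -> map -> reduce pipeline to one batch."""
    evens = filter(lambda x: x % 2 == 0, batch)
    squared = map(lambda x: x * x, evens)
    return reduce(lambda a, b: a + b, squared, 0)

stream = iter(range(1, 11))  # stand-in for a live data source
results = [process(batch) for batch in micro_batches(stream, batch_size=4)]
```

Each entry in `results` is the output for one batch, which mirrors how a transformation on a stream is applied to the data of every batch in turn.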
This document summarizes an introduction presentation about the Apache Velocity templating engine given at ApacheCon 2009. It discusses how Velocity uses a simple templating language to replace active elements in text templates with values from a data model. It provides examples of how templates specify elements to insert, loops to iterate over, and other control structures. The presentation also compares Velocity to other templating engines and languages like Java Server Pages.
From Query Plan to Query Performance: Supercharging your Apache Spark Queries... (Databricks)
The SQL tab in the Spark UI provides a lot of information for analysing your Spark queries, ranging from the query plan to all associated statistics. However, many new Spark practitioners get overwhelmed by the information presented and have trouble using it to their benefit. In this talk we want to give a gentle introduction to reading the SQL tab. We will first go over the common Spark operations, such as scans, projections, filters, aggregations and joins, and how they relate to the Spark code written. In the second part of the talk we will show how to read the associated statistics to pinpoint performance bottlenecks.
User Defined Aggregation in Apache Spark: A Love Story (Databricks)
Defining customized scalable aggregation logic is one of Apache Spark’s most powerful features. User Defined Aggregate Functions (UDAF) are a flexible mechanism for extending both Spark data frames and Structured Streaming with new functionality ranging from specialized summary techniques to building blocks for exploratory data analysis.
User Defined Aggregation in Apache Spark: A Love Story (Databricks)
This document summarizes a user's journey developing a custom aggregation function for Apache Spark using a T-Digest sketch. The user initially implemented it as a User Defined Aggregate Function (UDAF) but ran into performance issues due to excessive serialization/deserialization. They then worked to resolve it by implementing the function as a custom Aggregator using Spark 3.0's new aggregation APIs, which avoided unnecessary serialization and provided a 70x performance improvement. The story highlights the importance of understanding how custom functions interact with Spark's execution model and optimization techniques like avoiding excessive serialization.
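The cost of excessive serialization described in the summary can be illustrated with a toy model. This is not Spark's API: `Sketch` is a hypothetical stand-in for a T-Digest that only tracks a count and a sum, and JSON stands in for whatever encoding the real aggregation buffer uses. The first function round-trips the buffer on every row, while the second keeps the object in memory and serializes once at the end.

```python
import json

class Sketch:
    """Hypothetical stand-in for a T-Digest: tracks only count and sum."""
    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def update(self, x):
        self.count += 1
        self.total += x

    def to_bytes(self):
        return json.dumps([self.count, self.total]).encode()

    @classmethod
    def from_bytes(cls, data):
        count, total = json.loads(data)
        return cls(count, total)

def aggregate_serializing_each_row(rows):
    """Deserialize, update, and re-serialize the buffer for every row."""
    buf = Sketch().to_bytes()
    round_trips = 0
    for x in rows:
        sketch = Sketch.from_bytes(buf)
        sketch.update(x)
        buf = sketch.to_bytes()
        round_trips += 1
    return Sketch.from_bytes(buf), round_trips

def aggregate_in_memory(rows):
    """Keep the sketch as a live object and serialize once at the end."""
    sketch = Sketch()
    for x in rows:
        sketch.update(x)
    return Sketch.from_bytes(sketch.to_bytes()), 1

rows = [float(i) for i in range(1000)]
slow, slow_trips = aggregate_serializing_each_row(rows)
fast, fast_trips = aggregate_in_memory(rows)
```

Both paths produce the same result, but the first performs one serialization round trip per row. For a real sketch with a much larger state, that per-row overhead is what makes the difference noticeable.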
The document contains code examples demonstrating various Scala programming concepts such as functions, pattern matching, traits, actors and more. It also includes links to online resources for learning Scala.
The web is evolving, we got it. One of the clear consequences is the complexity of our web apps (formerly known as ‘websites’). The conciseness of functional programming and its fundamentals got our attention, but we knew we could do better. And now we have the Reactive programming model, a functional and declarative way of dealing with large amounts of data.
At the center of it we have Observables: objects responsible for keeping your application alive, reacting to any mutation your data may undergo, over any period of time. We’ll take a look at the concepts and also at the library that implements them in Angular’s core: RxJS. Using the provided operators, we have great power in our hands, doing anything imaginable in a concise, declarative and easy-to-maintain way.
Watch out: observables are here to stay!
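For readers who have not met the pattern, the following is a minimal sketch of an observable with two chained operators. It is written in Python for illustration and is not the RxJS API: the `Observable` class, its `from_iterable` constructor, and the operator implementations are simplified assumptions that leave out errors, completion, and unsubscription.

```python
class Observable:
    """A value source that pushes items to a subscriber callback."""
    def __init__(self, subscribe):
        self._subscribe = subscribe  # function that takes an on_next callback

    def subscribe(self, on_next):
        self._subscribe(on_next)

    def map(self, fn):
        # New observable that forwards fn(value) for every upstream value.
        return Observable(
            lambda on_next: self.subscribe(lambda v: on_next(fn(v))))

    def filter(self, predicate):
        # New observable that forwards only values passing the predicate.
        def subscribe(on_next):
            self.subscribe(lambda v: on_next(v) if predicate(v) else None)
        return Observable(subscribe)

    @staticmethod
    def from_iterable(values):
        def subscribe(on_next):
            for v in values:
                on_next(v)
        return Observable(subscribe)

received = []
(Observable.from_iterable([1, 2, 3, 4, 5])
    .filter(lambda x: x % 2 == 1)   # keep odd values
    .map(lambda x: x * 10)          # transform each value
    .subscribe(received.append))    # nothing runs until subscribe is called
```

The pipeline is declared first and only executes on `subscribe`, which is the declarative style the abstract refers to.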