The document is an introduction to functional programming concepts using Scala, presented by Sujith Sudhakaran. The presentation covers imperative programming styles and their limitations in dealing with concurrency and parallelism, then introduces functional programming concepts like immutability and pure functions. It discusses Scala's role as a unifier of the object-oriented and functional paradigms and covers key Scala concepts like higher-order functions, traits, and pattern matching, along with popular Scala frameworks.
Validating big data jobs - Spark AI Summit EU (Holden Karau)
As big data jobs move from the proof-of-concept phase into powering real production services, we have to start considering what will happen when everything eventually goes wrong (such as recommending inappropriate products or other decisions taken on bad data). This talk will attempt to convince you that we will all eventually get aboard the failboat (especially with ~40% of respondents automatically deploying their Spark jobs' results to production), and it's important to automatically recognize when things have gone wrong so we can stop deployment before we have to update our resumes.
Figuring out when things have gone terribly wrong is trickier than it first appears, since we want to catch the errors before our users notice them (or, failing that, before CNN notices them). We will explore general techniques for validation, look at responses from people validating big data jobs in production environments, and examine libraries that can assist us in writing relative validation rules based on historical data.
For folks working in streaming, we will talk about the unique challenges of attempting to validate in a real-time system, and what we can do besides keeping an up-to-date resume on file for when things go wrong. To keep the talk interesting, real-world examples (with company names removed) will be presented, as well as several Creative Commons-licensed cat pictures and an adorable panda GIF.
If you've seen Holden's previous testing Spark talks, this can be viewed as a deep dive on the second half, focused on what else we need to do besides good testing practices to create production-quality pipelines. If you haven't seen the testing talks, watch those on YouTube after you come see this one.
Spark Machine Learning: Adding Your Own Algorithms and Tools with Holden Kara... (Databricks)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. This talk introduces Spark’s ML pipelines, and then looks at how to extend them with your own custom algorithms. By integrating your own data preparation and machine learning tools into Spark’s ML pipelines, you will be able to take advantage of useful meta-algorithms, like parameter searching and pipeline persistence (with a bit more work, of course).
Even if you don’t have your own machine learning algorithms that you want to implement, this session will give you an inside look at how the ML APIs are built. It will also help you make even more awesome ML pipelines and customize Spark models for your needs. And if you don’t want to extend Spark ML pipelines with custom algorithms, you’ll still benefit by developing a stronger background for future Spark ML projects.
The examples in this talk will be presented in Scala, but any non-standard syntax will be explained.
Extending spark ML for custom models now with python! (Holden Karau)
Are you interested in adding your own custom algorithms to Spark ML? This is the talk for you! See the companion examples in the High Performance Spark and Sparkling ML projects.
"Impact of front-end architecture on development cost", Viktor Turskyi (Fwdays)
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features just by following the standard rules of a framework, thinking that this is enough to successfully launch the project, and then the project fails. How can this be prevented, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
This document discusses three approaches to managing namespaces when creating multiple XML schemas for a project:
1. Heterogeneous Namespace Design - Each schema is given a unique target namespace.
2. Homogeneous Namespace Design - All schemas share the same target namespace.
3. Chameleon Namespace Design - The main schema has a target namespace, while supporting schemas have no target namespace and inherit the main schema's namespace.
It provides an example comparing how each approach would be implemented for schemas defining a company, persons, and products. The document aims to help schema designers choose the best approach for their projects.
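As a rough sketch of the third (Chameleon) approach, the two hypothetical schema files below show a supporting schema with no target namespace being included by a main schema; the file names and the namespace URI are illustrative, not taken from the original document.

```xml
<!-- Person.xsd: a supporting schema with NO targetNamespace (the
     "chameleon"); its components take on the namespace of whichever
     schema includes them. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Company.xsd: the main schema. xs:include pulls Person.xsd in, and
     Person implicitly acquires the http://example.com/company namespace. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/company"
           targetNamespace="http://example.com/company"
           elementFormDefault="qualified">
  <xs:include schemaLocation="Person.xsd"/>
  <xs:element name="Company">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under the Heterogeneous design, Person.xsd would instead declare its own targetNamespace and be brought in with xs:import rather than xs:include.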
Validating Big Data Jobs—Stopping Failures Before Production on Apache Spark... (Databricks)
This document summarizes a talk about using Domain-Driven Design patterns and principles when developing applications with the Symfony framework. It discusses where to store business logic in an MVC application and how to avoid anemic domain models by putting logic in entities. It provides examples of domain models, repositories, value objects, and strategies when building a Symfony application with DDD in mind. Finally, it outlines common DDD patterns and principles and recommends resources for further reading on the topic.
The document discusses 10 ways to improve code quality, based on a presentation by Martin Cronje. It provides examples of refactoring code to replace switch/if statements with polymorphism, extracting methods to improve readability, commenting appropriately, using existing frameworks instead of building your own, and avoiding premature optimization. It also emphasizes fundamentals over cool technologies and understanding quality and purpose.
Introduction to javascript templating using handlebars.js (Mindfire Solutions)
Handlebars.js is a JavaScript templating library. Web apps are using JavaScript to create dynamic interfaces now more than ever before, and that’s not a trend that will change any time soon. DOM manipulation is great for simpler JavaScript apps, but what do you do when you’re changing huge chunks of the document with each change of the view? That’s where JavaScript templating has a critical role to play.
The document discusses the concept of connascence, which refers to coupling between software components where a change in one component requires changes in other components to maintain correctness. It defines 9 types of connascence based on different aspects that coupled components may rely on, such as identity, value, timing, etc. It also describes 3 axes (strength, degree, locality) for analyzing connascence issues. Examples are provided to illustrate different types of connascence.
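Two of the connascence types listed above can be made concrete with a small sketch (the function names are illustrative, not from the original deck): connascence of position, where every caller must agree on argument order, versus the weaker connascence of name, where callers depend only on parameter names.

```python
# Connascence of position: every caller must know that the first
# argument is the street and the second the city. Swapping the
# parameters would silently break all existing call sites.
def make_address_positional(street, city):
    return f"{street}, {city}"

# Connascence of name (weaker coupling): keyword-only parameters mean
# callers depend only on the names, so argument order no longer matters.
def make_address_named(*, street, city):
    return f"{street}, {city}"

addr1 = make_address_positional("10 Main St", "Springfield")
addr2 = make_address_named(city="Springfield", street="10 Main St")
print(addr1 == addr2)  # True: both produce "10 Main St, Springfield"
```

Refactoring from the first form to the second is an example of weakening connascence: the coupling between caller and callee moves from a stronger type (position) to a weaker one (name).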
Introduction to Spark Datasets - Functional and relational together at last (Holden Karau)
Spark Datasets are an evolution of Spark DataFrames which allow us to work with both functional and relational transformations on big data with the speed of Spark.
Guide to wall street quant jobs for IITians (Pratik Poddar)
This document provides information about quantitative finance jobs for IIT graduates. It defines quantitative finance roles, which involve quantitative analysis in trading at large banks and hedge funds. These roles require advanced skills in mathematics, statistics, and computer science, as well as an understanding of financial markets. Example jobs mentioned include derivatives trader, trading desk quant strategist, and algorithmic trading quant. The document provides sample employers and recommends materials to prepare for technical interviews, which typically focus on mathematics, computer science, and brain teasers. It also provides example interview questions.
Using Scrum on 3SL Cradle - traceability model and project schema (Yulia Madorskaya)
This document describes a traceability model and project schema for the requirements management and systems engineering tool Cradle to support agile methodologies like Scrum. The schema includes item types and relationships to link releases, sprints, tasks, user stories, and stakeholders. It also describes attributes for statuses, estimates, dates, and more to enable planning and tracking. The document provides instructions for downloading and importing the schema to use it on a new Cradle project.
Core Java Programming Language (JSE) : Chapter IV - Expressions and Flow Cont... (WebStackAcademy)
Expressions perform operations on data and move data around. Some expressions will be evaluated for their results, some for their side effects, some for both. An expression can have three kinds of result:
a value, such as the result of: (4 * i)
a variable, such as the result of: i = 4
nothing (in the case of an invocation of a method declared as void)
An expression that results in a variable is called an lvalue in C++ and many other languages. A variable expression in Java is the same thing, the Java Language Specification just uses the name variable instead of lvalue. Such an expression can be used on the left hand side of an assignment operator. Side effects come about when an expression includes an assignment, increment, decrement, or method invocation.
In the Java language there are several keywords that are used to alter the flow of the program. Statements can be executed multiple times or only under a specific condition. The if, else, and switch statements are used for testing conditions, the while and for statements for creating loops, and the break and continue statements for altering a loop.
When the program is run, statements are executed one by one, from the top of the source file to the bottom.
Effective testing for spark programs scala bay preview (pre-strata ny 2015) (Holden Karau)
We all know testing is important, but we often end up cutting corners because it's too much effort. Come learn how to make testing Spark programs less effort and save yourself from future production disasters when your recommendation system starts to return no results. We will explore how to quickly make tests for regular Spark programs, working with DataFrames, and special considerations for making effective unit tests for Spark Streaming. If you are super excited about the subject of testing Spark programs, make sure to also check out the corresponding Strata NY talk for even more Spark testing fun. http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/42993
Testing and validating spark programs - Strata SJ 2016 (Holden Karau)
Apache Spark is a fast, general engine for big data processing. As Spark jobs are used for more mission-critical tasks, it is important to have effective tools for testing and validation. Expanding her Strata NYC talk, “Effective Testing of Spark Programs,” Holden Karau details reasonable validation rules for production jobs and best practices for creating effective tests, as well as options for generating test data.
Holden explores best practices for generating complex test data, setting up performance testing, as well as basic unit testing. The validation component will focus on how to create reasonable validation rules given the constraints of Spark’s accumulators.
Unit testing of Spark programs is deceptively simple. Holden looks at how unit testing of Spark itself is accomplished and distills a number of best practices into traits we can use. This includes dealing with local mode cluster creation and tear down during test suites, factoring our functions to increase testability, mock data for RDDs, and mock data for Spark SQL. A number of interesting problems also arise when testing Spark Streaming programs, including handling of starting and stopping the streaming context, providing mock data, and collecting results, and Holden pulls out simple takeaways for dealing with these issues.
Holden also explores Spark’s internal methods for generating random data, as well as options using external libraries to generate effective test datasets (for both small- and large-scale testing). And while acceptance tests are not always thought of as part of testing, they share a number of similarities, so Holden discusses which counters Spark programs generate that we can use for creating acceptance tests, best practices for storing historic values, and some common counters we can easily use to track the success of our job, all while working within the constraints of Spark’s accumulators.
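The counter-based validation idea described above can be sketched without Spark at all; the function below (illustrative names, not Spark's API) compares a job's counters against stored historical values and rejects runs that drift too far from the baseline.

```python
# A relative validation rule: compare this run's counters against a
# historical baseline and flag the job if any counter drifts too far.
def validate(counters, history, max_relative_change=0.2):
    for name, value in counters.items():
        baseline = history.get(name)
        if baseline is None:
            continue  # no history yet for this counter; skip the check
        change = abs(value - baseline) / baseline
        if change > max_relative_change:
            return False, f"{name} changed by {change:.0%} vs baseline"
    return True, "ok"

# A run that wrote half as many records as the historical baseline
# fails the 20% drift threshold, so deployment would be stopped.
ok, msg = validate({"records_written": 50}, {"records_written": 100})
print(ok, msg)  # False records_written changed by 50% vs baseline
```

In a real pipeline, the counters would come from the job's accumulators and the baseline from persisted values of previous successful runs; a failed check would block the deployment step rather than just print a message.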
1) The document discusses data structures and their importance for organizing data, designing large computer systems, and writing efficient programs.
2) It covers common data structures like arrays, stacks, queues, linked lists, trees and graphs. Choosing the right data structure depends on the problem and constraints like space and time.
3) Analyzing algorithms' worst, average, and best cases helps determine efficiency. Practicing with examples like sorting numbers and searching databases improves skills with data structures.
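The point about matching the structure to the constraints can be illustrated with a small Python sketch (a generic example, not from the original document): the same "add and remove items" problem calls for different structures depending on the required removal order and cost.

```python
from collections import deque

# A stack (last-in, first-out): list.append and list.pop work at the
# tail, so both operations are O(1).
stack = []
stack.append("a")
stack.append("b")
assert stack.pop() == "b"  # most recently pushed item comes off first

# A queue (first-in, first-out): deque.popleft is O(1), whereas
# list.pop(0) would shift every remaining element and cost O(n).
queue = deque()
queue.append("a")
queue.append("b")
assert queue.popleft() == "a"  # oldest item comes off first
```

Here the choice between a list and a deque is driven entirely by the access pattern and its time cost, which is exactly the kind of trade-off the summary describes.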
This document discusses principles of clean code and best practices for writing maintainable code. It defines clean code as code that is readable, testable, has minimal dependencies and clear purpose. It emphasizes that code quality is important to reduce technical debt and improve productivity. Specific techniques mentioned include using descriptive names, small single-purpose functions, object-oriented principles like SOLID, design patterns like strategy and observer patterns, and architectural styles like hexagonal architecture. The document stresses that clean code requires ongoing effort to refactor and prevent degradation over time.
Make it Responsive! the logic, the code & tricks of trade (Sidharth Sidharth)
A talk on Responsive Web Design (RWD) for WordPress Themes for WordCamp Pune 2013. Talking about layouts, image optimisation, typography, media queries, viewport settings etc.
The specification pattern allows business rules to be combined using boolean logic to filter objects. It defines a common interface for all specifications to determine if an object meets a certain specification. This loose coupling avoids polluting the repository interface with many filter methods. Specifications can be written for individual rules, then combined to check complex business rules. For example, in recruitment, specifications for a candidate's test score and interview result can be ANDed or ORed to check eligibility.
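The recruitment example above can be sketched as follows (class and field names are illustrative, not from the original document): each business rule is a small specification object exposing one check method, and And/Or combinators build compound rules without touching the repository interface.

```python
class Spec:
    # Base class: concrete specifications implement is_satisfied_by,
    # and the & / | operators combine them with boolean logic.
    def is_satisfied_by(self, candidate):
        raise NotImplementedError

    def __and__(self, other):
        return AndSpec(self, other)

    def __or__(self, other):
        return OrSpec(self, other)

class AndSpec(Spec):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def is_satisfied_by(self, candidate):
        return (self.left.is_satisfied_by(candidate)
                and self.right.is_satisfied_by(candidate))

class OrSpec(Spec):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def is_satisfied_by(self, candidate):
        return (self.left.is_satisfied_by(candidate)
                or self.right.is_satisfied_by(candidate))

class MinTestScore(Spec):
    def __init__(self, threshold):
        self.threshold = threshold

    def is_satisfied_by(self, candidate):
        return candidate["test_score"] >= self.threshold

class PassedInterview(Spec):
    def is_satisfied_by(self, candidate):
        return candidate["interview_passed"]

# Combine individual rules into one compound eligibility rule.
eligible = MinTestScore(70) & PassedInterview()
candidate = {"test_score": 85, "interview_passed": True}
print(eligible.is_satisfied_by(candidate))  # True
```

Because every rule shares the same interface, a repository needs only one generic "find objects matching this specification" method instead of one filter method per business rule.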
The document discusses Java 8 lambdas and the Stream API. Lambdas allow functions to be passed around as method arguments rather than wrapping them in whole objects. The Stream API allows collections to be processed in a functional way using intermediate and terminal operations on a stream, such as filtering, mapping, reducing, and collecting the results. Examples demonstrate common stream operations like filtering, sorting, mapping elements to different types, and collecting results.
In celebration of Maker Week, the Virginia Tech Northern Virginia Center hosted a 3D Printing Day. This presentation is on how to use OpenSCAD (http://openscad.org) for 3D modeling.
The document discusses the history and future of building web applications using components. It begins by explaining the traditional client-server model and then transitions to discussing newer approaches like REST APIs and single-page applications built with components. It covers topics like building custom elements, using frameworks like Polymer, and the growing capabilities of the web platform for creating reusable UI components. The overall message is that the web is moving towards a more component-based approach to building applications in order to improve developer productivity and user experience.
Collaborating with Developers: How-to Guide for Test Engineers - By Gil Tayar (Applitools)
Full webinar recording here: https://youtu.be/0NT_fmXwz1k
"I will give a recipe that you can follow to ease your fear of the unknown: writing tests for developer code.
At the end of this session, I guarantee that you will gain a deeper understanding of different kinds of tests, know how to decipher developer terminology, and learn how to write unit, integration, browser, and E2E tests." -- Gil Tayar, Sr. Architect & Evangelist
Testing is shifting left, moving closer to testing the code itself. But while managers dictate a shift to the left, developers and testers are confused as to how exactly to test the code.
And while the backend world has established code-testing methodologies, we are still trying to figure out how to test frontend code, while ensuring effective testing procedures and processes.
This means testers need to step in and work with the frontend developers, but with an understanding of the frameworks by which frontend code is tested, the various kinds of testing that can be performed on frontend code, and which tools can be used for this.
In this hands-on session, Gil Tayar discusses various test methodologies, and how they fit together in a coherent way. Gil also includes sample code that you can use as a template in your own project -- all in order to provide you with the knowledge and tools to approach and test developer code.
Improving PySpark performance: Spark Performance Beyond the JVM (Holden Karau)
This talk covers a number of important topics for making scalable Apache Spark programs - from RDD re-use to considerations for working with Key/Value data, why avoiding groupByKey is important and more. We also include Python specific considerations, like the difference between DataFrames/Datasets and traditional RDDs with Python. We also explore some tricks to intermix Python and JVM code for cases where the performance overhead is too high.
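The groupByKey point above can be illustrated without Spark: the plain-Python sketch below (it simulates partitions with lists and is not Spark's API) shows why reduceByKey-style pre-aggregation moves far fewer records across the shuffle than grouping all values by key first.

```python
from collections import defaultdict

# Two simulated partitions of (key, count) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

# groupByKey-style: ship every record to the reducer, then combine.
shuffled_group = [pair for part in partitions for pair in part]

# reduceByKey-style: pre-aggregate within each partition (a "map-side
# combine"), then ship only one record per key per partition.
shuffled_reduce = []
for part in partitions:
    local = defaultdict(int)
    for key, value in part:
        local[key] += value
    shuffled_reduce.extend(local.items())

print(len(shuffled_group))   # 6 records cross the "network"
print(len(shuffled_reduce))  # 4 records: one per key per partition

# Both strategies yield the same final counts.
totals = defaultdict(int)
for key, value in shuffled_reduce:
    totals[key] += value
print(dict(totals))  # {'a': 3, 'b': 3}
```

With skewed real data the gap is far larger: groupByKey must also buffer every value for a hot key on a single reducer, while the pre-aggregated version keeps only one running total per key.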
PyConUK2013 - Validated documents on MongoDB with Ming (Alessandro Molina)
Ming is a SQLAlchemy-inspired object-document mapper (ODM) for MongoDB developed at SourceForge, which is also used by the TurboGears2 web framework to provide MongoDB support.
After a short introduction to the basic Ming layer, we will cover the Ming Object Document Mapper layer to show how to take advantage of its Unit of Work to avoid performing incomplete changes and how to model relations between collections.
The last part of the talk will show how to use Ming to perform lazy migration of data when your schema changes and how to drop below the ODM layer to achieve maximum speed.
The magic of (data parallel) distributed systems and where it all breaks - Re... (Holden Karau)
Distributed systems can seem magical, and sometimes all of the magic works and our job succeeds. However, if you've worked with them for long enough, you've found a few places where the magic starts to break down, and it becomes apparent that the system is actually a collection of several hundred garden gnomes* rather than a single large garden gnome.
This talk will use Apache Spark, Beam, Flink, Kafka, and Map Reduce to explore the world of data parallel distributed systems. We'll start with some happy pieces of magic, like how we can combine different transformations into a single pass over the data, working between different languages, data partitioning, and lambda serialization. After each new piece of magic is introduced we'll look at how it breaks in one (or two) of the systems.
Come to be told it's not your fault everything is broken, or, if your distributed software still works, for an exciting preview of everything that's going to go wrong. Don't work with distributed systems? Come to be reassured you've made good life choices.
HOW VOCERA LEVERAGES SYNERZIP FOR ENHANCEMENT OF VOCERA PLATFORM & ITS USER E... (Synerzip)
Steve Newson, Global VP of Systems Engineering at Vocera, says the forward thinking of the Synerzip team added great value to Vocera.
To know more about how Vocera & Synerzip partnership is enhancing the leading healthcare platform for clinical communication & workflow to deliver safe, efficient quality patient care, visit https://synerzip.com/story/steve-newson-global-vice-president-systems-engineering-vocera/.
Synerzip is a software development partner that provides full software development lifecycle services including testing. They utilize a dual-shore model with experienced teams in the US and India to reduce costs by 50%. Synerzip follows agile development processes and best practices for testing such as test automation, test case management, and tracking bugs and metrics. They have experience delivering projects for clients across industries and technologies.
The document discusses 10 ways to improve code quality based on a presentation by Martin Cronje. It provides examples of refactoring code to remove switch/if statements, use polymorphism instead of switch statements, extract methods to improve readability, add comments appropriately, use frameworks instead of building your own, and avoid premature optimization. It also emphasizes fundamentals over cool technologies and understanding quality and purpose.
Introduction to javascript templating using handlebars.jsMindfire Solutions
Handlebars.js is a JavaScript templating library. Web apps are using JavaScript to create dynamic interfaces now more than ever before, and that’s not a trend that will change any time soon. DOM manipulation is great for simpler JavaScript apps, but what do you do when you’re changing huge chunks of the document with each change of the view? That’s where JavaScript templating has a critical role to play.
The document discusses the concept of connascence, which refers to coupling between software components where a change in one component requires changes in other components to maintain correctness. It defines 9 types of connascence based on different aspects that coupled components may rely on, such as identity, value, timing, etc. It also describes 3 axes (strength, degree, locality) for analyzing connascence issues. Examples are provided to illustrate different types of connascence.
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
Spark Datasets are an evolution of Spark DataFrames which allow us to work with both functional and relational transformations on big data with the speed of Spark.
Guide to wall street quant jobs for IITiansPratik Poddar
This document provides information about quantitative finance jobs for IIT graduates. It defines quantitative finance roles, which involve quantitative analysis in trading at large banks and hedge funds. These roles require advanced skills in mathematics, statistics, and computer science, as well as an understanding of financial markets. Example jobs mentioned include derivatives trader, trading desk quant strategist, and algorithmic trading quant. The document provides sample employers and recommends materials to prepare for technical interviews, which typically focus on mathematics, computer science, and brain teasers. It also provides example interview questions.
Using Scrum on 3SL Cradle - traceability model and project schemaYulia Madorskaya
This document describes a traceability model and project schema for the requirements management and systems engineering tool Cradle to support agile methodologies like Scrum. The schema includes item types and relationships to link releases, sprints, tasks, user stories, and stakeholders. It also describes attributes for statuses, estimates, dates, and more to enable planning and tracking. The document provides instructions for downloading and importing the schema to use it on a new Cradle project.
Core Java Programming Language (JSE) : Chapter IV - Expressions and Flow Cont...WebStackAcademy
Expressions perform operations on data and move data around. Some expressions will be evaluated for their results, some for their side effects, some for both. An expression can have three kinds of result:
a value, such as the result of: (4 * i)
a variable, such as the result of: i = 4
nothing (in the case of an invocation of a method declared as void)
An expression that results in a variable is called an lvalue in C++ and many other languages. A variable expression in Java is the same thing, the Java Language Specification just uses the name variable instead of lvalue. Such an expression can be used on the left hand side of an assignment operator. Side effects come about when an expression includes an assignment, increment, decrement, or method invocation.
In Java language there are several keywords that are used to alter the flow of the program. Statements can be executed multiple times or only under a specific condition. The if, else, and switch statements are used for testing conditions, the while and for statements to create cycles, and the break and continue statements to alter a loop.
When the program is run, the statements are executed one by one, from the top of the source file to the bottom.
Effective testing for spark programs scala bay preview (pre-Strata NY 2015) – Holden Karau
We all know testing is important, but often end up cutting corners because it's too much effort. Come learn how to make testing Spark programs less effort and save yourself from future production disasters when your recommendation system starts to return no results. We will explore how to quickly make tests for regular Spark programs, working with DataFrames, and special considerations for making effective unit tests for Spark Streaming. If you are super excited about the subject of testing Spark programs, make sure to also check out the corresponding Strata NY talk for even more Spark testing fun. http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/42993
Testing and validating spark programs - Strata SJ 2016 – Holden Karau
Apache Spark is a fast, general engine for big data processing. As Spark jobs are used for more mission-critical tasks, it is important to have effective tools for testing and validation. Expanding her Strata NYC talk, “Effective Testing of Spark Programs,” Holden Karau details reasonable validation rules for production jobs and best practices for creating effective tests, as well as options for generating test data.
Holden explores best practices for generating complex test data, setting up performance testing, as well as basic unit testing. The validation component will focus on how to create reasonable validation rules given the constraints of Spark’s accumulators.
Unit testing of Spark programs is deceptively simple. Holden looks at how unit testing of Spark itself is accomplished and distills a number of best practices into traits we can use. This includes dealing with local mode cluster creation and tear down during test suites, factoring our functions to increase testability, mock data for RDDs, and mock data for Spark SQL. A number of interesting problems also arise when testing Spark Streaming programs, including handling of starting and stopping the streaming context, providing mock data, and collecting results; Holden pulls out simple takeaways for dealing with these issues.
Holden also explores Spark’s internal methods for generating random data, as well as options using external libraries to generate effective test datasets (for both small- and large-scale testing). And while acceptance tests are not always thought of as part of testing, they share a number of similarities, so Holden discusses which counters Spark programs generate that we can use for creating acceptance tests, best practices for storing historic values, and some common counters we can easily use to track the success of our job, all while working within the constraints of Spark’s accumulators.
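One of the best practices above ("factoring our functions to increase testability") can be sketched without a cluster: keep per-record logic in a pure function, which the Spark job would then apply (e.g. via flatMap) and which tests can exercise with plain collections. A minimal Scala sketch, with an assumed "user,count" record format not taken from the talk:

```scala
// Testable without Spark: the parsing logic is a pure function the job
// would apply via something like rdd.flatMap(parseRecord) (names assumed).
def parseRecord(line: String): Option[(String, Int)] =
  line.split(",") match {
    case Array(user, count) => count.trim.toIntOption.map(user.trim -> _)
    case _                  => None // malformed lines are dropped, not thrown
  }

println(parseRecord("ada, 42"))  // parses to a (user, count) pair
println(parseRecord("bad line")) // rejected without an exception
```

Because nothing here touches a SparkContext, a plain unit test can cover every branch; only the thin glue that wires the function into the RDD needs a local-mode cluster.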
1) The document discusses data structures and their importance for organizing data, designing large computer systems, and writing efficient programs.
2) It covers common data structures like arrays, stacks, queues, linked lists, trees and graphs. Choosing the right data structure depends on the problem and constraints like space and time.
3) Analyzing algorithms' worst, average, and best cases helps determine efficiency. Practicing with examples like sorting numbers and searching databases improves skills with data structures.
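As a small illustration of the trade-offs above, a hedged Scala sketch (not from the document): searching a sorted array costs O(log n) with binary search versus O(n) with a linear scan, which is exactly the kind of structure-versus-cost choice the summary describes.

```scala
// Binary search on a sorted array: O(log n) vs. O(n) for a linear scan.
def binarySearch(xs: Array[Int], target: Int): Int = {
  var lo = 0
  var hi = xs.length - 1
  while (lo <= hi) {
    val mid = lo + (hi - lo) / 2          // midpoint without integer overflow
    if (xs(mid) == target) return mid
    else if (xs(mid) < target) lo = mid + 1
    else hi = mid - 1
  }
  -1                                       // sentinel: not found
}

val sorted = Array(2, 5, 8, 12, 23, 38, 56)
println(binarySearch(sorted, 23))          // index 4
println(binarySearch(sorted, 7))           // -1
```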
This document discusses principles of clean code and best practices for writing maintainable code. It defines clean code as code that is readable, testable, has minimal dependencies and clear purpose. It emphasizes that code quality is important to reduce technical debt and improve productivity. Specific techniques mentioned include using descriptive names, small single-purpose functions, object-oriented principles like SOLID, design patterns like strategy and observer patterns, and architectural styles like hexagonal architecture. The document stresses that clean code requires ongoing effort to refactor and prevent degradation over time.
Make it Responsive! the logic, the code & tricks of trade – Sidharth Sidharth
A talk on Responsive Web Design (RWD) for WordPress Themes for WordCamp Pune 2013. Talking about layouts, image optimisation, typography, media queries, viewport settings etc.
The specification pattern allows business rules to be combined using boolean logic to filter objects. It defines a common interface for all specifications to determine if an object meets a certain specification. This loose coupling avoids polluting the repository interface with many filter methods. Specifications can be written for individual rules, then combined to check complex business rules. For example, in recruitment, specifications for a candidate's test score and interview result can be ANDed or ORed to check eligibility.
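A compact Scala sketch of the pattern (the Candidate fields and the score threshold are assumptions mirroring the recruitment example, not details from the document):

```scala
// Specification pattern: each rule implements one interface and rules
// compose with boolean logic, keeping filters out of the repository.
case class Candidate(testScore: Int, passedInterview: Boolean)

trait Spec[A] { self =>
  def isSatisfiedBy(a: A): Boolean
  def and(other: Spec[A]): Spec[A] = a => self.isSatisfiedBy(a) && other.isSatisfiedBy(a)
  def or(other: Spec[A]): Spec[A]  = a => self.isSatisfiedBy(a) || other.isSatisfiedBy(a)
}

val goodScore: Spec[Candidate]   = c => c.testScore >= 70  // threshold assumed
val interviewed: Spec[Candidate] = c => c.passedInterview

val eligible = goodScore.and(interviewed)                  // ANDed business rule
println(eligible.isSatisfiedBy(Candidate(85, true)))
```

The single-method trait means each rule is a one-line lambda (Scala 2.12+), and new rules combine via `and`/`or` without touching the repository interface.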
The document discusses Java 8 Lambdas and the Streaming API. Lambdas allow functions to be passed around as method arguments rather than whole objects. The Streaming API allows collections to be processed in a functional way using intermediate and terminal operations on a stream, such as filtering, mapping, reducing, and collecting the results. Examples demonstrate common stream operations like filtering, sorting, mapping elements to different types, and collecting results.
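The document's examples are Java, but the same intermediate and terminal operations read almost identically on Scala collections; a rough Scala analogue with illustrative data:

```scala
// Scala analogue of a Java 8 stream pipeline (data is made up).
val words = List("spark", "scala", "java", "fp")

val result = words
  .filter(_.length > 2)  // intermediate op: keep longer words
  .map(_.toUpperCase)    // intermediate op: transform each element
  .sorted                // intermediate op: order the elements
  .mkString(", ")        // terminal op: collect into one string

println(result)          // JAVA, SCALA, SPARK
```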
In celebration of Maker Week, the Virginia Tech Northern Virginia Center hosted a 3DPrinting Day. This presentation is on how to use OpenSCAD (http://openscad.org) for 3D modeling.
The document discusses the history and future of building web applications using components. It begins by explaining the traditional client-server model and then transitions to discussing newer approaches like REST APIs and single-page applications built with components. It covers topics like building custom elements, using frameworks like Polymer, and the growing capabilities of the web platform for creating reusable UI components. The overall message is that the web is moving towards a more component-based approach to building applications in order to improve developer productivity and user experience.
Collaborating with Developers: How-to Guide for Test Engineers - By Gil Tayar – Applitools
** Full webinar recording here: https://youtu.be/0NT_fmXwz1k **
"I will give a recipe that you can follow to ease your fear of the unknown: writing tests for developer code.
At the end of this session, I guarantee that you will gain a deeper understanding of different kinds of tests, know how to decipher developer terminology, and learn how to write unit, integration, browser, and E2E tests." -- Gil Tayar. Sr. Architect & Evangelist
Testing is shifting left, moving closer to testing the code itself. But while managers dictate a shift to the left, developers and testers are confused as to how exactly to test the code.
And while the backend world has established code-testing methodologies, we are still trying to figure out how to test frontend code, while ensuring effective testing procedures and processes.
This means testers need to step in and work with the frontend developers, but with an understanding of the frameworks by which frontend code is tested, the various kinds of testing that can be performed on frontend code, and which tools can be used for this.
In this hands-on session, Gil Tayar discusses various test methodologies, and how they fit together in a coherent way. Gil also includes sample code that you can use as a template in your own project -- all in order to provide you with the knowledge and tools to approach and test developer code.
Improving PySpark performance: Spark Performance Beyond the JVM – Holden Karau
This talk covers a number of important topics for making scalable Apache Spark programs - from RDD re-use to considerations for working with Key/Value data, why avoiding groupByKey is important and more. We also include Python specific considerations, like the difference between DataFrames/Datasets and traditional RDDs with Python. We also explore some tricks to intermix Python and JVM code for cases where the performance overhead is too high.
PyConUK2013 - Validated documents on MongoDB with Ming – Alessandro Molina
Ming is a SQLAlchemy-inspired object-document mapper (ODM) for MongoDB developed at SourceForge, which is also used by the TurboGears2 web framework to provide MongoDB support.
After a short introduction to the basic Ming layer we will cover the Ming Object Document Mapper layer to show how to take advantage of its Unit Of Work to avoid performing incomplete changes and achieve relations between collections.
The last part of the talk will show how to use Ming to perform lazy migration of data when your schema changes and how to drop below the ODM layer to achieve maximum speed.
The magic of (data parallel) distributed systems and where it all breaks - Re... – Holden Karau
Distributed systems can seem magical, and sometimes all of the magic works and our job succeeds. However, if you've worked with them for long enough, you've found a few places where the magic starts to break down, revealing that it's actually a collection of several hundred garden gnomes* rather than a single large garden gnome.
This talk will use Apache Spark, Beam, Flink, Kafka, and Map Reduce to explore the world of data parallel distributed systems. We'll start with some happy pieces of magic, like how we can combine different transformations into a single pass over the data, working between different languages, data partitioning, and lambda serialization. After each new piece of magic is introduced we'll look at how it breaks in one (or two) of the systems.
Come to be told it's not your fault everything is broken, or, if your distributed software still works, for an exciting preview of everything that's going to go wrong. Don't work with distributed systems? Come to be reassured you've made good life choices.
HOW VOCERA LEVERAGES SYNERZIP FOR ENHANCEMENT OF VOCERA PLATFORM & ITS USER E... – Synerzip
Steve Newson, Global VP, Systems Engineering, Vocera, says the forward thinking of the Synerzip team added great value to Vocera.
To know more about how Vocera & Synerzip partnership is enhancing the leading healthcare platform for clinical communication & workflow to deliver safe, efficient quality patient care, visit https://synerzip.com/story/steve-newson-global-vice-president-systems-engineering-vocera/.
Synerzip is a software development partner that provides full software development lifecycle services including testing. They utilize a dual-shore model with experienced teams in the US and India to reduce costs by 50%. Synerzip follows agile development processes and best practices for testing such as test automation, test case management, and tracking bugs and metrics. They have experience delivering projects for clients across industries and technologies.
Test Driven Development – What Works And What Doesn’t – Synerzip
This document discusses test driven development (TDD) and quality assurance practices for agile software development. It introduces Synerzip, an offshore software development partner, and describes their agile development lifecycle involving short iterations with user stories, estimation, testing, and customer approval. The benefits of practices like TDD, continuous integration, unit testing, and automation are outlined. Challenges with implementation and common mistakes are also discussed. Various testing methodologies and tools used in agile projects are defined.
Distributed/Dual-Shore Agile Software Development – Is It Effective? – Synerzip
This webinar covers the best practices for making dual-shore Agile work effectively.
Topics that are covered -
• Business case for Dual-Shore development
• Business case for Agile
• Can Dual-Shore and Agile be combined effectively?
• Challenges
• Best Practices
• Synerzip Introduction
Stay tuned for Synerzip's upcoming webinars that you may be interested in https://www.synerzip.com/webinars/
Using Agile Approach with Fixed Budget Projects – Synerzip
This webinar covers the best practices, alternative approaches for effectively using Agile in fixed budget projects.
Get to know more about Synerzip's upcoming webinars at https://www.synerzip.com/webinars/
The document discusses the role of quality assurance (QA) in agile teams. It compares the traditional and agile approaches to QA, outlining the agile QA responsibilities which include helping define user stories and acceptance criteria, estimating stories, ensuring testing is accounted for in planning, and more. Common mistakes like not involving QA throughout or having them run tests in subsequent sprints are also covered.
The document discusses several agile techniques for mobile app development, including hyper-prototyping, community code scrounging, and user design studios. Hyper-prototyping involves rapidly iterating on prototypes multiple times per day to get quick feedback. Community code scrounging involves searching online developer communities to find and integrate code snippets. User design studios bring together stakeholders to collaboratively design app UIs in a workshop format.
• Challenges in Traditional Organizations
• Impact on Agile Process
• Tweaking Agile For Your Situation
Development in short iterative cycles builds a better trust relationship and a stronger engagement between the product owner/customer and the development team.
Stay tuned for our upcoming webinars at https://www.synerzip.com/webinars/ that might be of interest to you.
Accelerating Agile Transformations - Ravi Verma – Synerzip
This webinar discusses three organizational change techniques which can help accelerate Agile transformation.
Learn about a simple framework for Accelerating Agile Transformation, with practical techniques you can apply.
Read more at https://www.synerzip.com/webinar/accelerating-agile-transformations/
The document discusses product management basics from an agile perspective. It defines the roles of product managers and product owners, noting that product managers take on a broader strategic role while product owners focus on the development team. It also outlines common failure modes for each role and organizational models for scaling the roles. The conclusion emphasizes that agile has increased the scope of product management work.
Product Portfolio Kanban - by Erik Huddleston – Synerzip
The document discusses applying lean and kanban principles beyond software development to the wider organization. It describes three critical practices for a "Product Portfolio Kanban": 1) stakeholder-based investment themes and business case management to optimize organizational value, 2) upstream and downstream work-in-progress (WIP) limits to enable flow, and 3) dynamic allocations based on organizational capacity and appetite. Implementing these practices can help avoid unintended consequences of agile success and increase overall organizational value.
Modern Software Practices - by Damon Poole – Synerzip
This session provides an overview of the following modern practices:
Continuous integration
Refactoring
Unit tests
Multi-stage continuous integration
One piece flow
Cross-functional teams
Product backlog
Story point estimation
User stories
Burn-up charts
Read more at https://www.synerzip.com/webinar/modern-software-practices/
The document discusses context-driven leadership and managing projects based on their level of uncertainty and complexity. It describes a model where projects are categorized as sheepdogs, cows, bulls, or colts based on having low or high levels of uncertainty and complexity. The appropriate leadership approach depends on the project type - sheepdog projects need agility, cow projects need defined interfaces, bull projects need both agility and process, and colt projects are laissez faire. Reducing uncertainty or complexity can involve changing attributes like team size or location. Leadership requires a balance of developing processes, people, technology, and business skills to match the project context.
This document provides an overview and summary of a presentation on unit testing, test-driven development, and behavior-driven development. The presentation covers the basics of each approach, provides examples, and discusses the benefits including increased code quality, reduced defects, and more confidence in the code. It emphasizes that testing should be integrated into the development process from the start.
Pragmatics of Agility - by Venkat Subramaniam – Synerzip
This webinar covers the essence of Agile and provides guidance on dealing with common impediments.
Only one thing matters in software development – to successfully deliver a product so users can derive value. If we’re not succeeding with it, it does not matter what the process is called or how we do it. Agile development can help reduce risk and increase the chances of success, but there is no magic wand we can wave at the problem for a quick-fix. It takes disciplined, dedicated, and continuous effort to achieve the desired results.
Read more from the original copy at https://www.synerzip.com/webinar/pragmatics-of-agility-webinar-february-2011/
It covers -
- Pros and cons of different strategies for developing mobile applications.
- Leading choices for cross platform mobile application development. While there are many frameworks for cross platform application development, we will discuss two leading frameworks namely PhoneGap and Titanium Mobile.
Find original copy at https://www.synerzip.com/webinar/cross-platform-mobile-app-development/
It covers ATDD, BDD, UTDD, Lean & Kanban, technical debt, value focus, and many more topics.
Every year, worldwide Agile annual conferences take place, and Synerzip's CEO & CTO attend them and bring back key takeaways.
Original copy at https://www.synerzip.com/webinar/agile2011-conference-key-take-aways-2011/
This webinar discusses how to do individual performance evaluation in an Agile team environment.
It concludes with the introduction of 6 tangible techniques for performance evaluation of Agile teams and team members. Included in these techniques is the “annual agile performance review”. These techniques can be easily integrated into your existing environment in order to emphasize the expected behaviors of an Agile team based on the fundamental Agile principles.
Read more from the original copy at https://www.synerzip.com/webinar/performance-evaluation-in-agile/
This webinar discusses how to use Kanban techniques with your Agile teams.
In this session, Damon Poole, Founder and CTO of AccuRev, will introduce Kanban from a Scrum perspective, show how the Lean practice of “one piece flow” is the key to both, and look at how to mix and match Scrum and Kanban to fine tune a process that fits your circumstances.
Read more from the original copy at https://www.synerzip.com/webinar/scrum-and-kanban-oct2011/
This webinar discusses the concept of Technical Debt and approaches for managing it effectively.
Technical debt is the consequence of choosing a software design or construction approach that is expedient but increases complexity and future costs. It can impede the team’s ability to add new features, quickly fix bugs, and evolve the software product. From a business perspective, technical debt can keep a company from remaining competitive in today’s dynamic marketplace.
Read more from the original copy at https://www.synerzip.com/webinar/managing-technical-debt-jan2012/
AppSec PNW: Android and iOS Application Security with MobSF – Ajin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
Essentials of Automations: Exploring Attributes & Automation Parameters – Safe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Main news related to the CCS TSI 2023 (2023/1695) – Jakub Marek
An English 🇬🇧 translation of the presentation for the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz), attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
"$10 thousand per minute of downtime: architecture, queues, streaming and fin... – Fwdays
Direct losses from 1 minute of downtime = $5–10 thousand. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
Session 1 - Intro to Robotic Process Automation.pdf – UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
QA or the Highway - Component Testing: Bridging the gap between frontend appl... – zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba – Fwdays
This is a session that details how PostgreSQL's features and Azure AI Services can be effectively used to significantly enhance the search functionality in any application.
In this session, we'll share insights on how we used PostgreSQL to facilitate precise searches across multiple fields in our mobile application. The techniques include using LIKE and ILIKE operators and integrating a trigram-based search to handle potential misspellings, thereby increasing the search accuracy.
We'll also discuss how the azure_ai extension on PostgreSQL databases in Azure and Azure AI Services were utilized to create vectors from user input, a feature beneficial when users wish to find specific items based on text prompts. While our application's case study involves a drug search, the techniques and principles shared in this session can be adapted to improve search functionality in a wide range of applications. Join us to learn how PostgreSQL and Azure AI can be harnessed to enhance your application's search capability.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 – Vadym Kazulkin
Java has been one of the most popular programming languages for many years, but it used to have a hard time in the Serverless community. Java is known for its high cold start times and high memory footprint compared to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption and cold start times for Java Serverless development on AWS, including GraalVM (Native Image) and AWS's own offering SnapStart, based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions, trying out various deployment package sizes, Lambda memory settings, Java compilation options, and HTTP (a)synchronous clients, and measure their impact on cold and warm start times.
"Choosing proper type of scaling", Olena Syrota – Fwdays
Imagine an IoT processing system that is already quite mature and production-ready, whose client coverage is growing, and for which scaling and performance are life-and-death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk we will first analyze scaling approaches and then select the proper ones for our system.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf – Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Introduction of Cybersecurity with OSS at Code Europe 2024 – Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
22. https://in.linkedin.com/in/sujithsudhakaran
Scala – keywords used in the
examples
● val: to declare an immutable variable
● var: to declare a mutable variable
● def: to define a function
● println: to print a line to standard output
● trait: similar to interfaces in Java
● sealed: restricts subclassing to the current file
● object: keyword used to declare a singleton object
● extends: used for inheritance
● with: used for multiple inheritance (mixing in traits)
● match: keyword for pattern matching
● case: used for pattern matching
● override: used to override an inherited behavior
● =>: maps an input to an output (in function literals and case clauses)
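A minimal sketch tying most of these keywords together (the names Shape, Circle, Square, and Shapes are made up for illustration):

```scala
sealed trait Shape                          // sealed: subclasses must live in this file
case class Circle(r: Double)    extends Shape
case class Square(side: Double) extends Shape

object Shapes {                             // object: a singleton
  val pi: Double = 3.14159                  // val: immutable
  var calls: Int = 0                        // var: mutable

  def area(s: Shape): Double = s match {    // def + match
    case Circle(r)    => pi * r * r         // =>: pattern mapped to a result
    case Square(side) => side * side
  }
}

println(Shapes.area(Square(2.0)))           // prints 4.0
```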
Traits - Mixin class composition
● Share interfaces and fields between classes
● Can be extended but NOT instantiated
● Use the keyword with to extend from multiple traits
● Traits with code are known as Mixins
● Abstract and concrete fields
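A minimal mixin sketch under these rules (Greeter, Logger, and Service are made-up names):

```scala
trait Greeter {
  def name: String                        // abstract field
  def greet(): String = s"Hello, $name"   // concrete method — this trait is a Mixin
}

trait Logger {
  def log(msg: String): String = s"[log] $msg"
}

// Traits cannot be instantiated, but can be mixed in with `extends ... with ...`
class Service(val name: String) extends Greeter with Logger

val s = new Service("Scala")
s.log(s.greet())   // "[log] Hello, Scala"
```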
- Introduce myself
- My background
- Share my learning and head start
Questions:
- Java background
- Worked on a functional language
- My journey with Scala
- Realization
- Point out the difference and similarities
- Brief idea about the things to cover
- Talk about the things from the slide
- So let's dive in
- Question: what is IP?
- Show the image
- What is to be computed is intertwined with how it is to be computed
- Focus is on how the program operates
- Consists of commands for the computer
- Always try to think w.r.t. time
- Show the examples
- I think every one of us might have written such code
Haven’t you?
Typically, IP
- Changes the program state
- Assignments, for/while loops, goto, conditional statements
- Carrying on the legacy
- Eg: C, C++, Java, Go, Python, Ruby etc.
- 3rd example: Any problem?
- We’ll talk about the problem later
- why are we doing this?
Next question:
- Why ?
- HW limitations forced us to be imperative
- Optimize code
- No extra memory
- Review comments
- Since memory on chip was less and very expensive
- Inherently Imperative
- Why things have changed now?
Question:
How many of you know what’s Moore’s Law
Gordon Moore 1965
- Number of transistors per square inch on integrated circuits had doubled every year since their invention
Show the graph
Explain the graph
- It’s become dead
- More cores
Questions:
Are we taking advantage of such HW?
Not the HW we asked for
Difficult to unlearn the things we were doing in IP
Thoughts from Martin
Every IP language has tried to take advantage
Next slide
Before the image
Question:
- What is concurrency?
- A way to achieve faster programs
- No dependency on order of execution
- Has its own problems
- Show the image
- deadlocks, race conditions etc
Q & A
So, how does Scala achieve it?
- Actor model
Typically 4 ways:
Shared state
Software Transactional Memory
Message Passing
Dataflow concurrency
Before image
Question:
- What is Parallel programming?
- Execute programs faster on different HW
- Programs can be sequential in nature
- Show the image
- It’s very good when we have such infra; but are we ready to tackle the problems while it is getting built?
- So, we have the multi-core HW
- Are we able to take advantage of it?
- Difficult if left to developers
- Extra baggage of imperative style
- It’s difficult to handle synchronization
- Languages should be inherently designed to make use of multi-core HW
- Scala does it with the help of parallel collections
- We can simply call the .par function on collections
- Provided we follow the principles of FP, which we will talk about in a while
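A hedged sketch of the `.par` idea — note that `.par` is built into the standard library up to Scala 2.12, while in 2.13+ it lives in the separate scala-parallel-collections module:

```scala
val nums = (1 to 1000).toVector

// Sequential pipeline
val sumSeq = nums.map(_ * 2).sum

// Parallel pipeline: the same code with .par added — safe because the
// data is immutable and the lambda is pure (no shared state to guard)
val sumPar = nums.par.map(_ * 2).sum

assert(sumSeq == sumPar)
```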
- So, if we stick to our conventional IP
- It’s really hard to get it right
- You will definitely spend late nights debugging production issues
- FP is not something which is new
- It was there all this time and was more popular in academics
- But now we are seeing a rebirth of FP in various languages like
- Haskell(Best)
- Lisp
- Erlang
- Scala etc.
- Lets try to understand the building block of FP
Question:
Can we be certain about the outcome of this code?
Why?
- The problem is the shared state
- Mutex, locks etc
- But, why make our lives miserable
- Culprit
- Why not get rid of the culprit itself
- Immutability
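A tiny illustration of getting rid of the mutable culprit (the names are made up):

```scala
// The culprit: shared mutable state — racy if touched from many threads
var total = 0
def addToTotal(x: Int): Unit = total += x

// The fix: immutable data — operations return new values, old ones are untouched
val xs = List(1, 2, 3)
val ys = 0 :: xs      // a new list; xs is still List(1, 2, 3)
```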
- Lets have a look at the next pillar
- Remember this eg from previous slide
- What’s the problem
- How about this example?
- They have side effects
- They are both impure functions
- Anything which changes some state outside its scope
- So, understanding and debugging such a function requires knowledge about the context and its possible history
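A small sketch of the pure/impure distinction (`counter` and the function names are made up):

```scala
var counter = 0

// Impure: reads and writes state outside its own scope, so its result
// depends on call history, not just on its arguments
def impureNext(): Int = { counter += 1; counter }

// Pure: the result depends only on the input; no side effects
def pureNext(n: Int): Int = n + 1

assert(pureNext(5) == 6)   // always 6, regardless of context
```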
Question
- Production bug scenario?
- Developer cannot reproduce
- Pure functions make systems
- easier to understand
- easier to analyze
- easier to test
- easier to optimize
- From the previous example, isn’t this code more convenient to test?
- Scala doesn’t restrict side effects; it advocates being
- explicit
- declarative
- rather than allowing users to assume
- So, that’s the ideology we should keep in mind
- As we saw from the last couple of slides
- Immutability & pure functions are the building blocks of FP
- Stick to these 2 pillars to build any sort of complex concurrent and parallel apps
Question:
What is this?
- Show the image
- Functions should map input values to output values, not change any state
- Why do I have an octopus-like creature here?
- Just to depict that it’s very simple to use with parallel programming
- Primary concepts
- Immutability for data structures
- Pure functions without side-effects
- Auxiliary concepts
- Recursion
- First class and higher order functions
- Type systems
- Referential transparency
- So, if we follow this style, eventually our code becomes
- elegant
- readable
- concise
- Free from side effects or inconsistent results
- So, why should we do this?
- CPUs are not getting faster anymore and memory is getting cheaper
- Simpler reasoning principles
- Better modularity and abstractions
- Transparent concurrent code
- More maintainable and bug free code
- New perspective for problems
- Leave all these reasons aside; it will be a new weapon in your arsenal
Before the image
Question
- How many of you like OOP?
- Advantages:
- Abstraction
- Modularity
- Code reuse
- Clearly defined interfaces
- Easy to maintain
- Suitable for larger projects
So, we have talked about imperative, functional, and OOP.
Now we are going to see where Scala fits in all of this
History before the image
Pizza before Scala
3 features from FP:
- Generics
- Higher Order functions
- Pattern matching
- This formed the basis for Generics in Java 5
- In the current era, one size doesn’t fit all
- Hence, Scala has taken a hybrid approach
- Show the image
- Explain the image
- Express common programming patterns in a concise, elegant, and type-safe way.
- It’s an attempt to achieve smoother integration between FP and OOP
- Martin Odersky, 2004
- Statically typed
- Mix of OOP and FP
- Runs on JVM
- Inter-operable with Java
- Concise
- High-level
- All values have a type
- Including numerical values and functions
- Any: supertype of all types
- AnyVal -> value types
- AnyRef -> reference types
- This corresponds to java.lang.Object
- So that’s how Scala maintains inter-operability with Java classes
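A small sketch of the unified type hierarchy described above:

```scala
// Any is the supertype of all types, so heterogeneous values unify under it
val mixed: List[Any] = List(42, "hi", 3.14, true)

val v: AnyVal = 42      // value types: Int, Double, Boolean, ...
val r: AnyRef = "hi"    // reference types; corresponds to java.lang.Object
```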
- Read it out
- Before the images
- Omit certain type annotations
- Compiler can deduce the type from the type of the initialization expression
- Same implies with function return types
- One exception though: recursive functions
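The inference rules above, sketched in a few lines:

```scala
val x = 42                    // type Int is inferred
val greeting = "hello"        // type String is inferred

def double(n: Int) = n * 2    // return type Int is inferred

// The exception: a recursive function needs an explicit return type
def fact(n: Int): Int = if (n <= 1) 1 else n * fact(n - 1)
```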
- Show the java example
- Show the Scala example
- Emphasize on how it’s written in Java and Scala
- Next slide is also about high level
Q & A:
- How many of you like writing access specifiers and other boilerplate code in other languages?
- At least, now I don’t
- Scala has made my life way simpler
- All the advantages we talked about OOP is readily available with Scala
- Every value is object
- As in the examples
We can have things like:
- Classes
- Objects
- Abstract classes
- Traits
- Case classes etc
- But don’t use Scala classes as Java classes
- Object: instantiated once, singleton, all members static
- Abstract classes: similar to java, only subclassed
- Traits are akin to Interfaces
- Case classes: immutable data-holding entities used for pattern matching; no new keyword; companion objects are automatically created and serve as extractors; apply and unapply methods
- That’s how elegant Scala is in the OO domain
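A minimal case-class sketch showing the auto-generated apply/unapply machinery (`Point` is illustrative):

```scala
case class Point(x: Int, y: Int)      // immutable; companion object auto-created

val p = Point(1, 2)                   // no `new`: sugar for Point.apply(1, 2)

// unapply makes the case class usable as an extractor in pattern matching
val described = p match {
  case Point(a, b) => s"x=$a, y=$b"
}
```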
- next we’ll talk about the concepts which i have used and felt interesting
- There are many other concepts like
- Currying
- Self types
- Annotations
- Generics etc
- We won’t be going through them in this session, at least
- Before the image
- We’ve been doing this for quite some time now. This is still the same in Scala.
- So, every value is evaluated when it is passed around
- In IP, we had taken it further by having pass by reference. But we are not going to talk about that here
- Scala, being an expression-oriented language, promotes pass by value too
- Show the image
- Explain what's happening in the program
- Before the image
- But there is a catch!!
- Let’s look at a problem which we may encounter when we use pass by value
Since, in Scala, everything is a value. Right?
- show the image
- Discuss the problem
With only 1 image
- Take a look at this scala code
- Talk about the notation(syntax)
- Show the 2nd image
- Talk about the side effect now because of the t variable
- There is a catch in this function. This version is not a pure function
- Either way works for Scala provided
- Reduced expressions consists of pure functions
- Both evaluations terminates
- Call by value
- Represented like a val
- Every function argument is evaluated only once
- Call by name
- Represented like a def
- The argument is not evaluated if the parameter is not used
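The two evaluation strategies can be sketched like this (the names are made up; `=>` in a parameter type marks call by name):

```scala
def sideEffect(): Int = { println("evaluated!"); 42 }

// Call by value (the default): the argument is evaluated once, before the call
def byValue(x: Int): Int = x + x

// Call by name: `=> Int` — evaluated only if (and each time) the parameter is used
def byName(cond: Boolean, x: => Int): Int =
  if (cond) x else 0

byName(false, sideEffect())   // prints nothing: the argument is never evaluated
byName(true, sideEffect())    // prints "evaluated!"
```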
- Since every value is an object
- They are objects with apply method
- As we pass x and y to the sum function; we can pass around the sum function too
- Show the anonymous functions
- Since it’s a value, we can assign it to a variable
- Question:
- What is the difference between methods and functions?
- Technically, they are treated differently in Scala; I haven’t told you how yet
- Guess the representation for a function and a method
- Reason: a function is an object, and calling it is like calling the apply method on that object
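A sketch of the method-vs-function distinction (the names are illustrative; Scala 2 eta-expansion syntax):

```scala
// A method: introduced with `def`, lives on its enclosing object/class
def addMethod(x: Int, y: Int): Int = x + y

// A function: a value — an object with an apply method
val addFunction: (Int, Int) => Int = (x, y) => x + y

addFunction(1, 2)                      // sugar for addFunction.apply(1, 2)

// Eta-expansion lifts a method into a function value
val lifted: (Int, Int) => Int = addMethod _
```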
- Before the image
- HOF are functions that take or return an another function
- Lets look at some example where we will use them
- Show the image 1
- Talk about the example
- Show the image 2
- Talk about the example
- Question
- What’s common
- functionality of addition is common
- What’s different
- What needs to be added
- Design principle: “Separate what changes from what stays the same”
- So, let’s try to do that
- Explain the example
Question:
- Can we do better?
- Explain the example
- Means for abstraction and creating new control structures
- What I have learnt is that it’s far easier to take in a function than to return a function
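The “separate what changes from what stays the same” idea can be sketched with a higher-order sum (`sumBy` is a made-up name):

```scala
// The summation loop stays the same; what varies — which function to apply
// to each element — is passed in as a parameter
def sumBy(f: Int => Int, a: Int, b: Int): Int =
  if (a > b) 0 else f(a) + sumBy(f, a + 1, b)

val sumInts    = sumBy(x => x,     1, 5)   // 1+2+3+4+5 = 15
val sumSquares = sumBy(x => x * x, 1, 3)   // 1+4+9     = 14
```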
- Next concept which we are going to see is from the family of Recursion
- Before the images
- Ask how many like recursion?
- Personally, I never liked it; it was difficult for me
- Show the image
- Talk about the example; expansion
- Show the next image
- Talk about the example; expansion
Questions: What’s the difference?
- We add 1 more value to our expression
- The last statement of the function is a call to itself
- Then the same stack frame can be reused; such calls are known as tail calls
- Executes in constant space
- The stack frame size is not increased by adding another function jump
So, with the GCD example, the compiler is able to figure that out
- Hence, it is tail recursive
- Whereas factorial is a plain recursive call
- So, what’s the harm? It should be fine.
- Explain the 1st image
- Show the problem in the second image
- The problem here is the JVM’s limited stack size
- Go to next slide
So, now what?
- Do I always have to blow up my programs?
- No; so now we write tail-recursive calls
- Go to next slide
- Explain the example
- This is all good, but I agree it makes your code a bit more complex. So, if you know for sure you won’t hit the JVM’s max stack size, it’s fine to use normal recursion
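The factorial discussion above, sketched with the `@tailrec` annotation, which makes the compiler reject a function that is not actually tail recursive:

```scala
import scala.annotation.tailrec

// Plain recursion: `n * ...` happens after the recursive call returns,
// so each call needs its own stack frame — deep inputs overflow the JVM stack
def factorial(n: BigInt): BigInt =
  if (n <= 1) 1 else n * factorial(n - 1)

// Tail-recursive: the recursive call is the very last action, so the
// compiler reuses one stack frame (constant space)
@tailrec
def factAcc(n: BigInt, acc: BigInt = 1): BigInt =
  if (n <= 1) acc else factAcc(n - 1, n * acc)
```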
- Read the points from slide
- Similar to Java interfaces
- Offer more than interfaces
- Only parameterless constructors
- Can’t have auxiliary constructors
- We’ll see an example and usage of abstract types
- Use Scala traits like Java abstract classes
- Before the right image
- Talk about the diamond problem
- Talk about various approaches in other languages
- using virtual in C++ while inheriting
- An application for a college with various roles
- Helps in having multiple inheritance
- We'll see how it's done in Scala
- Scala has created strict rules for multiple inheritance
- The class hierarchy forms an acyclic graph of inheritance
- Because of such strict rules there is no confusion about which function should be called
- The graph here is just for illustration
- Linearization has a lot to do with the order in which the traits are mixed in
- Prev example: Derived was extending A and C
- Explain the acyclic graph formation
- Explain how it will be reduced, and then just show the image on the next slide
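A minimal linearization sketch (traits A/B/C and class D are illustrative, echoing the diamond on the slide):

```scala
trait A           { def msg: String = "A" }
trait B extends A { override def msg: String = "B->" + super.msg }
trait C extends A { override def msg: String = "C->" + super.msg }

// Linearization of D is: D, C, B, A — so `super` calls walk right-to-left
// through the mixin order, with no diamond ambiguity
class D extends B with C

new D().msg   // "C->B->A"
```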
- Ask if any questions?
- So, I told that we would be talking about abstract types. Right?
- Lets look at this problem here
- Explain the problem
- Talk about the solution by Anil
- Explain the solution here
- There is more to abstract types/generics with variance and covariance
- I think you get the gist of what Scala can offer in generics
- Covariance +A -> pass any subtype of A
- For some class List[+A]: if A is a subtype of B, then List[A] is a subtype of List[B]
- Contravariance -A -> we can pass any parent type
- Writer[-A]: if A is a subtype of B, then Writer[B] is a subtype of Writer[A]
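These variance rules can be sketched like this (Animal, Dog, and Writer are made-up names; `List[+A]` is the standard library’s covariant list):

```scala
class Animal
class Dog extends Animal

// Covariance: List[+A] means List[Dog] is a subtype of List[Animal]
val dogs: List[Dog] = List(new Dog)
val animals: List[Animal] = dogs          // compiles thanks to +A

// Contravariance: Writer[-A] means Writer[Animal] is a subtype of Writer[Dog]
trait Writer[-A] { def write(a: A): String }
val animalWriter: Writer[Animal] = new Writer[Animal] {
  def write(a: Animal): String = "an animal"
}
val dogWriter: Writer[Dog] = animalWriter // compiles thanks to -A
```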
After 1st point
Question:
- How many of you like switch statements?
- if-else/switch
- Wherever possible I use cases; they look somewhat better and get rid of the extra equality checks
- Clutter when the variable name is large
- Traditional switch works on constant values only
- 2nd point
- Show the examples
- match is a keyword and is used as an expression
- It must yield a result
- If no case matches, an exception (MatchError) is thrown
- Go through the slide to show the examples (2 slides)
- Explain what’s happening
- Explain what’s happening
- Talk about case classes:
- Almost like classes with some extra properties
- No new keyword required
- All case classes have so-called companion objects, created automatically
- They have apply and unapply methods
- If using an abstract class like in the example
- How do we ensure all cases are covered? Someone can extend the class anywhere else
- Sealed classes:
- Must be extended in that file only
- The compiler checks that all possible cases are covered
- Explain the example
- Talk about pattern guard
- We will get back to this example later too
- Truth table for tuple classes
- Imagine writing if-else for this truth table
- Code is more concise, clean and readable
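The sealed-plus-guard discussion can be sketched as follows (Expr, Num, and Add are illustrative):

```scala
sealed trait Expr                                // sealed: all subclasses live in this
case class Num(n: Int)           extends Expr    // file, so match exhaustiveness is
case class Add(l: Expr, r: Expr) extends Expr    // checked by the compiler

def eval(e: Expr): Int = e match {
  case Num(n) if n < 0 => 0                      // pattern guard: extra condition
  case Num(n)          => n
  case Add(l, r)       => eval(l) + eval(r)
}
```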
- Question before content
- Any idea what they mean?
- Pure functions work on their arguments and yield a result
- What if there is a result only for a range of inputs?
- We could use if-else, but then we have to return something for inputs outside that range, right?
- show the next content
- Talk about the example
- Talk about the collect function
- Comes in handy for data analysis programs
- Various types of data; for the program we are concerned with a very specific type
- We can’t go on adding if-else, right?
- Guess the type
- Explain the example
- Show the type
Question
- What will happen when I uncomment the map function?
- Error!
- Why Error, explain
- Let’s see how we can do it in a better way!
- With collect, which takes a PartialFunction
- I have defined my function only for Ints; hence the result would be a list of Ints
- Now the multiply operation is available on Int; hence the result
- Just for illustration only
- Defining a PF
- The last type argument is the return type
- Remember the example on lists, where we had to handle both the Nil and non-empty cases
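The collect idea above, sketched end to end (the values are illustrative):

```scala
val mixed: List[Any] = List(1, "two", 3, 4.0)

// A PartialFunction defined only for Ints — isDefinedAt is derived from the cases
val doubleInts: PartialFunction[Any, Int] = {
  case n: Int => n * 2
}

// map(doubleInts) would throw a MatchError on "two";
// collect first checks isDefinedAt, then applies the function
val doubled: List[Int] = mixed.collect(doubleInts)   // List(2, 6)
```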
- Scala is compatible with Java and it can make use of the vast extensive Java libraries
- Being OO, all the advantages are available for us but with a pinch of salt
- Usage is different from that of traditional OO
Being functional, we have an opportunity to create clear, transparent and complex concurrent applications
- Our code becomes:
- short
- easier to understand
- less boilerplate code, type inference
- extensive abstraction over data structures
- Compose and make your own abstractions and control structures
- Being statically typed
- Sure about refactorings
Scala provides opportunities to build your own DSL
- Design your own constructs
Actors
For comprehension
Generics
Extractors
Annotations
Self types (cake pattern) as an alternative to DI (Dependency Injection) frameworks
Currying
Type system
Type specialization
XML Literals
DSLs
Scripting with Scala
Etc.
- Spark:
- It's an open-source cluster computing framework
- APIs to program with implicit data parallelism and fault tolerance
- They have this concept of RDDs (Resilient Distributed Datasets), which are themselves immutable
- All you can do with a dataset is transform it or apply some actions to it
- Hence, they are inherently parallel and fault tolerant
- Generally used for data analysis needs
- Akka:
- If your business need is to have many concurrent tasks that synchronize with each other, then Akka is your one-stop destination
- A set of open-source libraries to design scalable, resilient systems that span cores and networks
- Gone are the days of low-level mutexes, atomics, or locks
- All the overhead of handling concurrency is taken care of by it
- Their core concept is the Actor model; adhering to some simple patterns, we are easily able to write transparent, concurrent code
- There are many packages
- FSM (I'm using this in my project)
- Akka Routes
- Akka Cluster
- Akka Persistence
- Akka Streams
- All of them are based on Actor model
- So, go and read about the Actor model
- Play
- How many of you have built Java web applications?
- This is something for you to explore
- A framework for building web applications in Scala and Java, itself written in Scala
- It's also built on Akka
- It's totally stateless
- Integrated unit testing; support for JUnit and Selenium
- There are many big players who are using Scala
- So, if you look at the future of Scala, I think it's just warming up before the actual game!!