"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer at H2O.ai.
Jakub Hava enjoys dealing with problems and learning new programming languages. At H2O.ai, Kuba works on Sparkling Water project and its integration with the rest of the H2O.ai ecosystem.
The speech includes a live demo showing how to create a Sparkling Water pipeline with H2O.ai's XGBoost model — no terminal needed, all we need is Jupyter!
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
Imagine how you are setting out to implement that awesome idea for a new application. In the back-end you enjoy the horizontal and vertical scalability offered by the Actor model, and its great support for building resilient systems through distribution and supervision hierarchies. In the front-end you love the declarative way of writing rich and interactive web apps that AngularJS gives you. In this presentation we bring these two together, demonstrating how little effort is needed to obtain a responsive user experience with fully consistent and persistent data storage on the server side.
See also http://summercamp.trivento.nl/
Do you know, being a Java dev, how to manage development environments with less effort? How to achieve continuous delivery using immutable server concept? How to manage set up a cloud within your workstation and many more? It might be the case you know, I bet it's much more easier to do with Docker.
ISWC 2017 In-Use paper. Despite the advantages of Linked Data as a data integration paradigm, accessing and consuming Linked Data is still a cumbersome task. Linked Data applications need to use technologies such as RDF and SPARQL that, despite their expressive power, belong to the data integration stack. As a result, applications and data cannot be cleanly separated: SPARQL queries, endpoint addresses, namespaces, and URIs end up as part of the application code. Many publishers address these problems by building RESTful APIs around their Linked Data. However, this solution has two pitfalls: these APIs are costly to maintain; and they blackbox functionality by hiding the queries they use. In this paper we describe grlc, a gateway between Linked Data applications and the LOD cloud that offers a RESTful, reusable and uniform means to routinely access any Linked Data. It generates an OpenAPI compatible API by using parametrized queries shared on the Web. The resulting APIs require no coding, rely on low-cost external query storage and versioning services, contain abundant provenance information, and integrate access to different publishing paradigms into a single API. We evaluate grlc qualitatively, by describing its reported value by current users; and quantitatively, by measuring the added overhead at generating API specifications and answering to calls.
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
Imagine how you are setting out to implement that awesome idea for a new application. In the back-end you enjoy the horizontal and vertical scalability offered by the Actor model, and its great support for building resilient systems through distribution and supervision hierarchies. In the front-end you love the declarative way of writing rich and interactive web apps that AngularJS gives you. In this presentation we bring these two together, demonstrating how little effort is needed to obtain a responsive user experience with fully consistent and persistent data storage on the server side.
See also http://summercamp.trivento.nl/
Do you know, being a Java dev, how to manage development environments with less effort? How to achieve continuous delivery using immutable server concept? How to manage set up a cloud within your workstation and many more? It might be the case you know, I bet it's much more easier to do with Docker.
ISWC 2017 In-Use paper. Despite the advantages of Linked Data as a data integration paradigm, accessing and consuming Linked Data is still a cumbersome task. Linked Data applications need to use technologies such as RDF and SPARQL that, despite their expressive power, belong to the data integration stack. As a result, applications and data cannot be cleanly separated: SPARQL queries, endpoint addresses, namespaces, and URIs end up as part of the application code. Many publishers address these problems by building RESTful APIs around their Linked Data. However, this solution has two pitfalls: these APIs are costly to maintain; and they blackbox functionality by hiding the queries they use. In this paper we describe grlc, a gateway between Linked Data applications and the LOD cloud that offers a RESTful, reusable and uniform means to routinely access any Linked Data. It generates an OpenAPI compatible API by using parametrized queries shared on the Web. The resulting APIs require no coding, rely on low-cost external query storage and versioning services, contain abundant provenance information, and integrate access to different publishing paradigms into a single API. We evaluate grlc qualitatively, by describing its reported value by current users; and quantitatively, by measuring the added overhead at generating API specifications and answering to calls.
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
This was a talk that Kelvin Chu and I just gave at the SF Bay Area Spark Meetup 5/14 at Palantir Technologies.
We discussed the Spark Job Server (http://github.com/ooyala/spark-jobserver), its history, example workflows, architecture, and exciting future plans to provide HA spark job contexts.
We also discussed the use case of the job server at Ooyala to facilitate fast query jobs using shared RDD and a shared job context, and how we integrate with Apache Cassandra.
12 Factor App: Best Practices for JVM DeploymentJoe Kutner
Twelve Factor apps are built for agility and rapid deployment. They enable continuous delivery and reduce the time and cost for new developers to join a project. At the same time, they are architected to exploit the principles of modern cloud platforms while permitting maximum portability between them. Finally, they can scale up without significant changes to tooling, architecture or development practices. In this talk, you’ll learn the principles and best practices espoused by the Twelve Factor app. We’ll discuss how to structure your code, manage dependencies, store configuration, run admin tasks, capture log files, and more. You’ll learn how modern Java deployments can benefit
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...Lightbend
Microservices. Streaming data. Event Sourcing and CQRS. Concurrency, routing, self-healing, persistence, clustering… You get the picture. The Akka toolkit makes all of this simple for Java and Scala developers at Amazon, LinkedIn, Starbucks, Verizon and others. So how does Akka provide all these features out of the box?
Join Hugh McKee, Akka expert and Developer Advocate at Lightbend, on an illustrated journey that goes deep into how Akka works–from individual Akka actors to fully distributed clusters across multiple datacenters.
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudLightbend
As the number of systems within an IT infrastructure increases, the number of integrations needed by enterprises also multiplies. Recognizing that the old times of overnight file exchanges are no longer meeting real-time demands, a well-organized enterprise integration strategy is a critical success factor when your systems need to be connected all day.
In this webinar with Enno Runne, Tech Lead for Alpakka at Lightbend, Inc., we’ll look at why integrations should be viewed as streams of data, and how Alpakka—a Reactive Enterprise Integration library for Java and Scala based on Reactive Streams and Akka—fits perfectly for today’s demands on system integrations. Specifically, we will review:
* How Alpakka brings streaming data flows directly to the surface, utilizing the features of Akka to tame the complexity of streams.
* Supported connectors for Amazon Web Services, Microsoft Azure, and Google Cloud, as well as others for event sourcing/persistence/DB technologies and traditional interfaces like FTP, HTTP, etc.
* A deeper look into the use cases for Alpakka’s most utilized interfaces to popular technologies like Apache Kafka, MQTT, and MongoDB.
Revitalizing Enterprise Integration with Reactive StreamsLightbend
With Viktor Klang, Deputy CTO Lightbend, Inc.
As software grows more and more interconnected, and with several generations of software having to interoperate, a new take on the integration of systems is needed—ad hoc, unversioned, and unreplicated scripts just won’t suffice, and the traditional Enterprise Service Bus (ESB) concept has experienced stability, reliability, performance, and scalability problems.
In this webinar, Viktor explores a new take on Enterprise Integration Patterns:
First, he will explore the Reactive Streams standard, an orchestration layer where transformations are standalone, composable, reusable, and—most importantly—using asynchronous flow-control—back pressure—to maintain predictable, stable, behavior over time.
Furthermore, he will go through how one-off workloads relate to continuous, and batch, workloads, and how they can be addressed by that very same orchestration layer.
Finally, he will review how this type of design achieves resilience, scalability, and ultimately—responsiveness.
Scala eXchange: Building robust data pipelines in ScalaAlexander Dean
Over the past couple of years, Scala has become a go-to language for building data processing applications, as evidenced by the emerging ecosystem of frameworks and tools including LinkedIn's Kafka, Twitter's Scalding and our own Snowplow project (https://github.com/snowplow/snowplow).
In this talk, Alex will draw on his experiences at Snowplow to explore how to build rock-sold data pipelines in Scala, highlighting a range of techniques including:
* Translating the Unix stdin/out/err pattern to stream processing
* "Railway oriented" programming using the Scalaz Validation
* Validating data structures with JSON Schema
* Visualizing event stream processing errors in ElasticSearch
Alex's talk draws on his experiences working with event streams in Scala over the last two and a half years at Snowplow, and by Alex's recent work penning Unified Log Processing, a Manning book.
Reactive Streams 1.0.0 is now live, and so are our implementations in Akka Streams 1.0 and Slick 3.0.
Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure—with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java.
Akka (recent winner of “Most Innovative Open Source Tech in 2015”) is a toolkit for building message-driven applications. With Akka Streams 1.0, Akka has incorporated a graphical DSL for composing data streams, an execution model that decouples the stream’s staged computation—it’s “blueprint”—from its execution (allowing for actor-based, single-threaded and fully distributed and clustered execution), type safe stream composition, an implementation of the Reactive Streaming specification that enables back-pressure, and more than 20 predefined stream “processing stages” that provide common streaming transformations that developers can tap into (for splitting streams, transforming streams, merging streams, and more).
Slick is a relational database query and access library for Scala that enables loose-coupling, minimal configuration requirements and abstraction of the complexities of connecting with relational databases. With Slick 3.0, Slick now supports the Reactive Streams API for providing asynchronous stream processing with non-blocking back-pressure. Slick 3.0 also allows elegant mapping across multiple data types, static verification and type inference for embedded SQL statements, compile-time error discovery, and JDBC support for interoperability with all existing drivers.
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
One of the most frequent questions that we get asked at Lightbend is “what’s the difference between Akka Streams and Kafka Streams?” After all, there is only a 1 letter difference between these two technologies, so how different could they be?
Well, as we see in this presentation, they are actually quite different. Both tools are part of the streaming Fast Data stack, but were created with entirely different technological approaches in mind. For example, While Akka Streams emerged as a dataflow-centric abstraction for the Akka Actor model, designed for general-purpose microservices, very low-latency event processing, and supports a wider class of application problems and third-party integrations via Alpakka, Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics in a Kafka-centric way.
In this webinar by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend, we will:
* Discuss the strengths and weaknesses of Kafka Streams and Akka Streams for particular design needs in data-centric microservices
* Contrast them with Spark Streaming and Flink, which provide richer analytics over potentially huge data sets
* Help you map these streaming engines to your specific use cases, so you confidently pick the right ones for your jobs
This talk will help you understand better how AWS help you to create a serverless solution using SQS, Lambda function, API gateway. This talk shows how to use it both in the developer field and DevOps
Projects Valhalla and Loom at IT Tage 2021Vadym Kazulkin
In this presentation, we will explain the motivation, added values, challenges and current status of the Valhalla and Loom projects.
In the Valhalla project, Inline Type is introduced in Java. Inline Type is an immutable type that differs only by the state of its properties. The purpose is to reduce memory consumption and access times for such data types. Also as a part of this project Java type system will be unified so that Java will become a pure object-oriented programming language.
In the Loom project, lightweight threads are implemented in Java. The purpose is to no longer trade off between simplicity and scalability of the source code and to reconcile both.
Spark 2.0 is a major release of Apache Spark. This release has brought many changes to API(s) and libraries of Spark. So in this KnolX, we will be looking at some improvements that are made in Spark 2.0. Also, in these slides we will be getting an introduction to some new features in Spark 2,0 like SparkSession API and Structured Streaming.
Application development has come a long way. From client-server, to desktop, to web based applications served by monolithic application servers, the need to serve billions of users and hundreds of devices have become crucial to today's business. Typesafe Reactive Platform helps you to modernize your applications by transforming the most critical parts into microservice-style architectures which support extremely high workloads and allow you to serve millions of end-users.
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobLightbend
For many businesses, the batch-oriented architecture of Big Data–where data is captured in large, scalable stores, then processed later–is simply too slow: a new breed of “Fast Data” architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses with a competitive advantage.
There are many stream processing tools, so which ones should you choose? It helps to consider several factors in the context of your applications:
* Low latency: How low (or high) is needed?
* High volume: How much volume must be handled?
* Integration with other tools: Which ones and how?
* Data processing: What kinds? In bulk? As individual events?
In this talk by Dean Wampler, PhD., VP of Fast Data Engineering at Lightbend, we’ll look at the criteria you need to consider when selecting technologies, plus specific examples of how four streaming tools–Akka Streams, Kafka Streams, Apache Flink and Apache Spark serve particular needs and use cases when working with continuous streams of data.
Event sourcing Live 2021: Streaming App Changes to Event StoreShivji Kumar Jha
This deck was used for the this talk at EventSourcing Live 2021 : https://lnkd.in/gbpshVA5
In the slides, we will go over identifying, capturing and delivering app changes to event stores. The event store can then be used as a data warehouse, data lake or a lakehouse. We will go over different ways to capture change data and deliver to an event store and the pros /cons of each.
Developing Secure Scala Applications With Fortify For ScalaLightbend
From banks to airlines to credit rating agencies, security continues to be a major focus for organizations across various industries. As the newspapers show, it’s heavily damaging to enterprises when security vulnerabilities in their code, infrastructure, or open source frameworks/libraries get exploited.
The good news is that your Scala development team now has a powerful ally for securing their applications. Co-developed by the Fortify team along with Lightbend, the upcoming Fortify for Scala Plugin is the only Static Application Security Testing (SAST) solution to use the official Scala compiler. This plugin automatically identifies code-level security vulnerabilities early in the SDLC, so you can confidently and reliably secure your mission-critical Scala-based applications.
In this webinar by Seth Tisue, Scala Committer and Senior Scala Engineer at Lightbend, and Poonam Yadav, Product Manager for Fortify at Micro Focus, you will learn about:
* Some of the more than 200 vulnerabilities that the Fortify plugin for Scala can catch and help you resolve,
* How the plugin works to analyze, identify and provide actionable recommendations,
* How to integrate it into your modern DevOps environment,
* Why this plugin was co-developed by Lightbend and the Fortify team, and how it benefits your organization’s security professionals / CISO office.
Making Scala Faster: 3 Expert Tips For Busy Development TeamsLightbend
In this special guest webinar with Mirco Dotta, co-founder of Triplequote LLC (the creators of Hydra), we take a deeper look into what affects Scala compilation speed, why a combination of language features, external libraries, and type annotations make compilation times generally unpredictable, and what you can do to speed it up by orders of magnitude. We’ll go through:
* Understanding some of the most common bottlenecks in Scala builds.
* Effective use of type class auto-derivation for cutting compilation times.
* What are some average compilation speeds, and how to know if you have a productivity blocker.
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Evan Chan
This was a talk that Kelvin Chu and I just gave at the SF Bay Area Spark Meetup 5/14 at Palantir Technologies.
We discussed the Spark Job Server (http://github.com/ooyala/spark-jobserver), its history, example workflows, architecture, and exciting future plans to provide HA spark job contexts.
We also discussed the use case of the job server at Ooyala to facilitate fast query jobs using shared RDD and a shared job context, and how we integrate with Apache Cassandra.
12 Factor App: Best Practices for JVM DeploymentJoe Kutner
Twelve Factor apps are built for agility and rapid deployment. They enable continuous delivery and reduce the time and cost for new developers to join a project. At the same time, they are architected to exploit the principles of modern cloud platforms while permitting maximum portability between them. Finally, they can scale up without significant changes to tooling, architecture or development practices. In this talk, you’ll learn the principles and best practices espoused by the Twelve Factor app. We’ll discuss how to structure your code, manage dependencies, store configuration, run admin tasks, capture log files, and more. You’ll learn how modern Java deployments can benefit
Akka A to Z: A Guide To The Industry’s Best Toolkit for Fast Data and Microse...Lightbend
Microservices. Streaming data. Event Sourcing and CQRS. Concurrency, routing, self-healing, persistence, clustering… You get the picture. The Akka toolkit makes all of this simple for Java and Scala developers at Amazon, LinkedIn, Starbucks, Verizon and others. So how does Akka provide all these features out of the box?
Join Hugh McKee, Akka expert and Developer Advocate at Lightbend, on an illustrated journey that goes deep into how Akka works–from individual Akka actors to fully distributed clusters across multiple datacenters.
Pakk Your Alpakka: Reactive Streams Integrations For AWS, Azure, & Google CloudLightbend
As the number of systems within an IT infrastructure increases, the number of integrations needed by enterprises also multiplies. Recognizing that the old times of overnight file exchanges are no longer meeting real-time demands, a well-organized enterprise integration strategy is a critical success factor when your systems need to be connected all day.
In this webinar with Enno Runne, Tech Lead for Alpakka at Lightbend, Inc., we’ll look at why integrations should be viewed as streams of data, and how Alpakka—a Reactive Enterprise Integration library for Java and Scala based on Reactive Streams and Akka—fits perfectly for today’s demands on system integrations. Specifically, we will review:
* How Alpakka brings streaming data flows directly to the surface, utilizing the features of Akka to tame the complexity of streams.
* Supported connectors for Amazon Web Services, Microsoft Azure, and Google Cloud, as well as others for event sourcing/persistence/DB technologies and traditional interfaces like FTP, HTTP, etc.
* A deeper look into the use cases for Alpakka’s most utilized interfaces to popular technologies like Apache Kafka, MQTT, and MongoDB.
Revitalizing Enterprise Integration with Reactive StreamsLightbend
With Viktor Klang, Deputy CTO Lightbend, Inc.
As software grows more and more interconnected, and with several generations of software having to interoperate, a new take on the integration of systems is needed—ad hoc, unversioned, and unreplicated scripts just won’t suffice, and the traditional Enterprise Service Bus (ESB) concept has experienced stability, reliability, performance, and scalability problems.
In this webinar, Viktor explores a new take on Enterprise Integration Patterns:
First, he will explore the Reactive Streams standard, an orchestration layer where transformations are standalone, composable, reusable, and—most importantly—using asynchronous flow-control—back pressure—to maintain predictable, stable, behavior over time.
Furthermore, he will go through how one-off workloads relate to continuous, and batch, workloads, and how they can be addressed by that very same orchestration layer.
Finally, he will review how this type of design achieves resilience, scalability, and ultimately—responsiveness.
Scala eXchange: Building robust data pipelines in ScalaAlexander Dean
Over the past couple of years, Scala has become a go-to language for building data processing applications, as evidenced by the emerging ecosystem of frameworks and tools including LinkedIn's Kafka, Twitter's Scalding and our own Snowplow project (https://github.com/snowplow/snowplow).
In this talk, Alex will draw on his experiences at Snowplow to explore how to build rock-sold data pipelines in Scala, highlighting a range of techniques including:
* Translating the Unix stdin/out/err pattern to stream processing
* "Railway oriented" programming using the Scalaz Validation
* Validating data structures with JSON Schema
* Visualizing event stream processing errors in ElasticSearch
Alex's talk draws on his experiences working with event streams in Scala over the last two and a half years at Snowplow, and by Alex's recent work penning Unified Log Processing, a Manning book.
Reactive Streams 1.0.0 is now live, and so are our implementations in Akka Streams 1.0 and Slick 3.0.
Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure—with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java.
Akka (recent winner of “Most Innovative Open Source Tech in 2015”) is a toolkit for building message-driven applications. With Akka Streams 1.0, Akka has incorporated a graphical DSL for composing data streams, an execution model that decouples the stream’s staged computation—it’s “blueprint”—from its execution (allowing for actor-based, single-threaded and fully distributed and clustered execution), type safe stream composition, an implementation of the Reactive Streaming specification that enables back-pressure, and more than 20 predefined stream “processing stages” that provide common streaming transformations that developers can tap into (for splitting streams, transforming streams, merging streams, and more).
Slick is a relational database query and access library for Scala that enables loose-coupling, minimal configuration requirements and abstraction of the complexities of connecting with relational databases. With Slick 3.0, Slick now supports the Reactive Streams API for providing asynchronous stream processing with non-blocking back-pressure. Slick 3.0 also allows elegant mapping across multiple data types, static verification and type inference for embedded SQL statements, compile-time error discovery, and JDBC support for interoperability with all existing drivers.
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
One of the most frequent questions that we get asked at Lightbend is “what’s the difference between Akka Streams and Kafka Streams?” After all, there is only a 1 letter difference between these two technologies, so how different could they be?
Well, as we see in this presentation, they are actually quite different. Both tools are part of the streaming Fast Data stack, but were created with entirely different technological approaches in mind. For example, While Akka Streams emerged as a dataflow-centric abstraction for the Akka Actor model, designed for general-purpose microservices, very low-latency event processing, and supports a wider class of application problems and third-party integrations via Alpakka, Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics in a Kafka-centric way.
In this webinar by Dr. Dean Wampler, VP of Fast Data Engineering at Lightbend, we will:
* Discuss the strengths and weaknesses of Kafka Streams and Akka Streams for particular design needs in data-centric microservices
* Contrast them with Spark Streaming and Flink, which provide richer analytics over potentially huge data sets
* Help you map these streaming engines to your specific use cases, so you confidently pick the right ones for your jobs
This talk will help you understand better how AWS help you to create a serverless solution using SQS, Lambda function, API gateway. This talk shows how to use it both in the developer field and DevOps
Projects Valhalla and Loom at IT Tage 2021Vadym Kazulkin
In this presentation, we will explain the motivation, added values, challenges and current status of the Valhalla and Loom projects.
In the Valhalla project, Inline Type is introduced in Java. Inline Type is an immutable type that differs only by the state of its properties. The purpose is to reduce memory consumption and access times for such data types. Also as a part of this project Java type system will be unified so that Java will become a pure object-oriented programming language.
In the Loom project, lightweight threads are implemented in Java. The purpose is to no longer trade off between simplicity and scalability of the source code and to reconcile both.
Spark 2.0 is a major release of Apache Spark. This release has brought many changes to API(s) and libraries of Spark. So in this KnolX, we will be looking at some improvements that are made in Spark 2.0. Also, in these slides we will be getting an introduction to some new features in Spark 2,0 like SparkSession API and Structured Streaming.
Application development has come a long way. From client-server, to desktop, to web based applications served by monolithic application servers, the need to serve billions of users and hundreds of devices have become crucial to today's business. Typesafe Reactive Platform helps you to modernize your applications by transforming the most critical parts into microservice-style architectures which support extremely high workloads and allow you to serve millions of end-users.
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobLightbend
For many businesses, the batch-oriented architecture of Big Data–where data is captured in large, scalable stores, then processed later–is simply too slow: a new breed of “Fast Data” architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses with a competitive advantage.
There are many stream processing tools, so which ones should you choose? It helps to consider several factors in the context of your applications:
* Low latency: How low (or high) is needed?
* High volume: How much volume must be handled?
* Integration with other tools: Which ones and how?
* Data processing: What kinds? In bulk? As individual events?
In this talk by Dean Wampler, PhD., VP of Fast Data Engineering at Lightbend, we’ll look at the criteria you need to consider when selecting technologies, plus specific examples of how four streaming tools–Akka Streams, Kafka Streams, Apache Flink and Apache Spark serve particular needs and use cases when working with continuous streams of data.
Event sourcing Live 2021: Streaming App Changes to Event StoreShivji Kumar Jha
This deck was used for the this talk at EventSourcing Live 2021 : https://lnkd.in/gbpshVA5
In the slides, we will go over identifying, capturing and delivering app changes to event stores. The event store can then be used as a data warehouse, data lake or a lakehouse. We will go over different ways to capture change data and deliver to an event store and the pros /cons of each.
Developing Secure Scala Applications With Fortify For ScalaLightbend
From banks to airlines to credit rating agencies, security continues to be a major focus for organizations across various industries. As the newspapers show, it’s heavily damaging to enterprises when security vulnerabilities in their code, infrastructure, or open source frameworks/libraries get exploited.
The good news is that your Scala development team now has a powerful ally for securing their applications. Co-developed by the Fortify team along with Lightbend, the upcoming Fortify for Scala Plugin is the only Static Application Security Testing (SAST) solution to use the official Scala compiler. This plugin automatically identifies code-level security vulnerabilities early in the SDLC, so you can confidently and reliably secure your mission-critical Scala-based applications.
In this webinar by Seth Tisue, Scala Committer and Senior Scala Engineer at Lightbend, and Poonam Yadav, Product Manager for Fortify at Micro Focus, you will learn about:
* Some of the more than 200 vulnerabilities that the Fortify plugin for Scala can catch and help you resolve,
* How the plugin works to analyze, identify and provide actionable recommendations,
* How to integrate it into your modern DevOps environment,
* Why this plugin was co-developed by Lightbend and the Fortify team, and how it benefits your organization’s security professionals / CISO office.
Making Scala Faster: 3 Expert Tips For Busy Development TeamsLightbend
In this special guest webinar with Mirco Dotta, co-founder of Triplequote LLC (the creators of Hydra), we take a deeper look into what affects Scala compilation speed, why a combination of language features, external libraries, and type annotations make compilation times generally unpredictable, and what you can do to speed it up by orders of magnitude. We’ll go through:
* Understanding some of the most common bottlenecks in Scala builds.
* Effective use of type class auto-derivation for cutting compilation times.
* What are some average compilation speeds, and how to know if you have a productivity blocker.
Whirlpools in the Stream with Jayesh LalwaniDatabricks
At Capital One, we use Spark to detect Fraud. Recently we have started implementing real-time fraud detection using machine learnt models. One of Capital One’s fraud detection micro services was an early adopter of Structured Streaming. As part of this implementation, the micro service ran into several roadblocks. In this talk, we describe those roadblocks and how we got around them.
Caching of lookup data
Dilemma: Store state in Spark vs store state in database
Retrieve state from database efficiently
Non-homogenous data sources
Aggregations in the stream
Checkpointing fumbles
Checkpointing performance and instabilities
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
Intro to Machine Learning with H2O and AWSSri Ambati
Navdeep Gill @ Galvanize Seattle- May 2016
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Introduce to Apache Beam
Dive in to Beam's architecture and live demo running data pipeline on different runners such as Google Dataflow, Flink and Spark
Our new product (Clicktale Experience cloud) requires processing up to half a million messages per second, sessionizing each "users" journey throughout a web page. In this talk we'll discuss how we have achieved that using Spark's stateful streaming capabilities with only few servers in production, the challenges we've faced and how we've solved them. We'll also take a look at Spark 2.2 (the brand new version) and its new stateful aggregation and talk about how we've used it in order to improve performance significantly.
Data Science at Scale with Apache Spark and Zeppelin NotebookCarolyn Duby
How to get started with Data Science at scale using Apache Spark to clean, analyze, discover, and build models on large data sets. Use Zeppelin to record analysis to encourage peer review and reuse of analysis techniques.
From R Script to Production Using rsparkling with Navdeep GillDatabricks
The rsparkling R package is an extension package for sparklyr (an R interface for Apache Spark) that creates an R front-end for the Sparkling Water Spark package from H2O. This provides an interface to H2O’s high performance, distributed machine learning algorithms on Spark, using R. The main purpose of this package is to provide a connector between sparklyr and H2O’s machine learning algorithms.
In this session, Gill will introduce the basic architectures of rsparkling, H2O Sparkling Water and sparklyr, and go over how these frameworks work together to build a cohesive machine learning framework. In addition, you’ll learn about various implementations for using rsparkling in production. The session will conclude with a live demo of rsparkling that will display an end-to-end use case of data ingestion, munging and machine learning.
Two popular tools for doing Machine Learning on top of JVM ecosystem is H2O and SparkML. This presentation compares these two tools as Machine Learning libraries (Didn't consider Spark's Data Munjing perspective). This work was done during June of 2018.
H2O Rains with Databricks Cloud - NY 02.16.16Sri Ambati
Michal Malohlava's presentation on H2O Rains with Databricks Cloud, New York, NY 02.16.16
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
H2O Rains with Databricks Cloud - Parisoma SFSri Ambati
Michal Malohlava's meetup on H2O Rains with Databricks Cloud at Parisoma SF, 02.02.16
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Spark, the ultra-fast, general purpose big data computing platform provides some very flexible options for processing and accessing data. In a previous meetup we covered PySpark and the Schema RDD. In this session we reviewed and expanded on this, with an in-depth exploration of Spark SQL.
- Overview of Spark in the Hadoop ecosystem
- Deep dive into Spark SQL with step by steps on how to implement and use it
If you have questions about the presentation or want to learn more about our services, please visit our website: http://casertaconcepts.com/
Similar to "Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at H2O.ai (20)
Looking to make your document processing operations more effective and cost-efficient with AI/ML? Learn from the experts of Provectus and Amazon Web Services (AWS) how to choose the right solution for your company! We will look into the management and engineering perspectives of AI document processing, from industry use cases and the solution map to our unique methodology for assessing available document processing solutions to Provectus IDP. Whether you are looking for a ready-made solution or you plan to build a custom solution of your own, this webinar will help you find the best option for your business.
Agenda
- Introductions
- Industry use cases
- Intelligent Document Processing (IDP) overview
- IDP Solutions map
- AWS IDP Solution
- Provectus IDP Platform
- Q&A
Intended Audience
Technology executives and decision makers, including such roles as CIO, CCO, COO, and CDO; digital transformation managers; data and ML engineers.
Presenters
Almir Davletov, IDP Subject Matter Expert, Provectus
Yaroslav Tarasyuk, Business Development, Provectus
Sonali Sahu, Sr. Solutions Architect, AWS
Interested? Learn more about Provectus Intelligent Document Processing Solution: https://provectus.com/document-processing-solution/
Intelligent Document Processing in Healthcare. Choosing the Right Solutions.Provectus
Healthcare organizations generate piles of documents and forms in different formats, making it difficult to achieve operational excellence and streamline business processes. Manual entry and OCR are no longer viable, and healthcare entities are looking for new solutions to handle documents.
In this presentation you can learn about:
- Healthcare document types and use cases
- IDP framework: building blocks for document processing solutions
- The document processing market landscape
- Methodology for solution evaluation: comparing apples to apples
Whether you are looking for a ready-made solution or plan to build a custom solution of your own, this webinar will help you find the best fit for your healthcare use cases.
Choosing the Right Document Processing Solution for Healthcare OrganizationsProvectus
Looking to automate document processing in your healthcare organization? Learn from Provectus & AWS experts how to make data capture, conversion, and analytics more efficient. Process and manage documents faster and on a larger scale with AI & Machine Learning.
In this presentation, we offer management and engineering perspectives on document processing with AI, to help you explore available options. Whether you are looking for a ready-made solution or plan to build a custom solution of your own, this webinar will help you find the best fit for your healthcare use cases.
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
Looking to build a robust machine learning infrastructure to streamline MLOps? Learn from Provectus experts how to ensure the success of your MLOps initiative by implementing Data QA components in your ML infrastructure.
For most organizations, the development of multiple machine learning models, their deployment and maintenance in production are relatively new tasks. Join Provectus as we explain how to build an end-to-end infrastructure for machine learning, with a focus on data quality and metadata management, to standardize and streamline machine learning life cycle management (MLOps).
Agenda
- Data Quality and why it matters
- Challenges and solutions of Data Testing
- Challenges and solutions of Model Testing
- MLOps pipelines and why they matter
- How to expand validation pipelines for Data Quality
AI Stack on AWS: Amazon SageMaker and BeyondProvectus
Looking to learn more about AWS AI stack? Join experts from Provectus & AWS to find out how to use Amazon SageMaker (with combination with other tools and services) to enable enterprise-wide AI.
Companies are looking to scale and become more productive when it comes to AI and data initiatives. They seek to launch AI projects more rapidly, which, among many other factors, requires a robust machine learning infrastructure. In this webinar, you will learn how to create a canonical SageMaker workflow, expand the SageMaker workflow to a holistic implementation, enhance and expand the implementation using best practices for feature store, data versioning, ML pipeline orchestration, and model monitoring.
Agenda
- Introductions
- Amazon SageMaker Overview
- Real-World Use Case
- Data Lake for Machine Learning
- Amazon SageMaker Experiments
- Orchestration Beyond SageMaker Experiments
- Amazon SageMaker Debugger
- Amazon SageMaker Model Monitor
- Webinar Takeaways
Intended audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Pritpal Sahota, Technical Account Manager, Provectus
- Christopher A. Burns, Sr. AI/ML Solution Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/ai-stack-on-aws-sagemaker-and-beyond-mar-2020/
Feature Store as a Data Foundation for Machine LearningProvectus
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams to take advantage of? Come and learn from experts of Provectus and Amazon Web Services (AWS) how to!
Feature Store is a key component of the ML stack and data infrastructure, which enables feature engineering and management. By having a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern and see how to achieve consistency between real-time and training features, to improve reproducibility with time-traveling for data.
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
Looking to implement MLOps using AWS services and Kubeflow? Come and learn about machine learning from the experts of Provectus and Amazon Web Services (AWS)!
Businesses recognize that machine learning projects are important but go beyond just building and deploying models, which is mostly done by organizations. Successful ML projects entail a complete lifecycle involving ML, DevOps, and data engineering and are built on top of ML infrastructure.
AWS and Amazon SageMaker provide a foundation for building infrastructure for machine learning while Kubeflow is a great open source project, which is not given enough credit in the AWS community. In this webinar, we show how to design and build an end-to-end ML infrastructure on AWS.
Agenda
- Introductions
- Case Study: GoCheck Kids
- Overview of AWS Infrastructure for Machine Learning
- Provectus ML Infrastructure on AWS
- Experimentation
- MLOps
- Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Qingwei Li, ML Specialist Solutions Architect, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-mlops-and-reproducible-ml-on-aws-with-kubeflow-and-sagemaker-aug-2020/
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRProvectus
Considering new ways and options for reducing operational costs and scaling flexibility of your Apache Hadoop/Spark? Try migrating to Amazon EMR!
On-premises Apache Hadoop/Spark clusters are among the top sources of financial pressure for businesses. IT organizations want to reduce spend while still meeting demand, to keep their legacy data applications up and running. Come and learn from experts at Provectus & AWS how you can use Amazon EMR to start driving cost efficiencies in your organization!
Agenda
- Hadoop market and cost optimizations using Amazon EMR
- Cost related and other challenges of on-prem Hadoop clusters
- Cost optimizations by using Amazon EMR and migration best practices
Intended audience
Technology executives & decision makers, manager-level tech roles, data engineers & data scientists, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Pritpal Sahota, Technical Account Manager, Provectus
- Nirav Shah, Senior Solutions Architect, AWS
- Perry Peterson, Business Development Manager, AWS
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/cost-optimization-for-apache-hadoop-spark-workloads-with-amazon-emr-june-2020/
ODSC webinar "Kubeflow, MLFlow and Beyond — augmenting ML delivery" Stepan Pu...Provectus
What's a machine learning workflow? What open source tools can you use to automate ML workflow?
Reproducible ML pipelines in research and production with monitoring insights from live inference clusters could enable and accelerate the delivery of AI solutions for enterprises. There is a growing ecosystem of tools that augment researchers and machine learning engineers in their day to day operations.
Still, there are big gaps in the machine learning workflow when it comes to training dataset versioning, training performance and metadata tracking, integration testing, inferencing quality monitoring, bias detection, concept drift detection and other aspects that prevent the adoption of AI in organizations of all sizes.
"Building a Modern Data platform in the Cloud", Alex Casalboni, AWS Dev Day K...Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: "Building a Modern Data platform in the Cloud"
Speaker: Alex Casalboni, AWS Technical Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/HIDnAG9AxZo
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "How to build a global serverless service"
Speaker: Alex Casalboni, AWS Technical Evangelist
Level: 400
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/Q19B-NTkMfk
"Automating AWS Infrastructure with PowerShell", Martin Beeby, AWS Dev Day Ky...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "Automating AWS Infrastructure with PowerShell"
Speaker: Martin Beeby, AWS Principle Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/rgIjjK2J4dQ
"Analyzing your web and application logs", Javier Ramirez, AWS Dev Day Kyiv 2...Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: "Analyzing your web and application logs"
Speaker: Javier Ramirez, AWS Technical Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/IpEhEs1sXeg
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "Resiliency and Availability Design Patterns for the Cloud"
Speaker: Sebastien Stormacq, AWS Technical Evangelist
Level: 400
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/O8gonQCJawU
"Architecting SaaS solutions on AWS", Oleksandr Mykhalchuk, AWS Dev Day Kyiv ...Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: ""Architecting SaaS solutions on AWS""
Speaker: Oleksandr Mykhalchuk, Director of DevOps & Cloud Services at Softserve
Level: 300
Video: https://youtu.be/3lKoe-ts8Qs
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
"Developing with .NET Core on AWS", Martin Beeby, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "Developing with .NET Core on AWS"
Speaker: Martin Beeby, AWS Principle Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/OzM8L7H1LmA
"How to build real-time backends", Martin Beeby, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: "How to build real-time backends"
Speaker: Martin Beeby, AWS Principle Evangelist
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://youtu.be/bsZYA6V3bDA
"Integrate your front end apps with serverless backend in the cloud", Sebasti...Provectus
AWS Dev Day Kyiv 2019
Track: Modern Application Development
Session: "Integrate your front end apps with serverless backend in the cloud"
Speaker: Sebastien Stormacq, AWS Technical Evangelist
Level: 200
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://www.youtube.com/watch?v=6z43H11qoU8&t=1s
"Scaling ML from 0 to millions of users", Julien Simon, AWS Dev Day Kyiv 2019Provectus
AWS Dev Day Kyiv 2019
Track: Analytics & Machine Learning
Session: ""Scaling ML from 0 to millions of users""
Speaker: Julien Simon, Global AI & Machine Learning Evangelist at AWS
Level: 300
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
Video: https://www.youtube.com/watch?v=N73u1mx9DqY
How to implement authorization in your backend with AWS IAMProvectus
AWS Dev Day Kyiv 2019
Track: Backend & Architecture
Session: ""How to implement authorization in your backend with AWS IAM""
Speaker: Stas Ivaschenko, AWS solutions architect at Provectus
Level: 400
Video: https://www.youtube.com/watch?v=4Jje_WJ4V7Q
AWS Dev Day is a free, full-day technical event where new developers will learn about some of the hottest topics in cloud computing, and experienced developers can dive deep on newer AWS services.
Provectus has organized AWS Dev Day Kyiv in close collaboration with Amazon Web Services: 800+ participants, 18 sessions, 3 tracks, a really AWSome Day!
Now, together with Zeo Alliance, we're building and nurturing AWS User Group Ukraine — join us on Facebook to stay updated about cloud technologies and AWS services: https://www.facebook.com/groups/AWSUserGroupUkraine
"
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at H2O.ai
1. The Next Generation of
Machine Learning on
Apache Spark
Jakub Háva
jakub@h2o.ai
Provectus Meetup, Kyiv
Oct 12, 2018
2. Who am I
• Senior Software engineer at H2O.ai - Core Sparkling Water
• Finished Master’s at Charles Uni in Prague
• Implemented high-performance cluster monitoring tool for JVM
based languages ( JNI, JVMTI, instrumentation )
• Climbing & Tea lover, ( doesn’t mean I don’t like beer!)
3. H2O Products
In-Memory, Distributed
Machine Learning Algorithms
with H2O Flow GUI
H2O AI Open Source Engine
Integration with Spark
Lightning Fast machine
learning on GPUs
Automatic feature
engineering, machine learning
and interpretability
Secure multi-tenant H2O clusters
H2O AI Open Source Engine
Integration with Spark
6. Sparkling Water
• Transparent integration of H2O with Spark ecosystem - MLlib and
H2O side-by-side
• Transparent use of H2O data structures and algorithms with Spark
API
• Platform for building Smarter Applications
• Excels in existing Spark workflows requiring advanced Machine
Learning algorithms
Functionality missing in H2O can be
replaced by Spark and vice versa
7. Benefits
• Additional algorithms
– NLP
• Powerful data munging
• ML Pipelines
• Advanced algorithms
• speed v. accuracy
• advanced parameters
• Fully distributed and parallelised
• Graphical environment
• R/Python interface
13. Scoring
• POJO
o Plain Old Java Object
• MOJO
o Model Object Optimised
• MOJO Pipeline
o MOJO from DAI with additional future transformations
• No runtime dependency on H2O framework
15. Scala code in H2O Flow
• New type of cell
• Access Spark from Flow UI
• Experimenting made easy
16. H2O Frame as Spark’s Datasource
• Use native Spark API to load and save data
• Spark can optimise the queries when loading data from
H2O Frame
• Use of Spark query optimiser
17. Machine learning pipelines
• Wrap our algorithms as Transformers and Estimators
• Support for embedding them into Spark ML Pipelines
• Can serialise fitted/unfitted pipelines
• Unified API => Arguments are set in the same way for
Spark and H2O Models
• Integration with Mojo pipelines
18. PySparkling
• PySparkling is on PyPi
• Contains all H2O and Sparkling Water dependencies, no
need to worry about them
• Just add in on your Python path and that’s it
• Correctly specifies dependency on PySpark
19. RSparkling
• Sparkling Water for R
• Based on Sparklyr package
• Independent on Spark and Sparkling Water version
20. And others!
• Support for Datasets
• Zeppelin notebook support
• XGBoost Support in Distributed Mode
• Support for high-cardinality fast joins
• Secure Communication - SSL
• Support for Sparse Data conversions
• External Cluster Stabilisation
• Enterprise security support - LDAP, Kerberos
• Driverless AI Pipeline deployment availability
23. Two Backends
• Sparkling Water has two backends
o External
o Internal
• The backend determines where H2O cluster is located
• Each backend is good for different use-cases
28. Pros & Cons
• Advantages
• Easy to configure
• Faster ( no need to send data to different cluster)
• Disadvantages
• Spark kills or joins new executor => H2O goes down
• No way how to discover all executors
30. Overview
• Sparkling Water is using external H2O cluster instead of
starting H2O in each executor
• Spark executors can come and go and H2O won’t be
affected
• Start H2O cluster on YARN automatically
31. Separation Approach
• Separating Spark and H2O
• But preserving same API:
val h2oContext = H2OContext.getOrCreate(“http://h2o:54321/)
• Spark and H2O can be submitted as Yarn job and controlled in
separation
• But H2O still needs non-elastic environment (H2O itself does not
implement HA yet)
34. Pros & Cons
• Advantages
• H2O does not crash when Spark executor goes down
• Better resource management since resources can be planned per
tool
• Disadvantages
• Transfer overhead between Spark and H2O processes
• under measurement with cooperation of a customer
35. Modes
• Auto Start mode
• Start h2o cluster automatically on YARN
• Manual Start Mode
• User is responsible for starting the cluster manually
39. Learn more at h2o.ai
Follow us at @h2oai
PS: We are hiring!
Thank you!
Sparkling Water is
open-source
ML application platform
combining
power of Spark and H2O