Empowering Zillow’s Developers with Self-Service ETL (Databricks)
As the amount of data and the number of unique data sources within an organization grow, handling the volume of new pipeline requests becomes difficult. Not all new pipeline requests are created equal — some are for business-critical datasets, others are for routine data preparation, and others are for experimental transformations that allow data scientists to iterate quickly on their solutions.
To meet the growing demand for new data pipelines, Zillow created multiple self-service solutions that enable any team to build, maintain, and monitor their data pipelines. These tools abstract away the orchestration, deployment, and Apache Spark processing implementation from their respective users. In this talk, Zillow engineers discuss two internal platforms they created to address the specific needs of two distinct user groups: data analysts and data producers. Each platform addresses the use cases of its intended user, leverages internal services through its modular design, and empowers users to create their own ETL without having to worry about how the ETL is implemented.
Members of Zillow’s data engineering team discuss:
* Why they created two separate user interfaces to meet the needs of different user groups
* What degree of abstraction from orchestration, deployment, processing, and other ancillary tasks they chose for each user group
* How they leveraged internal services and packages, including their Apache Spark package, Pipeler, to democratize the creation of high-quality, reliable pipelines within Zillow
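A self-service platform of that sort can be sketched as a declarative spec that a generic runner turns into ordered ETL steps. The names below are invented for illustration; Pipeler’s actual API is not public and is not shown here.

```python
# Hypothetical sketch of a self-service pipeline: users submit a declarative
# spec and the platform turns it into ordered ETL steps. All names are
# illustrative, not Zillow's actual implementation.

TRANSFORMS = {
    "uppercase": lambda rows, col: [dict(r, **{col: r[col].upper()}) for r in rows],
    "drop_nulls": lambda rows, col: [r for r in rows if r.get(col) is not None],
}

def run_pipeline(spec, rows):
    """Apply the transforms listed in the spec, in order, to the input rows."""
    for step in spec["steps"]:
        fn = TRANSFORMS[step["transform"]]
        rows = fn(rows, step["column"])
    return rows

spec = {"steps": [{"transform": "drop_nulls", "column": "city"},
                  {"transform": "uppercase", "column": "city"}]}
data = [{"city": "seattle"}, {"city": None}, {"city": "denver"}]
result = run_pipeline(spec, data)
```

The point of the sketch is that the user only writes the spec; orchestration and execution details stay inside the platform.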
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark (Todd Fritz)
In this session, we will discuss:
* reactive architecture tenets
* distributed “fast data” streams
* application and analytics focused Data Lake
Enterprise-level concerns and the importance of holistic governance, operational management, and a Metadata Lake will be conceptually investigated. The next level of detail will be to explore what a prospective architecture looks like at scale with terabytes of ingestion per day, how scale puts pressure on an architecture, and how to be successful without losing data in a mission-critical system via resilient, self-healing, scalable technologies. DevOps and application architecture concerns will be first-class themes throughout.
Reactive principles and technology will be the second act of this talk. Kafka. Akka. Spark. Various streaming technologies (Kafka Streams, Akka Streams, Spark Streaming) will be reviewed to identify what they are best suited for. The fast data pipeline discussion will center around Kafka, Akka, and Apache Flink (Lightbend Fast Data platform). We’ll also walk through an exciting addition to the Akka family, Alpakka, which is a Camel equivalent for Enterprise Integration Patterns.
The final act will be to dive into the Data Lake, from both an analytics and application development perspective. Technologies used to explain concepts will include Amazon and Hadoop. A Data Lake may serve multiple analytics consumers with various “views” (and access levels) of data. It may also be a participant of various applications, perhaps by acting as a centralized source for reference data or common middleware (in turn feeding the analytics aspect). The concept of the Metadata Lake to apply structure, meaning and purpose will be an overarching success factor for a Data Lake. The difference between the Data Lake and Metadata Lake is conceptually similar to a Halocline… Various technologies (Iglu/Snowplow and more) will be discussed from a feature standpoint to flesh out the technology capabilities needed for Data Lake governance.
Akka at Enterprise Scale: Performance Tuning Distributed Applications (Lightbend)
Organizations like Starbucks, HPE, and PayPal (see our customers) have selected the Akka toolkit for their enterprise scale distributed applications; and when it comes to squeezing out the best possible performance, the secret is using two particular modules in tandem: Akka Cluster and Akka Streams.
In this webinar by Nolan Grace, Senior Solution Architect at Lightbend, we look at these two Akka modules and discuss the features that will push your application architecture to the next tier of performance.
For the full blog post, including the video, visit: https://www.lightbend.com/blog/akka-at-enterprise-scale-performance-tuning-distributed-applications
Reactive Streams 1.0.0 is now live, and so are our implementations in Akka Streams 1.0 and Slick 3.0.
Reactive Streams is an engineering collaboration between heavy hitters in the area of streaming data on the JVM. With the Reactive Streams Special Interest Group, we set out to standardize a common ground for achieving statically-typed, high-performance, low latency, asynchronous streams of data with built-in non-blocking back pressure—with the goal of creating a vibrant ecosystem of interoperating implementations, and with a vision of one day making it into a future version of Java.
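The back-pressure protocol at the heart of the specification can be sketched in a few lines: a subscriber signals demand with request(n), and the publisher never emits more elements than were requested. This is a single-threaded Python sketch of the idea; the real specification defines asynchronous JVM interfaces (Publisher, Subscriber, Subscription).

```python
# Minimal sketch of the Reactive Streams demand protocol: the subscriber
# pulls with request(n); the publisher pushes at most n items in response.

class RangePublisher:
    def __init__(self, n):
        self.items = iter(range(n))

    def subscribe(self, subscriber):
        subscriber.on_subscribe(Subscription(self, subscriber))

class Subscription:
    def __init__(self, publisher, subscriber):
        self.publisher, self.subscriber = publisher, subscriber

    def request(self, n):
        # Emit at most n items; real implementations do this asynchronously.
        for _ in range(n):
            try:
                self.subscriber.on_next(next(self.publisher.items))
            except StopIteration:
                self.subscriber.on_complete()
                return

class CollectingSubscriber:
    def __init__(self, batch=2):
        self.batch, self.received, self.done = batch, [], False

    def on_subscribe(self, subscription):
        self.subscription = subscription
        subscription.request(self.batch)   # initial demand

    def on_next(self, item):
        self.received.append(item)

    def on_complete(self):
        self.done = True

sub = CollectingSubscriber()
RangePublisher(5).subscribe(sub)
first_batch = list(sub.received)   # only the initially requested items so far
sub.subscription.request(10)       # signal more demand to drain the stream
```

Because the publisher only produces in response to demand, a slow consumer never gets overwhelmed; that is the non-blocking back pressure the paragraph above describes.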
Akka (recent winner of “Most Innovative Open Source Tech in 2015”) is a toolkit for building message-driven applications. With Akka Streams 1.0, Akka has incorporated a graphical DSL for composing data streams, an execution model that decouples the stream’s staged computation—its “blueprint”—from its execution (allowing for actor-based, single-threaded and fully distributed and clustered execution), type-safe stream composition, an implementation of the Reactive Streams specification that enables back-pressure, and more than 20 predefined stream “processing stages” that provide common streaming transformations that developers can tap into (for splitting streams, transforming streams, merging streams, and more).
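The blueprint/execution split can be illustrated with a toy sketch in which a stream is just data describing stages, and nothing runs until a materializer interprets it. This is illustrative Python, not Akka’s actual DSL.

```python
# Sketch of the "blueprint vs. execution" idea: a stream is described as
# data (an ordered list of stages) and only runs when handed to a
# materializer. The same blueprint could be executed by different engines.

def source(iterable):
    return [("source", iterable)]

def via(blueprint, fn):
    return blueprint + [("map", fn)]

def to_list(blueprint):
    return blueprint + [("sink", list)]

def run(blueprint):
    """A single-threaded materializer; a distributed one could interpret
    the very same stage list differently."""
    (_, data), *stages = blueprint
    for kind, payload in stages:
        if kind == "map":
            data = map(payload, data)
        elif kind == "sink":
            data = payload(data)
    return data

graph = to_list(via(source(range(4)), lambda x: x * x))  # blueprint only; nothing ran yet
result = run(graph)                                      # execution happens here
```

Separating description from execution is what lets one stream definition run single-threaded, actor-based, or clustered.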
Slick is a relational database query and access library for Scala that enables loose-coupling, minimal configuration requirements and abstraction of the complexities of connecting with relational databases. With Slick 3.0, Slick now supports the Reactive Streams API for providing asynchronous stream processing with non-blocking back-pressure. Slick 3.0 also allows elegant mapping across multiple data types, static verification and type inference for embedded SQL statements, compile-time error discovery, and JDBC support for interoperability with all existing drivers.
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams (Lightbend)
Audience: Architects, Data Scientists, Developers
Technical level: Introductory
From home intrusion detection, to self-driving cars, to keeping data center operations healthy, Machine Learning (ML) has become one of the hottest topics in software engineering today. While much of the focus has been on the actual creation of the algorithms used in ML, the less talked-about challenge is how to serve these models in production, often utilizing real-time streaming data.
The traditional approach to model serving is to treat the model as code, which means that the ML implementation has to be continually adapted for model serving. As the number of machine learning tools and techniques grows, the efficiency of such an approach is becoming more questionable. Additionally, machine learning and model serving are driven by very different quality-of-service requirements; while machine learning is typically batch, dealing with scalability and processing power, model serving is mostly concerned with performance and stability.
In this webinar with O’Reilly author and Lightbend Principal Architect, Boris Lublinsky, we will define an alternative approach to model serving, based on treating the model itself as data. Using popular frameworks like Akka Streams and Apache Flink, Boris will review how to implement this approach, explaining how it can help you:
* Achieve complete decoupling between the model implementation for machine learning and model serving, enforcing better standardization of your model serving implementation.
* Enable dynamic updates of the served model without having to restart the system.
* Utilize TensorFlow and PMML as model representations for building a “real-time updatable” model serving architecture.
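The model-as-data idea above can be sketched as a server that swaps its parameters atomically while scoring continues, with no restart. This is a minimal Python sketch; a real system would consume model updates from a stream rather than a direct call.

```python
# Sketch of "model as data": the serving loop holds a reference to model
# parameters and can swap them atomically while requests keep flowing.

import threading

class ModelServer:
    def __init__(self, weights):
        self._lock = threading.Lock()
        self._weights = weights

    def update(self, new_weights):
        with self._lock:          # atomic swap of the served model
            self._weights = new_weights

    def score(self, features):
        with self._lock:
            w = self._weights
        return sum(wi * xi for wi, xi in zip(w, features))

server = ModelServer([1.0, 0.0])
before = server.score([2.0, 3.0])
server.update([0.0, 1.0])          # the new model arrives as data
after = server.score([2.0, 3.0])
```

Because the model is data, the serving code never changes when the model does; that is the decoupling the first bullet describes.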
Whirlpools in the Stream with Jayesh Lalwani (Databricks)
At Capital One, we use Spark to detect fraud. Recently we have started implementing real-time fraud detection using machine learning models. One of Capital One’s fraud detection microservices was an early adopter of Structured Streaming. As part of this implementation, the microservice ran into several roadblocks. In this talk, we describe those roadblocks and how we got around them.
* Caching of lookup data
* Dilemma: store state in Spark vs. store state in a database
* Retrieving state from the database efficiently
* Non-homogeneous data sources
* Aggregations in the stream
* Checkpointing fumbles
* Checkpointing performance and instabilities
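The first roadblock, caching lookup data, can be sketched as a TTL cache that refetches reference data only when entries go stale, so the streaming job doesn’t hit the database per record. This is illustrative Python; the fetch function stands in for a database query, and the clock is injectable for testing.

```python
# Hedged sketch of a lookup-data cache for stream enrichment: entries
# expire after a TTL so slowly changing reference data gets refreshed.

import time

class TTLLookupCache:
    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch, self.ttl, self.clock = fetch, ttl_seconds, clock
        self._entries = {}   # key -> (value, expiry)
        self.misses = 0

    def get(self, key):
        value, expiry = self._entries.get(key, (None, 0.0))
        if self.clock() >= expiry:           # missing or stale: refetch
            self.misses += 1
            value = self.fetch(key)
            self._entries[key] = (value, self.clock() + self.ttl)
        return value

# A hypothetical lookup standing in for a database query; a fake clock
# lets us demonstrate expiry deterministically.
now = [0.0]
cache = TTLLookupCache(fetch=lambda k: k.upper(), ttl_seconds=10.0, clock=lambda: now[0])
first = cache.get("fraud_rule")    # miss: hits the "database"
second = cache.get("fraud_rule")   # hit: served from cache
now[0] = 11.0                      # advance past the TTL
third = cache.get("fraud_rule")    # stale: refetched
```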
Evolving Services Into A Cloud Native World (Iain Hull)
How Workday manages stateful services with a custom controller on Kubernetes
Conference talk for CloudNative London 2018
https://skillsmatter.com/skillscasts/12106-evolving-services-into-the-cloud-native-world-how-workday-manage-stateful-services-with-a-custom-controller-on-kubernetes
Kubernetes and declarative infrastructure greatly simplify the way we deploy and manage software. Most services can be orchestrated with the control loops supplied by Kubernetes (deployments, stateful sets or jobs). Some stateful services in Workday require more advanced orchestration, and re-architecting them is not an easy option.
In this talk you will discover why some of our services require extra orchestration, and how we evolved an existing service into a control loop on top of Kubernetes. The control loop organises multiple services into groups that are dynamically created, deleted and scaled. It also orchestrates blue/green deployments of each group. Now we can adopt more Kubernetes features and retire some of our old scheduling code. Finally, you will learn the process we follow to evaluate and design our own control loops, and when you might find them useful.
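The control-loop pattern described above can be sketched as a reconcile step that diffs desired state against observed state and emits the actions needed to converge. This is a toy Python sketch of the pattern, not Workday’s controller.

```python
# Sketch of the reconcile shape that Kubernetes-style controllers follow:
# compare desired state with observed state and issue corrective actions.

def reconcile(desired, observed):
    """Return the actions needed to move observed groups toward desired."""
    actions = []
    for group, replicas in desired.items():
        have = observed.get(group, 0)
        if have < replicas:
            actions.append(("create", group, replicas - have))
        elif have > replicas:
            actions.append(("delete", group, have - replicas))
    for group in observed:
        if group not in desired:               # group no longer wanted
            actions.append(("delete", group, observed[group]))
    return actions

desired = {"group-a": 3, "group-b": 2}
observed = {"group-a": 1, "group-c": 2}
actions = reconcile(desired, observed)
```

Running this diff repeatedly against live state is what makes the loop self-correcting: any drift is detected and repaired on the next pass.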
Bio
Iain is a principal software engineer at Workday using Kubernetes and Scala to deliver their next generation elastic grid. His twin passions are large scale distributed computing and applying clean code to complex problems. He is interested in good design and how this can improve system reliability and reduce friction during development.
He loves sharing his experiences as he learns and builds new systems. He regularly speaks at local meetups in Dublin and has presented at conferences including GotoConf, Scala Days, Functional Kats and Lambda World.
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes (Confluent)
Speakers: Joe Beda, Co-founder and CTO, Heptio + Gwen Shapira, Principal Data Architect, Confluent
With the rapid adoption of microservices, there is a growing need for solutions to manage deployment, resources and data for fleets of microservices. Kubernetes is a resource management framework for containers that is rapidly growing in popularity. Apache Kafka is a streaming platform that makes data accessible to the edges of an organization. It's no wonder the question of running Kafka on Kubernetes keeps coming up!
In this online talk, Joe Beda, CTO of Heptio and co-creator of Kubernetes, and Gwen Shapira, principal data architect at Confluent and Kafka PMC member, will help you navigate through the hype, address frequently asked questions and deliver critical information to help you decide if running Kafka on Kubernetes is the right approach for your organization.
You will:
-Get an introduction to the basic concepts you need to know as you plan to deploy services on Kubernetes.
-Learn which parts of the Kafka ecosystem fit Kubernetes like a glove, and which require special attention.
-Pick up useful tips for getting started.
-See why Confluent Platform for Kubernetes is the simplest solution to deploying and orchestrating Kafka on Kubernetes, using container images and a Kubernetes operator.
Watch the recording: https://videos.confluent.io/watch/yoZcuazDjDDTcj1sRnaD3J?.
Things I wish someone had told me about Istio, Omer Levi Hevroni (Soluto)
We at Soluto decided to give Istio a try, and started to gradually roll it out in our production environment. While doing that, we had a lot of *interesting* experiences that we weren't aware of beforehand - and we'll be happy to share them with you so you can learn from our experience. In the talk, I'll cover issues like high availability, reliability and monitoring - and also production issues we encountered. We do hope that by the meetup we can say that we have Istio deployed in production :)
A Practical Guide to Selecting a Stream Processing Technology (Confluent)
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
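The point that a simple sequence of records may not need a full framework can be sketched as a plain record-at-a-time loop. This is illustrative Python, not a real Kafka client API; the record and sink shapes are invented.

```python
# Sketch of framework-free stream processing: consume records one at a
# time, transform each, and hand the result to a sink. For many simple
# workloads this loop is the whole "stream processor".

def process_stream(records, transform, sink):
    """Apply transform to each record in arrival order and emit to sink."""
    for record in records:
        sink.append(transform(record))

records = [{"user": "a", "amount": 10}, {"user": "b", "amount": 25}]
out = []
process_stream(records, lambda r: {**r, "amount_cents": r["amount"] * 100}, out)
```

Frameworks earn their keep once you need state, windows, fault tolerance, or scaling beyond one process; until then, the loop above is the honest baseline to compare against.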
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili... (Sencha)
View this presentation to see a real-time and data-centric application designed to help people manage large facilities, buildings, and homes in a smart way. It notably features D3.js dashboards, user-friendly device mapping, and automatic alerts on suspicious power consumptions.
Lessons from Large-Scale Cloud Software at Databricks (Matei Zaharia)
Keynote by Matei Zaharia at SOCC 2019
Abstract:
The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software, which has not been heavily studied in research. I will explain some of these challenges based on my experience at Databricks, a startup that provides a data analytics platform as a service on AWS and Azure. Databricks manages millions of VMs per day to run data engineering and machine learning workloads using Apache Spark, TensorFlow, Python and other software for thousands of customers. Two main challenges arise in this context: (1) building a reliable, scalable control plane that can manage thousands of customers at once and (2) adapting the data processing software itself (e.g. Apache Spark) for an elastic cloud environment (for instance, autoscaling instead of assuming static clusters). These challenges are especially significant for data analytics workloads whose users constantly push boundaries in terms of scale (e.g. number of VMs used, data size, metadata size, number of concurrent users, etc). I’ll describe some of the common challenges that our new services face and some of the main ways that Databricks has extended and modified open source analytics software for the cloud environment (e.g., designing an autoscaling engine for Apache Spark and creating a transactional storage layer on top of S3 in the Delta Lake open source project).
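The autoscaling challenge mentioned in the abstract can be illustrated with a toy sizing function that picks a cluster size from the task backlog, clamped to configured bounds. The policy below is invented for illustration and is not Databricks’ actual autoscaling engine.

```python
# Toy autoscaling decision: size the cluster to the pending-task backlog,
# clamped between a configured minimum and maximum number of executors.

import math

def target_executors(pending_tasks, tasks_per_executor, min_execs, max_execs):
    """Return the executor count to request for the current backlog."""
    wanted = math.ceil(pending_tasks / tasks_per_executor)
    return max(min_execs, min(max_execs, wanted))

# Idle, moderate, and overloaded backlogs under the same policy.
targets = [target_executors(p, tasks_per_executor=8, min_execs=2, max_execs=20)
           for p in (0, 40, 1000)]
```

A production engine layers much more on top (scale-down hysteresis, data locality, spot-instance loss), but the clamp-to-bounds shape is the common core.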
Bio:
Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly on datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow open source machine learning platform at Databricks and is a PI in the DAWN Lab focusing on systems for ML at Stanford. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
You think you know everything about Hibernate? Sorry to disappoint you, but not anymore!
Hibernate 6 made some radical internal changes and comes with a bunch of new features and improvements.
Come and join the talk to learn about performance improvements, new HQL and Criteria features like set operations, the fetch clause and window functions, and new mapping capabilities for types like JSON or UUID.
Christian Beikov is a software engineer who has been working with Java/Jakarta EE technologies since school. He worked on an SRM (supplier relationship management) system for 9 years and is the founder of Blazebit, a company that provides consulting services and support for Blaze-Persistence and related technologies. Since November 2020 he has worked as a full-time Hibernate developer at Red Hat. His main interests are distributed systems, database technologies and everything Java/JVM-related.
Running Apache Spark Jobs Using Kubernetes (Databricks)
Apache Spark has introduced a powerful engine for distributed data processing, providing unmatched capabilities to handle petabytes of data across multiple servers. Its capabilities and performance unseated other technologies in the Hadoop world, but while Spark provides a lot of power, it also comes with a high maintenance cost, which is why we now see innovations to simplify the Spark infrastructure.
Reactive Integrations - Caveats and bumps in the road explained (Markus Eisele)
Understand the different approaches to integrate fast data and streams based frameworks into your legacy applications and learn about the advantages, disadvantages, caveats, and bumps in the road.
Being RDBMS Free -- Alternate Approaches to Data Persistence (David Hoerster)
The general thinking is that when you create a new application, your data will be persisted into an RDBMS like SQL Server. But with the advent of NoSQL solutions, document databases, key-value stores and other options, do you really need an RDBMS for your application? In this session we’ll explore some alternatives for persistence, utilizing NoSQL solutions like Mongo, search services like Solr, key-value stores and other approaches. By the end of this session, you’ll rethink how your applications will store data in the future.
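One of those alternatives, a schemaless document store, can be sketched in a few lines: records go in as JSON documents under a key, with no table schema or joins. This is illustrative Python; real stores such as MongoDB or Redis add indexing, querying, and durability.

```python
# Minimal sketch of document-style persistence as an RDBMS alternative:
# each record is a JSON document stored under a key.

import json

class DocumentStore:
    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        self._docs[key] = json.dumps(document)   # documents are schemaless

    def get(self, key):
        raw = self._docs.get(key)
        return json.loads(raw) if raw is not None else None

store = DocumentStore()
store.put("user:1", {"name": "Ada", "tags": ["admin"]})
user = store.get("user:1")
```

Two documents under the same store can have completely different shapes, which is exactly the flexibility (and the risk) the session weighs against an RDBMS.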
Apache Kafka® Delivers a Single Source of Truth for The New York Times (Confluent)
With 3.6 million paid print and digital subscriptions, how did The New York Times remain a leader in an evolving industry that once relied on print? It fundamentally changed its infrastructure at the core to keep up with the new expectations of the digital age and its consumers. Now every piece of content ever published by The New York Times throughout the past 166 years and counting is stored in Apache Kafka.
Join The New York Times' Director of Engineering Boerge Svingen to learn how the innovative news giant of America transformed the way it sources content while still maintaining searchability, accuracy and accessibility through a variety of applications and services—all through the power of a real-time streaming platform.
In this talk, Boerge will:
-Provide an overview of what the publishing infrastructure used to look like
-Deep dive into the log-based architecture of The New York Times’ Publishing Pipeline
-Explain the schema, monolog and skinny log used for storing articles
-Share challenges and lessons learned
-Answer live questions submitted by the audience
Watch the recording: https://videos.confluent.io/watch/SURnGMNNzsvDHYCmnCkJEY?
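The log-based architecture Boerge describes can be sketched as an append-only log that consumers replay to rebuild their own views. Field names below are illustrative, not the Times’ actual schema.

```python
# Sketch of a publishing log as the single source of truth: every asset
# is appended in order, and any consumer (search index, site, app) can
# rebuild its state by replaying from an offset.

class PublishLog:
    def __init__(self):
        self.entries = []              # append-only, ordered

    def publish(self, article):
        self.entries.append(article)
        return len(self.entries) - 1   # offset of the new entry

    def replay(self, from_offset=0):
        return self.entries[from_offset:]

log = PublishLog()
log.publish({"id": "a1", "headline": "First"})
offset = log.publish({"id": "a2", "headline": "Second"})

# A late-joining consumer rebuilds its view by replaying the whole log.
search_index = {a["id"]: a["headline"] for a in log.replay()}
```

Because the log is the source of truth, adding a new consumer never requires touching the publishers: it simply replays from offset zero.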
What We Learned from Porting PiggyMetrics from Spring Boot to MicroProfile (Ed Burns)
PiggyMetrics is a popular open source end-to-end sample which demonstrates the use of Spring Boot and Spring Cloud features in a microservices-style application. Spring Boot and MicroProfile are popular competing frameworks for building apps in the cloud-native microservices style. Functionally, architecturally, and historically they have many things in common. From a business, economic and governance perspective they have significant differences. This session from Java Champions Ed Burns and Emily Jiang, respectively of Microsoft and IBM, briefly surveys the history and non-technical aspects in comparing the Spring Boot and MicroProfile stacks and then will take you through a real world case study based on PiggyMetrics. We will share our experience of porting it from Spring to MicroProfile.
PiggyMetrics models a personal finance application and uses cloud native microservices features such as externalized configuration, aggregate logs, service metrics, security propagation, and distributed tracing. The porting exercise utilizes MicroProfile features such as Config, Metrics, Health Check, Fault Tolerance, Open Tracing and JWT Propagation along with Jakarta CDI, REST and JSON Binding.
Join Ed and Emily for a fun and informative compare and contrast ride.
Lightbend Training for Scala, Akka, Play Framework and Apache Spark (Lightbend)
Having a team adopt new technologies and approaches to software development is a daunting task. New paradigms and unfamiliar ontologies headline the biggest risks to having a team be productive quickly. Lightbend (formerly Typesafe) has a suite of training classes to help you adopt whatever components of the Lightbend Reactive Platform you need to be responsive to your customers by creating resilient and elastic applications.
In this webinar, we will discuss the philosophies and structures of Lightbend's training materials for Scala, Akka, Play Framework, and Spark.
Evolving Services Into A Cloud Native WorldIain Hull
How Workday manage stateful services with a custom controller on Kubernetes?
Conference talk for CloudNative London 2018
https://skillsmatter.com/skillscasts/12106-evolving-services-into-the-cloud-native-world-how-workday-manage-stateful-services-with-a-custom-controller-on-kubernetes
Kubernetes and declarative infrastructure greatly simplify the way we deploy and manage software. Most services can be orchestrated with the control loops supplied by Kubernetes (deployments, stateful sets or jobs). Some stateful services in Workday require more advanced orchestration, and re-architecting them is not an easy option.
In this talk you will discover why some of our services require extra orchestration, and how we evolved an existing service into a control loop on top of Kubernetes. The control loop organises multiple services into groups these are dynamically created, deleted and scaled. It also orchestrates blue/green deployments of each group. Now we can adopt more kubernetes features and retire some of our old scheduling code. Finally you will learn the process we follow to evaluate and design our own control loops and when you might find them useful.
Bio
Iain is a principal software engineer at Workday using Kubernetes and Scala to deliver their next generation elastic grid. His twin passions are large scale distributed computing and applying clean code to complex problems. He is interested in good design and how this can improve system reliability and reduce friction during development.
He loves sharing his experiences as he learns and builds new systems. He regularly speaks at local meetups in Dublin and has presented at conferences including GotoConf, Scala Days, Functional Kats and Lambda World.
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetesconfluent
Speakers: Joe Beda, Co-founder and CTO, Heptio + Gwen Shapira, Principal Data Architect, Confluent
With the rapid adoption of microservices, there is a growing need for solutions to manage deployment, resources and data for fleets of microservices. Kubernetes is a resource management framework for containers that is rapidly growing in popularity. Apache Kafka is a streaming platform that makes data accessible to the edges of an organization. It's no wonder the question of running Kafka on Kubernetes keeps coming up!
In this online talk, Joe Beda, CTO of Heptio and co-creator of Kubernetes, and Gwen Shapira, principal data architect at Confluent and Kafka PMC member, will help you navigate through the hype, address frequently asked questions and deliver critical information to help you decide if running Kafka on Kubernetes is the right approach for your organization.
You will:
-Get an introduction to the basic concepts you need to know as you plan to deploy services on Kubernetes.
-Learn which parts of the Kafka ecosystem fit Kubernetes like a glove, and which require special attention.
-Pick up useful tips for getting started.
-See why Confluent Platform for Kubernetes is the simplest solution to deploying and orchestrating Kafka on Kubernetes, using container images and a Kubernetes operator.
Watch the recording: https://videos.confluent.io/watch/yoZcuazDjDDTcj1sRnaD3J?
Things I wish someone had told me about Istio, Omer Levi HevroniSoluto
We at Soluto decided to give Istio a try and started to gradually roll it out in our production environment. While doing that, we had a lot of *interesting* experiences we weren't aware of, and we'll be happy to share them so you can learn from our experience. In the talk, I'll cover issues like high availability, reliability and monitoring, as well as production issues we encountered. We hope that by the meetup we can say we have Istio deployed in production :)
A Practical Guide to Selecting a Stream Processing Technology confluent
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
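To make the comparison concrete, here is a toy tumbling-window count, the kind of stateful operation that engines such as Kafka Streams provide out of the box. The record shape (timestamp plus key) and the "windowStart:key" result format are illustrative only, not any framework's API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy tumbling-window count: group records into fixed, non-overlapping time
// windows and count occurrences of each key per window.
public class WindowedCount {
    public static Map<String, Integer> count(long windowMs, long[] timestamps, String[] keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = (timestamps[i] / windowMs) * windowMs;  // align to window
            counts.merge(windowStart + ":" + keys[i], 1, Integer::sum);
        }
        return counts;
    }
}
```

What the frameworks add on top of this core logic, and what differentiates them, is fault-tolerant state, late-arriving data handling, and scaling the computation across machines.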
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...Sencha
View this presentation to see a real-time, data-centric application designed to help people manage large facilities, buildings, and homes in a smart way. It notably features D3.js dashboards, user-friendly device mapping, and automatic alerts on suspicious power consumption.
Lessons from Large-Scale Cloud Software at DatabricksMatei Zaharia
Keynote by Matei Zaharia at SOCC 2019
Abstract:
The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software, which has not been heavily studied in research. I will explain some of these challenges based on my experience at Databricks, a startup that provides a data analytics platform as a service on AWS and Azure. Databricks manages millions of VMs per day to run data engineering and machine learning workloads using Apache Spark, TensorFlow, Python and other software for thousands of customers. Two main challenges arise in this context: (1) building a reliable, scalable control plane that can manage thousands of customers at once and (2) adapting the data processing software itself (e.g. Apache Spark) for an elastic cloud environment (for instance, autoscaling instead of assuming static clusters). These challenges are especially significant for data analytics workloads whose users constantly push boundaries in terms of scale (e.g. number of VMs used, data size, metadata size, number of concurrent users, etc). I’ll describe some of the common challenges that our new services face and some of the main ways that Databricks has extended and modified open source analytics software for the cloud environment (e.g., designing an autoscaling engine for Apache Spark and creating a transactional storage layer on top of S3 in the Delta Lake open source project).
Bio:
Matei Zaharia is an Assistant Professor of Computer Science at Stanford University and Chief Technologist at Databricks. He started the Apache Spark project during his PhD at UC Berkeley in 2009, and has worked broadly on datacenter systems, co-starting the Apache Mesos project and contributing as a committer on Apache Hadoop. Today, Matei tech-leads the MLflow open source machine learning platform at Databricks and is a PI in the DAWN Lab focusing on systems for ML at Stanford. Matei’s research was recognized through the 2014 ACM Doctoral Dissertation Award for the best PhD dissertation in computer science, an NSF CAREER Award, and the US Presidential Early Career Award for Scientists and Engineers (PECASE).
You think you know everything about Hibernate? Sorry to disappoint you, but not anymore!
Hibernate 6 made some radical internal changes and comes with a bunch of new features and improvements.
Come and join the talk to learn about performance improvements, new HQL and Criteria features like set operations, the fetch clause and window functions, and new mapping capabilities for types like JSON or UUID.
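To give a flavour of the HQL features the talk names, here are illustrative query strings. The entity and property names are made up, and the syntax is a sketch based on my reading of the Hibernate 6 documentation, not verified queries.

```java
// Illustrative HQL strings for Hibernate 6 features: set operations, the
// fetch clause, and window functions. Person and Company are hypothetical
// entities used only for demonstration.
public class Hql6Examples {
    // Set operations: combine the results of two queries with UNION.
    public static final String SET_OPERATION =
        "select p.name from Person p union select c.name from Company c";

    // Fetch clause: limit the result size directly in HQL.
    public static final String FETCH_CLAUSE =
        "select p from Person p order by p.name fetch first 10 rows only";

    // Window function: number rows per country without collapsing them.
    public static final String WINDOW_FUNCTION =
        "select p.name, row_number() over (partition by p.country order by p.name) "
        + "from Person p";
}
```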
Christian Beikov is a software engineer who has worked with Java/Jakarta EE technologies since school. He worked on an SRM (supplier relationship management) system for 9 years and is the founder of Blazebit, a company that provides consulting services and support for Blaze-Persistence and related technologies. Since November 2020 he has worked as a full-time Hibernate developer at Red Hat. His main interests are distributed systems, database technologies and everything Java/JVM-related.
Running Apache Spark Jobs Using KubernetesDatabricks
Apache Spark has introduced a powerful engine for distributed data processing, providing unmatched capabilities to handle petabytes of data across multiple servers. Its capabilities and performance unseated other technologies in the Hadoop world, but while Spark provides a lot of power, it also comes with a high maintenance cost, which is why we now see innovations to simplify the Spark infrastructure.
Reactive Integrations - Caveats and bumps in the road explained Markus Eisele
Understand the different approaches to integrate fast data and streams based frameworks into your legacy applications and learn about the advantages, disadvantages, caveats, and bumps in the road.
Being RDBMS Free -- Alternate Approaches to Data PersistenceDavid Hoerster
The general thinking is that when you create a new application, your data will be persisted into an RDBMS like SQL Server. But with the advent of NoSQL solutions, document databases, key-value stores and other options, do you really need an RDBMS for your application? In this session we’ll look at some alternatives to your persistence solution by looking at utilizing NoSQL solutions like Mongo, search services like Solr, key-value stores and other approaches to data persistence. By the end of this session, you’ll rethink how your applications will store data in the future.
Apache Kafka® Delivers a Single Source of Truth for The New York Timesconfluent
With 3.6 million paid print and digital subscriptions, how did The New York Times remain a leader in an evolving industry that once relied on print? It fundamentally changed its infrastructure at the core to keep up with the new expectations of the digital age and its consumers. Now every piece of content ever published by The New York Times throughout the past 166 years and counting is stored in Apache Kafka.
Join The New York Times' Director of Engineering Boerge Svingen to learn how the innovative news giant of America transformed the way it sources content while still maintaining searchability, accuracy and accessibility through a variety of applications and services—all through the power of a real-time streaming platform.
In this talk, Boerge will:
-Provide an overview of what the publishing infrastructure used to look like
-Deep dive into the log-based architecture of The New York Times’ Publishing Pipeline
-Explain the schema, monolog and skinny log used for storing articles
-Share challenges and lessons learned
-Answer live questions submitted by the audience
Watch the recording: https://videos.confluent.io/watch/SURnGMNNzsvDHYCmnCkJEY?
What We Learned from Porting PiggyMetrics from Spring Boot to MicroProfileEd Burns
PiggyMetrics is a popular open source end-to-end sample which demonstrates the use of Spring Boot and Spring Cloud features in a microservices-style application. Spring Boot and MicroProfile are popular competing frameworks for building apps in the cloud-native microservices style. Functionally, architecturally, and historically they have many things in common. From a business, economic and governance perspective they have significant differences. This session from Java Champions Ed Burns and Emily Jiang, respectively of Microsoft and IBM, briefly surveys the history and non-technical aspects of comparing the Spring Boot and MicroProfile stacks, and then takes you through a real-world case study based on PiggyMetrics. We will share our experience of porting it from Spring to MicroProfile.
PiggyMetrics models a personal finance application and uses cloud native microservices features such as externalized configuration, aggregate logs, service metrics, security propagation, and distributed tracing. The porting exercise utilizes MicroProfile features such as Config, Metrics, Health Check, Fault Tolerance, Open Tracing and JWT Propagation along with Jakarta CDI, REST and JSON Binding.
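One of the patterns named above, externalized configuration, is worth a framework-free sketch: resolve a property from the environment first, then fall back to a built-in default. This is the core idea that both Spring Boot's property sources and MicroProfile Config formalize; the `db.url` property name and `DB_URL` mapping below are illustrative.

```java
import java.util.Map;

// Framework-free sketch of externalized configuration: environment wins,
// otherwise the built-in default applies.
public class ExternalConfig {
    // Maps "db.url" -> "DB_URL", mirroring the common property-to-env convention.
    public static String get(Map<String, String> env, String property, String fallback) {
        String envKey = property.toUpperCase().replace('.', '_');
        String value = env.get(envKey);
        return (value != null && !value.isEmpty()) ? value : fallback;
    }
}
```

The real frameworks layer precedence rules, type conversion, and injection (e.g. MicroProfile's `@ConfigProperty`) on top of this lookup.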
Join Ed and Emily for a fun and informative compare and contrast ride.
Lightbend Training for Scala, Akka, Play Framework and Apache SparkLightbend
Having a team adopt new technologies and approaches to software development is a daunting task. New paradigms and unfamiliar ontologies headline the biggest risks to having a team be productive quickly. Lightbend (formerly Typesafe) has a suite of training classes to help you adopt whatever components of the Lightbend Reactive Platform you need to be responsive to your customers by creating resilient and elastic applications.
In this webinar, we will discuss the philosophies and structures of Lightbend's training materials for Scala, Akka, Play Framework, and Spark.
Resume - Taranjeet Singh - 3.5 years - Java/J2EE/GWTtaranjs
This is an OLD PROFILE.
For the latest one, visit http://www.linkedin.com/in/taranjs
and connect with me using my email: taranjs at gmail dot com
B.Tech. (Electronics and Communication) from Guru Gobind Singh Indraprastha University.
● Proficiency in grasping new technical concepts quickly and utilizing them in a productive manner.
● A proactive learner with a flair for adopting emerging trends and addressing industry requirements to achieve organizational objectives and profitability norms.
● An exceptional performer with the distinction of being commended for exemplary performance in academics as well as extra-curricular activities.
● Total experience: 3 years and 5 months
A Social Network and Learning Centre designed to help users meet new friends, maintain existing relationships, and at the same time enhance their Java concepts. The main goal of our website is to make your social life more active and stimulating. This project helps you connect with people, share your ideas, and enhance your programming concepts related to Java, Android & Windows.
This project offers a new class of resource where you can read, write, compile, and run Java programs with a web-based online compiler. Lecture notes are available with examples, along with your personal image, music & video gallery, making it a complete platform for everyone.
• Language Used : JSP & Servlet.
• Designing : Html, CSS, JavaScript
• IDE : NetBeans 8.0.2
• Database : MySQL 5.1.
Complete project report made by Abhishek Kumar
Master of Computer Science candidate with around 3 years of work experience in Java EE, Spring MVC, Hibernate, JavaScript, jQuery, and back-end software development. Looking for an opportunity as a full-stack developer.
Instructions for Submissions through G-Classroom.pptxJheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting daily life, industry, and the environment, and offer a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
2024.06.01 Introducing a competency framework for language learning materials ...Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
These slides describe the basic concepts of ICT, the basics of email, emerging technology, and digital initiatives in education. This presentation aligns with the UGC Paper I syllabus.
How to Split Bills in the Odoo 17 POS ModuleCeline George
Bills play a central role in the point-of-sale process, helping to track sales, handle payments, and issue receipts to customers. Bill splitting is also important in POS: for example, if some friends come together for dinner and want to divide the bill, POS bill splitting makes that possible. This slide will show how to split bills in the Odoo 17 POS.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit www.vavaclasses.com
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Palestine last event orientation.pptxRaedMohamed3
An EFL lesson about the current events in Palestine. It is intended for intermediate students who wish to improve their listening skills through a short PowerPoint lesson.
How to Make a Field invisible in Odoo 17Celine George
It is possible to hide fields in Odoo, commonly by using the "invisible" attribute in the field definition. This slide will show how to make a field invisible in Odoo 17.
1. Interactive Publication Platform
Group ID: 2
Group Members:
1. Ratan Kadam - 4679
2. Nachiket Talwalkar - 4582
3. Aditya Deshpande - 4618
4. Sagar Kharat - 4678
Guided By:
* Prof. Jayadevan R.
* Prof. Warhade S.
Project Sponsorship: In House
External Guide: Mr. Prakash Khot (Salesforce Inc., USA)
3. Introduction
Problem Definition:
A mobile web application developed for Apple's new device, the iPad, called "Interactive Publication Platform", which provides an interactive environment for book reading by incorporating the best of recent web technologies.
4. Front End
Programming Languages:
* Objective-C: a superset of C which adds object-oriented features and the Cocoa framework.
7. Modules
Back End Modules:
* XML parser (in Java): grabs the links out of the RSS feeds subscribed to by the user.
* HTML parser (in Java): performs extraction and transformation of the content in an HTML page.
* Magazine Scheduler: schedules the threads for magazine creation and updating.
* Page Writer: writes to the templates of the magazine in XML form.
10. Architecture (diagram): the user interface is driven by a UI Controller that links the UI and the Model; the Model handles XML parsing, data and memory management, and the connection with Google App Engine, which serves the XML data that is displayed as formatted content.
11. Front End Modules:
* XML parser (Objective-C): reads the display descriptor and content descriptor.
* UI Controller: links the data and the view.
* UI design: manages the look and feel of the magazine.
* Social Integration: integrates Facebook and Twitter.
12. Future Scope
* Make the application platform independent so that it can be viewed on any tablet, such as the Google Slate or HP Slate.