GNW03: Stream Processing with Apache Kafka by Gwen Shapira

Real time data viz with Spark Streaming, Kafka and D3.js

Ben Laird

Bridging the gap of Relational to Hadoop using Sqoop @ Expedia

Introduction to Spark Streaming

datamantra

Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira

Why apache Flink is the 4G of Big Data Analytics Frameworks

Slim Baltagi

Apache Flink is a community-driven open source and memory-centric Big Data analytics framework. It provides the only hybrid (Real-Time Streaming + Batch) open source distributed data processing engine supporting many use cases. Flink uses a mixture of Scala and Java internally, has very good Scala APIs and some of its libraries are basically pure Scala (FlinkML and Table). At its core, it is a streaming dataflow execution engine and it also provides several APIs for batch processing (DataSet API), real-time streaming (DataStream API) and relational queries (Table API) and also domain-specific libraries for machine learning (FlinkML) and graph processing (Gelly). In this talk, you will learn in more details about: What is Apache Flink, how it fits into the Big Data ecosystem and why it is the 4G (4th Generation) of Big Data Analytics frameworks? How Apache Flink integrates with Apache Hadoop and other open source tools for data input and output as well as deployment? Why Apache Flink is an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark? What are the benchmarking results between Apache Flink and those other Big Data analytics frameworks?

Tachyon and Apache Spark

rhatr

Rethinking Streaming Analytics For Scale

This talk will address new architectures emerging for large scale streaming analytics. Some based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) and other newer streaming analytics platforms and frameworks using Apache Flink or GearPump. Popular architecture like Lambda separate layers of computation and delivery and require many technologies which have overlapping functionality. Some of this results in duplicated code, untyped processes, or high operational overhead, let alone the cost (e.g. ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. We will cover how the particular set of technologies addresses common requirements and how collaboratively they work together to enrich and reinforce each other.

HBaseConEast2016: Splice machine open source rdbms

Michael Stack

Data Architectures for Robust Decision Making

Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive

Streaming Analytics with Spark, Kafka, Cassandra and Akka

This talk will address how a new architecture is emerging for analytics, based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK). Popular architecture like Lambda separate layers of computation and delivery and require many technologies which have overlapping functionality. Some of this results in duplicated code, untyped processes, or high operational overhead, let alone the cost (i.e. ETL). I will discuss the problem domain and what is needed in terms of strategies, architecture and application design and code to begin leveraging simpler data flows. We will cover how the particular set of technologies addresses common requirements and how collaboratively they work together to enrich and reinforce each other.

Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe

Flip Kromer

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Data Con LA

Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator. With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster. Speaker bio Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

DataStax Academy

The state of analytics has changed dramatically over the last few years. Hadoop is now commonplace, and the ecosystem has evolved to include new tools such as Spark, Shark, and Drill, that live alongside the old MapReduce-based standards. It can be difficult to keep up with the pace of change, and newcomers are left with a dizzying variety of seemingly similar choices. This is compounded by the number of possible deployment permutations, which can cause all but the most determined to simply stick with the tried and true. But there are serious advantages to many of the new tools, and this presentation will give an analysis of the current state–including pros and cons as well as what’s needed to bootstrap and operate the various options. About Robbie Strickland, Software Development Manager at The Weather Channel Robbie works for The Weather Channel’s digital division as part of the team that builds backend services for weather.com and the TWC mobile apps. He has been involved in the Cassandra project since 2010 and has contributed in a variety of ways over the years; this includes work on drivers for Scala and C#, the Hadoop integration, heading up the Atlanta Cassandra Users Group, and answering lots of Stack Overflow questions.

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...

Amazon Web Services

"This is a technical architect's case study of how Loggly has employed the latest social-media-scale technologies as the backbone ingestion processing for our multi-tenant, geo-distributed, and real-time log management system. This presentation describes design details of how we built a second-generation system fully leveraging AWS services including Amazon Route 53 DNS with heartbeat and latency-based routing, multi-region VPCs, Elastic Load Balancing, Amazon Relational Database Service, and a number of pro-active and re-active approaches to scaling computational and indexing capacity. The talk includes lessons learned in our first generation release, validated by thousands of customers; speed bumps and the mistakes we made along the way; various data models and architectures previously considered; and success at scale: speeds, feeds, and an unmeltable log processing engine."

Kafka connect-london-meetup-2016

Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi

Databricks

Apache Kylin is a distributed OLAP engine on Hadoop, which provides sub-second level query latency over datasets scaling to petabytes. Kylin’s superior query performance relies on pre-calculated multi-dimension Cube, which is often time-consuming to build. By default, Kylin uses MapReduce Cube Engine built atop of Hadoop MapReduce framework to aggregate huge amounts of source data. The MR Engine has been well-tuned over years and proven to be stable in hundreds of production deployments. Recently, the Kylin team is trying to further speed up the process of cube building by replacing MR with Spark. Kyligence has initiated the new Spark Cube Engine with some benchmarks between Spark and MR over different datasets, and has received some promising results. Hear about their results and experiences on moving Cube building, which is a huge computing task, to Spark.

Lambda Architecture with Spark

Knoldus Inc.

Debunking Common Myths in Stream Processing

Stream Processing is emerging as a popular paradigm for data processing architectures, because it handles the continuous nature of most data and computation and gets rid of artificial boundaries and delays. In this talk, we are going to look at some of the most common misconceptions about stream processing and debunk them. - Myth 1: Streaming is approximate and exactly-once is not possible. - Myth 2: Streaming is for real-time only. - Myth 4: Streaming is harder to learn than Batch Processing. - Myth 3: You need to choose between latency and throughput. We will look at these and other myths and debunk them at the example of Apache Flink. We will discuss Apache Flink's approach to high performance stream processing with state, strong consistency, low latency, and sophisticated handling of time. With such building blocks, Apache Flink can handle classes of problems previously considered out of reach for stream processing. We also take a sneak preview at the next steps for Flink.

Flink System Overview

Timo Walther

What's hot

How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021

StreamNative

The Future of Apache Storm

Real time data viz with Spark Streaming, Kafka and D3.js

Ben Laird

Bridging the gap of Relational to Hadoop using Sqoop @ Expedia

Introduction to Spark Streaming

datamantra

Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira

Why apache Flink is the 4G of Big Data Analytics Frameworks

Slim Baltagi

Tachyon and Apache Spark

rhatr

Rethinking Streaming Analytics For Scale

HBaseConEast2016: Splice machine open source rdbms

Michael Stack

Data Architectures for Robust Decision Making

Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive

Streaming Analytics with Spark, Kafka, Cassandra and Akka

Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe

Flip Kromer

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Data Con LA

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

DataStax Academy

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...

Amazon Web Services

Kafka connect-london-meetup-2016

Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi

Databricks

Lambda Architecture with Spark

Knoldus Inc.

What's hot (20)

How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021

The Future of Apache Storm

Real time data viz with Spark Streaming, Kafka and D3.js

Bridging the gap of Relational to Hadoop using Sqoop @ Expedia

Introduction to Spark Streaming

Scaling ETL with Hadoop - Avoiding Failure

Why apache Flink is the 4G of Big Data Analytics Frameworks

Tachyon and Apache Spark

Rethinking Streaming Analytics For Scale

HBaseConEast2016: Splice machine open source rdbms

Data Architectures for Robust Decision Making

Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive

Streaming Analytics with Spark, Kafka, Cassandra and Akka

Patterns of the Lambda Architecture -- 2015 April -- Hadoop Summit, Europe

ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque...

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...

Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...

Kafka connect-london-meetup-2016

Apache Kylin: Speed Up Cubing with Apache Spark with Luke Han and Shaofeng Shi

Lambda Architecture with Spark

Similar to GNW03: Stream Processing with Apache Kafka by Gwen Shapira

Debunking Common Myths in Stream Processing

Flink System Overview

Timo Walther

Connecting Akka with Oracle Event Hub Cloud Service

Dalibor Blazevic

The Rise of Streaming SQL and Evolution of Streaming Applications

Srinath Perera

First-generation stream processors, such as Apache Storm, wanted us to write code. It was a great start. However, when building real-world apps, which are used for a long time and evolve, writing code gets us into trouble. If we want to query a database or query data stored in storage with Hadoop, we use SQL. Why can't we query data streaming using SQL? We can. Almost all open source stream processors, including Storm, Flink, and Kafka, have switched to SQL. In this webinar, Srinath will talk about the evolution of stream processing, streaming SQL, the status quo, and what this means to stream applications. He will also dissect the experience of building streaming applications by exploring common patterns and pitfalls.

Systems Monitoring with Prometheus (Devops Ireland April 2015)

Brian Brazil

Monitoring means many things to many people. This talk looks at Systems Monitoring, that is how to keep an eye on a given system and use this as part of overall management of a system. This talk will cover Why one monitors, What to monitor, How to monitor, the general design of a monitoring system and how Prometheus is a good fit for this in terms of instrumentation, consoles, alerts, general system health and sanity. Prometheus is a next-generation monitoring system publicly announced earlier this year, developed by companies including SoundCloud, locals Boxever and Docker. Since launch there has been wide-spread interest, and many community contributions. For more information see http://prometheus.io or http://www.boxever.com/tag/monitoring

Cloud lunch and learn real-time streaming in azure

Timothy Spann

Introduction to Kafka Streams - Knolx.pdf

Knoldus Inc.

Five Early Challenges Of Building Streaming Fast Data Applications

Lightbend

There is a unification happening between data and microservice architectures: the demand for availability, scalability, and resilience is forcing Fast Data architectures to become like microservice architectures, while organizations building microservices find their data requirements are also evolving. At the center of it all is stream data processing, which is about more than just extracting information faster. It’s about embracing wholesale change in how organizations build data-centric applications. Yet getting started with streaming and Fast Data systems provides a number of tough questions and challenges to enterprises, which we’ve encapsulated into 5 major categories: 1. Choosing among streaming frameworks. How to select the right stream processing frameworks (e.g. Akka Streams, Spark, Flink, Kafka Streams) for different use cases? 2. Integrating with application architecture. How to best integrate microservices with streaming data services? 3. Operational challenges. What do you need to know about deploying, managing and monitoring our application clusters in the long term? 4. Decreasing Costs. How can you minimize costs by keeping our infrastructure footprint small, while not trading off performance? 5. Applying Machine Learning. How can you start using Machine Learning, Deep Learning and AI to your advantage? In this webinar, Lightbend’s Senior Product Director, Craig Blitz reviews the implications of these decisions, and give you a preview of what Lightbend is doing to make these choices more straightforward with our upcoming Fast Data Platform - an integrated platform that helps you build, deploy and run Fast Data and streaming applications easily and reliably.

kafka for db as postgres

PivotalOpenSourceHub

Performance Comparison of Streaming Big Data Platforms