This document discusses auto-scaling in Google Cloud Dataflow. It describes how Dataflow pipelines automatically adjust their degree of parallelism based on throughput, backlog-growth, and CPU-utilization signals. The scaling policy aims to keep pipelines abreast of their input rates and to reduce any backlog quickly. The mechanism for changing parallelism involves splitting computation key ranges across additional machines or migrating ranges between machines. Future work may include finer-grained range splitting and approximate throughput modeling.
Splunk for JMX App overview (configuration, deployment, tips and tricks). Developing JMX logic in your application. Splunking other JVM logs and profiling traces. The JVM application landscape and why it's such a rich source of Splunkable machine data. Developing new Splunkbase apps to leverage Splunk for JMX.
Security is one of the fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups in different ways, it is difficult to guarantee data governance in a consistent way. In this talk, we focus on SQL users and discuss how to provide row/column-level access controls with common access control rules throughout the whole cluster across various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6, and Apache Hive 2.1. If some of the rules change, all engines are controlled consistently in near real-time. Technically, we enable Spark Thrift Server to work with the identity given by the JDBC connection and take advantage of the Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering, and various column maskings in Apache Spark with Apache Ranger. We use Apache Ranger as the single point of security control.
Organizations are struggling to manually classify and inventory distributed, heterogeneous data assets in order to deliver value. However, the new Azure service for enterprises, Azure Synapse Analytics, is poised to help organizations fill the gap between data warehouses and data lakes.
Amazon Web Services gives you fast access to flexible and low-cost IT resources, so you can rapidly scale and build virtually any big data and analytics application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing, regardless of the volume, velocity, and variety of your data.
In this one-hour webinar, we will look at the portfolio of AWS Big Data services and how they can be used to build a modern data architecture.
We will cover:
Using different SQL engines to analyze large amounts of structured data
Analysing streaming data in near real-time
Architectures for batch processing
Best practices for Data Lake architectures
This session is suited for:
Solution and enterprise architects
Data architects/ Data warehouse owners
IT & Innovation team members
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics in a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub... (DataWorks Summit)
TMW Systems, a Trimble company, makes the industry-leading transportation management software. 3PLs, brokers, distribution and supply operations, dedicated and private fleets, commercial carriers, and energy service providers rely on our transportation management systems, our fleet maintenance management software, or our routing and scheduling software to make them more efficient and profitable. Billions of data points exist in the trucking industry, and we at TMW Systems are pioneers in tracking millions of trucks, freight loads, and assets.
The architecture team at TMW leverages NiFi and SAM to deliver this immense volume of data in real-time. In this session, you will get a thorough understanding of all the streaming components. We have utilized Apache Kafka, Apache NiFi, and Streaming Analytics Manager to build our real-time data pipeline. We will also discuss real-time event processing using SAM and Schema Registry. Lastly, we will show the custom processors in NiFi and SAM that helped us with complex event processing.
Speaker
Krishna Potluri, TMW Systems, A Trimble Company, Big Data Architect
Donnie Wheat, Trimble, Senior Big Data Architect
Training for AWS Solutions Architect at http://zekelabs.com/courses/amazon-web-services-training-bangalore/. This slide deck describes the features of EC2, EC2 options, instance family types, storage, EBS Volumes, EC2 Instance Store, Security Groups, Volumes and Snapshots, Amazon Machine Images (AMI), Elastic Load Balancer, Classic Load Balancer, Application Load Balancer, Network Load Balancer, the AWS CLI, and EC2 Metadata.
___________________________________________________
zekeLabs is a technology training platform. We provide instructor-led corporate training and classroom training for professionals on industry-relevant cutting-edge technologies like Big Data, Machine Learning, Natural Language Processing, Artificial Intelligence, Data Science, Amazon Web Services, DevOps, and Cloud Computing, and frameworks like Django, Spring, Ruby on Rails, Angular 2, and many more.
Reach out to us at www.zekelabs.com or call us at +91 8095465880 or drop a mail at info@zekelabs.com
I gave this presentation on 5/17 to the New Mexico VMUG in Santa Fe. The presentation provides an overview of OpenStack, what it is (and isn't), and some ways to get started with OpenStack.
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021 (StreamNative)
You may be familiar with the Presto plugin used to run fast interactive queries over Pulsar using ANSI SQL, with the ability to join against other data sources. This plugin will soon get a rename to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename, and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!
Mainframe Integration, Offloading and Replacement with Apache Kafka (Kai Wähner)
Video recording of this presentation:
https://youtu.be/upWzamacOVQ
Blog post with more details:
https://www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/
Mainframes are still hard at work, processing over 70 percent of the world’s most essential computing transactions every day. Very high cost, monolithic architectures, and missing experts are the key challenges for mainframe applications. Time to get more innovative, even with the mainframe!
Mainframe offloading with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe. At the same time, it persists the event data on the bus to enable microservices and delivers the data to other systems such as data warehouses and search indexes.
But the final goal and ultimate vision is to replace the mainframe with new applications using modern and less costly technologies. Stand up to the dinosaur, but keep in mind that legacy migration is a journey! Kai will guide you to the next step of your company's evolution!
You will learn:
- how to not only reduce operational expenses but provide a path for architecture modernization, agility and eventually mainframe replacement
- what steps some of Confluent’s customers already took, leveraging technologies like Change Data Capture (CDC) or MQ for mainframe offloading
- how an event streaming platform enables cost reduction, architecture modernization, and a combination of a mainframe with new technologies
Introduction to Microsoft Azure SQL Data Warehouse (Joseph Lopez)
The new Microsoft Azure SQL Data Warehouse (SQL DW) is a versatile data warehouse service that provides a Massively Parallel Processing (MPP) solution for big data with true enterprise-grade infrastructure. The SQL DW service is built for workloads ranging from a few hundred gigabytes up to petabytes of data, with unique features such as disaggregated compute that let customers use the service to meet their storage needs. In this presentation I will take an in-depth look at this new Azure service, covering deployment, elastic scaling (grow, shrink, and pause), and hybrid data clouds with Hadoop integration through PolyBase, enabling a true SQL experience across structured and unstructured data.
YouTube Link: https://youtu.be/0djPrlaxx_U
Edureka AWS Architect Certification Training - https://www.edureka.co/aws-certification-training
This Edureka PPT on AWS Cloud Practitioner provides a complete guide to the AWS Cloud Practitioner Certification exam. It explains the exam details and objectives, why you should get certified, and how AWS certification can help your career.
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
The Rise of Data in Motion in the Healthcare Industry - Use Cases, Architectures and Examples powered by Apache Kafka.
Use Cases for Data in Motion in the Healthcare Industry:
- Know Your Patient (= “Customer 360”)
- Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
- IT Perspective (Cybersecurity, Mainframe Offload, Hybrid Cloud, Streaming ETL, etc)
Real-world examples include Covid-19 Electronic Lab Reporting, Cerner, Optum, Centene, Humana, Invitae, Bayer, Celmatix, Care.com.
OpenStack - An Introduction/Installation - Presented at Dr Dobb's conference... (Rahul Krishna Upadhyaya)
These slides were presented at Dr. Dobb's Conference in Bangalore.
The talk covers:
An introduction to OpenStack in general
Projects under OpenStack
Contributing to OpenStack
This was presented jointly by CB Ananth and Rahul at Dr. Dobb's Conference Bangalore on 12th Apr 2014.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ... (confluent)
MQ, ETL, and ESB middleware are often used as the integration backbone between legacy applications, modern microservices, and cloud services. This introduces several challenges and complexities, such as point-to-point integration and non-scalable architectures. This session discusses how to build a completely event-driven streaming platform leveraging Apache Kafka's open source messaging, integration, and streaming components to gain distributed processing, fault tolerance, rolling upgrades, and the ability to reprocess events. Learn the differences between an event-driven streaming platform leveraging Apache Kafka and middleware like MQ, ETL, and ESBs, including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ... (Alluxio, Inc.)
Alluxio Tech Talk
Dec 10, 2019
Chris Crosbie and Roderick Yao from the Google Dataproc team and Dipti Borkar of Alluxio will demo how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. They’ll also show how to run Dataproc Spark against a remote HDFS cluster.
For more Alluxio events: https://www.alluxio.io/events/
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam (Flink Forward)
http://flink-forward.org/kb_sessions/no-shard-left-behind-dynamic-work-rebalancing-in-apache-beam/
The Apache Beam (incubating) programming model is designed to support several advanced data processing features such as autoscaling and dynamic work rebalancing. In this talk, we will first explain how dynamic work rebalancing not only provides a general and robust solution to the problem of stragglers in traditional data processing pipelines, but also how it allows autoscaling to be truly effective. We will then present how dynamic work rebalancing works as implemented in Google Cloud Dataflow and which path other Apache Beam runners like Apache Flink can follow to benefit from it.
Apache Beam (formerly the Google Cloud Dataflow SDK) is a unified model and set of language-specific SDKs for defining and executing data processing workflows. You design pipelines that simplify the mechanics of large-scale batch and streaming data processing and that can run on a number of runtimes such as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
This presentation introduces the Beam programming model and how you can use it to design your pipelines, transporting PCollections and applying PTransforms. You will see how the same code is "translated" to a target runtime thanks to a specific runner. You will also get an overview of the current roadmap, with the interesting new features.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix uses the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms, and analyses; how they set up and keep schemas in sync between Hive, Presto, Redshift, and Spark; and how they make access easy for their data scientists. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Have Your Cake and Eat It Too -- Further Dispelling the Myths of the Lambda A... (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/15ACXCw.
Tyler Akidau from Google demonstrates Google's MillWheel, a streaming system that promises low latency, strong consistency, and flexibility without relying on the Lambda Architecture. Filmed at qconsf.com.
Tyler Akidau is a Senior Software Engineer at Google. The current Tech Lead for the MillWheel team, he’s spent five years working on massive-scale streaming data processing systems.
The Netflix Way to deal with Big Data Problems (Monal Daxini)
Netflix is a data-driven company with a unique culture. Come take a holistic tour of the Big Data ecosystem and see how Netflix culture catalyzes the development of systems. Then marvel at how we quickly evolved and scaled the event pipeline to 1 trillion events per day and over 1.4 PB of event data, without service disruption and with a small team.
Go GC: Prioritizing Low Latency and Simplicity (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1IPjAIS.
Rick Hudson discusses the motivation, performance, and technical challenges of Go's low latency concurrent GC and why the approach fits Go well. Filmed at qconsf.com.
Rick Hudson is a member of Google’s Go team. Rick has published papers on language runtimes, memory management, concurrency, synchronization, memory models, and transactional memory.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/20SN0dP.
Tammer Saleh talks about the mistakes people make when building a microservices architecture. He also covers when microservices are appropriate and where to draw the lines between services, dealing with performance issues, testing and debugging techniques, managing a polyglot landscape and the explosion of platforms, and managing failure and graceful degradation. Filmed at qconlondon.com.
Tammer Saleh is a long time developer, leader, and author of the acclaimed book *Rails AntiPatterns*. Saleh is currently building the Cloud Foundry platform at Pivotal.
Google Cloud Next '22 Recap: Serverless & Data edition (Daniel Zivkovic)
See what's new in #Serverless and #Data at GCP. Our guest, Guillaume Blaquiere - Stack Overflow contributor & #GCP #Developer Expert from France, covered the best #GoogleCloudNext announcements, practically demoed how to benefit from #BigQuery Remote Functions and answered many questions.
The meetup recording with TOC for easy navigation is at https://youtu.be/AuZZTwHIcdY
P.S. For more interactive lectures like this, go to http://youtube.serverlesstoronto.org/ or sign up for our upcoming live events at https://www.meetup.com/Serverless-Toronto/events/
Splunk Ninjas: New Features, Pivot, and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
Let's Learn DAX - José Ahias López Portillo (SpanishPASSVC)
In this session we will cover the basics of writing DAX queries and some of the most useful functions that could save us in an apocalypse of complex query generation against tabular models.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1OKo5FN.
Danny Yuan discusses how stream processing is used in Uber's real-time system to solve a wide range of problems, including but not limited to real-time aggregation and prediction on geospatial time series, data migration, monitoring and alerting, and extracting patterns from data streams. Yuan also presents the architecture of the stream processing pipeline. Filmed at qconsf.com.
Danny Yuan is a software engineer at Uber. He's currently working on streaming systems for Uber's logistics platform. Prior to joining Uber, he worked on building Netflix's cloud platform. His work includes predictive autoscaling, a distributed tracing service, a real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix's low-latency crypto services.
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow (Lucas Arruda)
Nowadays more and more companies are searching for insights with the potential to grow their business by analyzing large amounts of data from many different systems. However, in order to reach this level of big data analysis, it's necessary to build an ETL pipeline that processes raw data coming from different sources into a format that can be used with visualization tools such as Tableau.
This kind of data processing can be done by a variety of tools, and in this presentation I show how to do it using a unified programming model created by Google and open-sourced under the name Apache Beam. We will build a simple pipeline that will be executed in the cloud by a fully-managed service called Google Cloud Dataflow.
What's New in Apache Spark 2.3 & Why Should You Care (Databricks)
The Apache Spark 2.3 release marks a big step forward in speed, unification, and API support.
This talk will quickly walk through what’s new and how you can benefit from the upcoming improvements:
* Continuous Processing in Structured Streaming.
* PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
* Native Kubernetes support, marrying the best of container orchestration and distributed data processing.
Streaming a Million Likes/Second: Real-Time Interactions on Live Video (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39NIjLV.
Akhilesh Gupta does a technical deep-dive into how LinkedIn uses the Play/Akka Framework and a scalable distributed system to enable live interactions like likes/comments at massive scale and at extremely low cost across multiple data centers. Filmed at qconlondon.com.
Akhilesh Gupta is the technical lead for LinkedIn's real-time delivery infrastructure and LinkedIn Messaging. He has been working on revamping LinkedIn's offerings into instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco.
Next Generation Client APIs in Envoy Mobile (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2x0Fav8.
Jose Nino guides the audience through the journey of Mobile APIs at Lyft. He focuses on how the team has reaped the benefits of API generation to experiment with the network transport layer. He also discusses recent developments the team has made with Envoy Mobile and the roadmap ahead. Filmed at qconlondon.com.
Jose Nino works as a Software Engineer at Lyft.
Software Teams and Teamwork Trends Report Q1 2020 (C4Media)
How do we cope with an environment that has been radically disrupted, where people are suddenly thrust into remote work in a chaotic state? What are the emerging good practices and new ideas that are shaping the way in which software development teams work? What can we do to make the workplace a more secure and diverse one while increasing the productivity of our teams? This report aims to assist technical leaders in making mid- to long-term decisions that will have a positive impact on their organisations and teams and help individual contributors find the practices, approaches, tools, techniques, and frameworks that can help them get a better experience at work - irrespective of where they are working from.
Understand the Trade-offs Using Compilers for Java Applications (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2QCmmJ0.
Mark Stoodley examines some of the strengths and weaknesses of the different Java compilation technologies, if one were to apply them in isolation. Stoodley discusses how production JVMs assemble a combination of these tools that work together to provide excellent performance across the large spectrum of applications written in Java and JVM-based languages. Filmed at qconsf.com.
Mark Stoodley joined IBM Canada to build Java JIT compilers for production use and led the team that delivered AOT compilation in the IBM SDK for Java 6. He spent the last five years leading the effort to open source nearly 4.3 million lines of source code from the IBM J9 Java Virtual Machine to create the two open source projects Eclipse OMR and Eclipse OpenJ9, and now co-leads both projects.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2SXXXiD.
Katharina Probst talks about what it means to act like an owner and why teams need ownership to be high-performing. When team members, regardless of whether they have a formal leadership role or not, act like owners, magical things can happen. She shares ideas that we can apply to our own work, and talks about how to recognize when we don’t live up to our own expectations of acting like an owner. Filmed at qconsf.com.
Katharina Probst is a Senior Engineering Leader, Kubernetes & SaaS at Google. Before this, she was leading engineering teams at Netflix, being responsible for the Netflix API, which helps bring Netflix streaming to millions of people around the world. Prior to joining Netflix, she was in the cloud computing team at Google, where she saw cloud computing from the provider side.
Does Java Need Inline Types? What Project Valhalla Can Bring to Java (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2T04Lw4.
Sergey Kuksenko talks about the performance benefits inline types bring to Java and how to exploit them. Inline/value types are the key part of the experimental Project Valhalla, which should bring new abilities to the Java language. Filmed at qconsf.com.
Sergey Kuksenko is a Java Performance Engineer at Oracle working on a variety of Java and JVM performance enhancements. He started working as Java Engineer in 1996 and as Java Performance Engineer in 2005. He has had a passion for exploring how Java works on modern hardware.
Do you need service meshes in your tech stack?
This online guide aims to answer pertinent questions for software architects and technical leaders, such as: What is a service mesh? Do I need a service mesh? How do I evaluate the different service mesh offerings? In software architecture, a service mesh is a dedicated infrastructure layer for facilitating service-to-service communications between microservices, often using a sidecar proxy.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UgQ3BU.
Christie Wilson describes what to expect from CI/CD in 2019, and how Tekton is helping bring that to as many tools as possible, such as Jenkins X and Prow. Wilson talks about Tekton itself and performs a live demo that shows how cloud native CI/CD can help debug, surface and fix mistakes faster. Filmed at qconsf.com.
Christie Wilson is a software engineer at Google, currently leading the Tekton project. Over the past decade, she has worked in the mobile, financial and video game industries. Prior to working at Google she led a team of software developers to build load testing tools for AAA video game titles, and founded the Vancouver chapter of PyLadies.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S7lDiS.
Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability. Filmed at qconsf.com.
Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOps Days Chicago and the DeliveryConf conferences, and recently published a book on Serverless computing in Azure with .NET.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/36epVKg.
Todd Montgomery discusses the techniques and lessons learned from implementing Aeron Cluster. His focus is on how Raft can be implemented on Aeron, minimizing the network round trip overhead, and comparing single process to a fully distributed cluster. Filmed at qconsf.com.
Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems, done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works as an independent consultant and is active in several open source projects.
Architectures That Scale Deep - Regaining Control in Deep Systems (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2FWc5Sk.
Ben Sigelman talks about "Deep Systems", their common properties and re-introduces the fundamentals of control theory from the 1960s, including the original conceptualizations of Observability & Controllability. He uses examples from Google & other companies to illustrate how deep systems have damaged people's ability to observe software, and what needs to be done in order to regain control. Filmed at qconsf.com.
Ben Sigelman is a co-founder and the CEO at LightStep, a co-creator of Dapper (Google’s distributed tracing system), and co-creator of the OpenTracing and OpenTelemetry projects (both part of the CNCF). His work and interests gravitate towards observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.
ML in the Browser: Interactive Experiences with Tensorflow.js (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39SddUL.
Victor Dibia provides a friendly introduction to machine learning and covers concrete steps on how front-end developers can create their own ML models and deploy them as part of web applications. He discusses his experience building Handtrack.js - a library for prototyping real-time hand-tracking interactions in the browser. Filmed at qconsf.com.
Victor Dibia is a Research Engineer with Cloudera’s Fast Forward Labs. Prior to this, he was a Research Staff Member at the IBM TJ Watson Research Center, New York. His research interests are at the intersection of human computer interaction, computational social science, and applied AI.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2s9T3Vl.
Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works “under the hood”, and looks at how to create a (simple) compiler that targets this runtime. Filmed at qconsf.com.
Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex application for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.
User & Device Identity for Microservices @ Netflix Scale (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S9tOgy.
Satyajit Thadeshwar provides useful insights on how Netflix implemented a secure, token-agnostic, identity solution that works with services operating at a massive scale. He shares some of the lessons learned from this process, both from architectural diagrams and code. Filmed at qconsf.com.
Satyajit Thadeshwar is an engineer on the Product Edge Access Services team at Netflix, where he works on some of the most critical services focusing on user and device authentication. He has more than a decade of experience building fault-tolerant and highly scalable, distributed systems.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Ezs08q.
Justin Ryan talks about Netflix's scalability issues and some of the ways they addressed them. He shares successes they've had from unintuitively partitioning computation into multiple services to get better runtime characteristics. He introduces useful probabilistic data structures, innovative bi-directional data passing, and open-source projects available from Netflix that make this all possible. Filmed at qconsf.com.
Justin Ryan is Playback Edge Engineering at Netflix. He works on some of the most critical services at Netflix, specifically focusing on user and device authentication. Years of building developer tools has also given him a healthy set of opinions on developer productivity.
Make Your Electron App Feel at Home Everywhere (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Z4ZJjn.
Kilian Valkhof discusses the process of making an Electron app feel at home on all three platforms: Windows, MacOS and Linux, making devs aware of the pitfalls and how to avoid them. Filmed at qconsf.com.
Kilian Valkhof is a Front-end Developer & User-experience Designer at Firstversionist. He writes about various topics, from design to machine learning, on his personal website, kilianvalkhof.com, and is a frequent contributor to open source software. He is part of the Electron governance team that oversees the development of the Electron framework.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/344PnB1.
Steve Klabnik goes over the deep details of how async/await works in Rust, covering concepts like coroutines, generators, stack-less vs stack-ful, "pinning", and more. Filmed at qconsf.com.
Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language." He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2rm4hFD.
Yevgeniy Brikman talks about how to write automated tests for infrastructure code, including the code written for use with tools such as Terraform, Docker, Packer, and Kubernetes. Topics covered include: unit tests, integration tests, end-to-end tests, dependency injection, test parallelism, retries and error handling, static analysis, property testing and CI / CD for infrastructure code. Filmed at qconsf.com.
Yevgeniy Brikman is the co-founder of Gruntwork, a company that provides DevOps as a Service. He is the author of two books published by O'Reilly Media: Hello, Startup and Terraform: Up & Running. Previously, he worked as a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But if the “Reject” button is pushed, colleagues will be alerted via a Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/google-cloud-dataflow
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
15. Google Dataflow SDK
Open Source SDK used to construct a Dataflow pipeline.
(Now Incubating as Apache Beam)
16. Computing Team Scores
// Collection of raw log lines
PCollection<String> raw = ...;

// Element-wise transformation into team/score pairs
PCollection<KV<String, Integer>> input =
    raw.apply(ParDo.of(new ParseFn()));

// Composite transformation containing an aggregation
PCollection<KV<String, Integer>> output = input
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(60))))
    .apply(Sum.integersPerKey());
17. Google Cloud Dataflow
● Given code in the Dataflow (incubating as Apache Beam) SDK...
● Pipelines can run…
○ On your development machine
○ On the Dataflow Service on Google Cloud Platform
○ On third party environments like Spark or Flink.
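The runner choice is just configuration. Below is a minimal, illustrative sketch (not from the slides) of how a Beam-era pipeline picks its execution environment from a command-line flag; exact runner class names vary by SDK version, e.g. the original Dataflow SDK 1.x used DataflowPipelineRunner instead.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSelection {
  public static void main(String[] args) {
    // Pass e.g. --runner=DirectRunner   (your development machine),
    //           --runner=DataflowRunner (the Dataflow Service on GCP), or
    //           --runner=FlinkRunner / --runner=SparkRunner (third-party engines).
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);
    // ... attach the same transforms as in the team-scores example above ...
    p.run();
  }
}

The same pipeline graph runs unchanged; only the options differ.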
18. Google Cloud Dataflow
A fully-managed cloud service and programming model for batch and streaming big data processing.
37. Policy: Making Decisions
Goals:
1. No backlog growth
2. Short backlog time
3. Reasonable CPU utilization
38. Upscaling Policy: Keeping Up
Given M machines
For a stage, given:
average stage throughput T
average positive backlog growth G of stage
Machines needed for the stage to keep up:
M' = M × (T + G) / T
39. Upscaling Policy: Catching Up
Given M machines
Given R (time to reduce backlog)
For a stage, given:
average backlog time B
Extra machines to remove the backlog:
Extra = M × B / R
40. Upscaling Policy: All Stages
Want all stages to:
1. keep up
2. have low backlog time
Pick the maximum over all stages of M' + Extra
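Putting slides 38-40 together, here is a toy sketch of the upscaling decision (my own illustration with hypothetical signal names, not code from the talk): compute the keep-up and catch-up machine counts per stage and take the maximum.

import java.util.List;

public class UpscalingPolicy {
  // Hypothetical per-stage signals: average throughput T, average positive
  // backlog growth G, and average backlog time B (seconds).
  record Stage(double throughput, double backlogGrowth, double backlogTimeSec) {}

  // M' = M * (T + G) / T machines to keep up, plus Extra = M * B / R machines
  // to drain the backlog within R seconds; pick the maximum over all stages.
  static int desiredMachines(int m, double r, List<Stage> stages) {
    double want = m;
    for (Stage s : stages) {
      double keepUp = m * (s.throughput() + s.backlogGrowth()) / s.throughput();
      double extra = m * s.backlogTimeSec() / r;
      want = Math.max(want, keepUp + extra);
    }
    return (int) Math.ceil(want);
  }

  public static void main(String[] args) {
    // One healthy stage and one that is 25% short on throughput with 60s of
    // backlog: with M = 8 and R = 30s, 8 * (800+200)/800 = 10 to keep up,
    // plus 8 * 60/30 = 16 extra => 26 machines.
    List<Stage> stages = List.of(
        new Stage(1000, 0, 1),
        new Stage(800, 200, 60));
    System.out.println(desiredMachines(8, 30, stages)); // 26
  }
}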
65. Downscaling from 4 to 2 Machines
[Diagram: the key-range slices of stages S0, S1, and S2 migrate from four machines down onto Machine 1 and Machine 2]
Upsizing = Steps in Reverse
66. Granularity of Parallelism
As of March 2016, Google Cloud Dataflow:
• Splits Key Ranges initially Based on Max Machines
• At Max: 1 Logical Persistent Disk per Machine
Each disk has a slice of the key ranges from all stages
• Only (relatively) even Disk Distributions
• Results in Scaling Quanta
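The scaling quanta follow from the fixed disk pool. As a toy illustration (my assumption of what "relatively even" means, not the documented rule): if D persistent disks were created for the maximum pool size and each machine must carry the same whole number of disks, the reachable pool sizes are ceil(D / k) for k disks per machine.

import java.util.TreeSet;

public class ScalingQuanta {
  // Enumerate reachable machine counts for D disks, assuming each machine
  // carries the same whole number k of disks: M = ceil(D / k), k = 1..D.
  static TreeSet<Integer> quanta(int disks) {
    TreeSet<Integer> sizes = new TreeSet<>();
    for (int perMachine = 1; perMachine <= disks; perMachine++) {
      sizes.add((disks + perMachine - 1) / perMachine); // ceil(D / k)
    }
    return sizes;
  }

  public static void main(String[] args) {
    // With 16 disks the pool can only sit at these sizes:
    System.out.println(quanta(16)); // [1, 2, 3, 4, 6, 8, 16]
  }
}

Under this reading, a pool sized for 16 machines cannot scale to, say, 5 or 7 machines; it must jump between quanta.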
70. Downscaling Policy
Next lower scaling quantum => M' machines
Estimate future CPU_M' per machine:
CPU_M' = CPU_M × (M / M')
If the new CPU_M' < threshold (say 90%), downscale to M'
73. Auto-Scaling Summary
Signals: throughput, backlog time, backlog growth, CPU utilization
Policy: keep up, reduce backlog, use CPUs
Mechanism: split key ranges, migrate computations
74. Future Work
• Experiment with non-uniform disk distributions to address hot ranges
• Dynamically split ranges finer than initially done
• Approximate model of the relation between #VMs and throughput
75. Questions?
Further reading on streaming model:
The world beyond batch: Streaming 101
The world beyond batch: Streaming 102
76. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/google-cloud-dataflow