Data Engineering is a relatively new but fast-evolving discipline that spans multiple environments and technologies, from traditional data centers to hyper-scale cloud providers. It combines closed-source, homegrown, and open-source software to create scalable data pipelines and power incredible new product features.
In this presentation, we go over the technology trends and advancements of the last 5-10 years and bring them together in a story about modern-day Data Engineering and the magic behind it.
The Evolving Landscape of Data Engineering
1. The Evolving Landscape of Data Engineering
Andrei Savu - Event Co-organizer
Staff Engineer @ Twitter
Follow me @andreisavu
2. Andrei Savu
Staff Engineer @ Twitter:
* MoPub Backend & Data Pipelines
* Mobile App Monetization
Co-organizer of the Data Engineering Club.
Previously Tech Lead at Cloudera via the Axemblr acquisition. Started the Cloud engineering team.
3. Topics
The Past:
● OSS communities
● AWS history
● Google Cloud history
The Present: Patterns
The Future: Wish List
4. The Past - OSS
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public Cloud was unusual
CAPEX
6. The Past - Google Cloud
Visionary Products
Fast iterations
Machine Learning as a key use case
State of the Art data platform
Last 3 years on fast forward
Intelligent Billing
OPEX & Elastic
7. The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature.
We have a broad set of options.
Big Data is much Bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive (especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Cluster locality is important.
Service Endpoints (not clusters, aka serverless, aka managed etc.).
Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-AZ).
Multi-DC and Multi-Region from Day 1.
Various flavors of containers.
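The auto-scaling pattern on this slide (spot vs. on-demand) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not something from the talk: the name `plan_capacity`, its parameters, the backlog-based sizing rule, and the default 25% on-demand share. Real auto-scalers weigh many more signals.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    on_demand: int  # instances expected to survive spot interruptions
    spot: int       # cheaper instances that may be reclaimed at any time


def plan_capacity(backlog_events: int,
                  events_per_instance_per_min: int,
                  target_drain_minutes: int,
                  on_demand_fraction: float = 0.25,
                  max_instances: int = 100) -> CapacityPlan:
    """Size a worker pool so the backlog drains within the target window.

    All parameters are hypothetical knobs for this sketch.
    """
    if events_per_instance_per_min <= 0 or target_drain_minutes <= 0:
        raise ValueError("throughput and drain window must be positive")
    if not 0.0 <= on_demand_fraction <= 1.0:
        raise ValueError("on_demand_fraction must be within [0, 1]")

    # Events one instance can process inside the drain window.
    per_instance = events_per_instance_per_min * target_drain_minutes
    needed = math.ceil(backlog_events / per_instance)
    # Keep at least one worker and never exceed the configured ceiling.
    total = max(1, min(needed, max_instances))
    # Reserve a stable on-demand core; fill the rest with spot capacity.
    on_demand = max(1, math.ceil(total * on_demand_fraction))
    return CapacityPlan(on_demand=on_demand, spot=total - on_demand)
```

The split keeps a small on-demand core so the job keeps making progress if spot capacity is reclaimed, while most of the elastic capacity comes from the cheaper pool.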
8. The Future: Wish List
A Data Catalog product as the center of the universe.
Data Monitoring Systems:
* statistical properties, anomaly detection, schema changes, consumption patterns etc.
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching based on access patterns.
Declarative data transformation vs. explicit ETL.
Intelligent data sampling products. Scalability has a cost.
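As a rough illustration of the "Data Monitoring Systems" wish-list item, the Python sketch below shows two of the checks named on the slide: schema change detection and a simple statistical anomaly test. The function names, the `{column: type}` schema representation, and the 3-standard-deviation threshold are assumptions made for this example, not a description of any existing product.

```python
import statistics
from typing import Mapping, Sequence


def schema_changes(old: Mapping[str, str], new: Mapping[str, str]) -> dict:
    """Compare two {column: type} mappings and report the differences."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }


def is_anomalous(history: Sequence[float], value: float,
                 threshold: float = 3.0) -> bool:
    """Flag a metric that lies more than `threshold` sample standard
    deviations away from the mean of its recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold
```

A monitoring job could run `schema_changes` against each new partition's schema and `is_anomalous` against per-partition statistics such as row counts or null rates, alerting when either check fires.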
9. Thanks!
Join the community on Meetup.com!
www.meetup.com/Data-Engineering-Club
www.dataeng.club
Do you want to present? Get in touch.
Feedback #dataengclub